path: root/src/gallium
Commit message | Author | Age | Files | Lines
* treewide: s/comparitor/comparator/ | Ilia Mirkin | 2016-12-12 | 3 | -3/+3
  git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'

  Just happened to notice this in a patch that was sent and included one of the tokens in question.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Acked-by: Nicolai Hähnle <[email protected]>
* swr: [rasterizer core/memory] StoreTile: AVX512 progress | Tim Rowley | 2016-12-12 | 2 | -222/+138
  Fixes to 128-bit formats.

  Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: shrink the GSVS ring to account for the reduced item sizes | Nicolai Hähnle | 2016-12-12 | 1 | -1/+1
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: shrink each vertex stream to the actually required size | Nicolai Hähnle | 2016-12-12 | 2 | -25/+40
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use a single descriptor for the GSVS ring | Nicolai Hähnle | 2016-12-12 | 4 | -50/+67
  We can hardcode all of the fields for swizzling in the geometry shader.

  The advantage is that we use fewer descriptor slots and we no longer have to update any of the (ring) descriptors when the geometry shader changes.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pack GS output components for each vertex stream contiguously | Nicolai Hähnle | 2016-12-12 | 1 | -3/+8
  Note that the memory layout of one vertex stream inside one "item" (= memory written by one GS wave) on the GSVS ring is:

    t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0
    t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1
    ...
    t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL
    t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0
    t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1
    ...
    t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL
    ...
    t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0
    t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1
    ...
    t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL

  where tNN indicates the thread number, vNN the vertex number (in the order of EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and component, respectively).

  The vertex streams are laid out sequentially.

  The swizzling by 16 threads is hard-coded in the way the VGT generates the offset passed into the GS copy shader, and the jump every 16 threads is calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way that makes it difficult to deviate from this layout (at least that's what I've experimentally confirmed on VI after first trying to go the simpler route of just interleaving the vertex streams).

  Reviewed-by: Marek Olšák <[email protected]>
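As a reading aid for the layout in the commit message above, here is a minimal sketch of the index arithmetic it implies for a single vertex stream. It assumes one dword per element and a 64-thread wave; the function name and its parameters are hypothetical and are not taken from radeonsi. The ordering (16-thread group outermost, then component, then vertex, then lane) is read directly off the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a Mesa function.
 * Returns the dword index of (thread, vertex, component) inside the region
 * that one vertex stream occupies within one GSVS ring item, assuming the
 * layout listed in the commit message and one dword per element. */
static uint32_t gsvs_stream_dword_index(uint32_t thread, uint32_t vertex,
                                        uint32_t component,
                                        uint32_t num_vertices,
                                        uint32_t num_components)
{
    assert(thread < 64);              /* assumption: 64 threads per wave */
    assert(vertex < num_vertices);
    assert(component < num_components);

    uint32_t group = thread / 16;     /* swizzling happens per 16 threads */
    uint32_t lane = thread % 16;

    /* group -> component -> vertex -> lane, outermost to innermost */
    return ((group * num_components + component) * num_vertices + vertex) * 16
           + lane;
}
```

With 2 vertices and 3 components, for example, t0v1c0 lands at dword 16 and t16v0c0 at dword 96, which matches the position of those entries in the listing.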
* radeonsi: do not write non-existent components through the GSVS ring | Nicolai Hähnle | 2016-12-12 | 1 | -2/+4
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only write values belonging to the stream when emitting GS vertex | Nicolai Hähnle | 2016-12-12 | 1 | -0/+3
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: generate an explicit switch instruction over vertex streams | Nicolai Hähnle | 2016-12-12 | 1 | -8/+13
  SimplifyCFG generates a switch instruction anyway when all four streams are present, but is simultaneously not smart enough to eliminate some redundant jumps that it generates.

  The generated assembly is still a bit silly, probably because the control flow annotation doesn't know how to handle a switch with uniform condition.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fetch only outputs of current vertex stream from the GSVS ring | Nicolai Hähnle | 2016-12-12 | 1 | -16/+25
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only export from GS copy shader for vertex stream 0 | Nicolai Hähnle | 2016-12-12 | 1 | -12/+19
  When running the copy shader for vertex streams != 0, the SX does not need any data from us (there is no rasterization for the higher vertex streams, only streamout).

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not export VS outputs from vertex streams != 0 | Nicolai Hähnle | 2016-12-12 | 1 | -0/+6
  This affects GS copy shaders. When an output is meant for vertex stream != 0, then we don't have to make it available to the pixel shader.

  There is a minor inefficiency here because the GLSL varying packing pass does not group varyings of the same vertex stream together, but it shouldn't be important in practice.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pull iteration over vertex streams into GS copy shader logic | Nicolai Hähnle | 2016-12-12 | 1 | -25/+37
  The iteration is not needed for normal vertex shaders.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: group streamout writes by vertex stream | Nicolai Hähnle | 2016-12-12 | 1 | -10/+22
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: load the streamout buf descriptors closer to their use | Nicolai Hähnle | 2016-12-12 | 1 | -14/+11
  LLVM can still decide to hoist the loads since they're marked invariant.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract writing of a single streamout output | Nicolai Hähnle | 2016-12-12 | 1 | -39/+52
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: separate the call to si_llvm_emit_streamout from exports | Nicolai Hähnle | 2016-12-12 | 1 | -4/+4
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: plumb the output vertex_stream through to si_shader_output_values | Nicolai Hähnle | 2016-12-12 | 1 | -1/+9
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename members of si_shader_output_values | Nicolai Hähnle | 2016-12-12 | 1 | -8/+8
  Be a bit more verbose and avoid confusion in future patches.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix an off-by-one error in the bounds check for max_vertices | Nicolai Hähnle | 2016-12-12 | 1 | -1/+1
  The spec actually says that calling EmitStreamVertex is undefined when you exceed max_vertices. But we do need to avoid trampling over memory outside the GSVS ring.

  Cc: [email protected]
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not kill GS with memory writes | Nicolai Hähnle | 2016-12-12 | 1 | -8/+22
  Vertex emits beyond the specified maximum number of vertices are supposed to have no effect, which is why we used to always kill GS that reached the limit.

  However, if the GS also writes to memory (SSBO, atomics, shader images), then we must keep going and only skip the vertex emit itself.

  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
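The two commits above both concern what happens when a geometry shader emits more vertices than max_vertices. The following is only a CPU-side model of that behaviour, not the LLVM IR that radeonsi generates; the struct and function names are hypothetical. It illustrates the two points made in the messages: the bound must exclude index == max_vertices, since such a write would land one slot past the ring space reserved for the wave, and hitting the limit skips the emit without stopping the rest of the shader.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-thread GS emit state. */
struct gs_emit_state {
    uint32_t emitted;       /* vertices written so far */
    uint32_t max_vertices;  /* limit declared by the shader */
};

/* Returns true if the vertex was written. When the limit has been reached,
 * the write is skipped but the caller keeps running, so later memory writes
 * (SSBO, atomics, images) in the shader would still take effect. */
static bool gs_try_emit_vertex(struct gs_emit_state *s)
{
    /* Valid slots are 0 .. max_vertices - 1, hence >= rather than >. */
    if (s->emitted >= s->max_vertices)
        return false;

    s->emitted++;
    return true;
}
```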
* radeonsi: update all GSVS ring descriptors for new buffer allocations | Nicolai Hähnle | 2016-12-12 | 1 | -1/+6
  Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced.

  Cc: [email protected]
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSI | Nicolai Hähnle | 2016-12-12 | 2 | -1/+21
  Allow drivers to emit GS outputs in a smarter way.

  Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output usagemasks | Nicolai Hähnle | 2016-12-12 | 2 | -0/+2
  Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output vertex streams | Nicolai Hähnle | 2016-12-12 | 2 | -0/+19
  Reviewed-by: Marek Olšák <[email protected]>
* gallium: extract individual streamout output structure | Nicolai Hähnle | 2016-12-12 | 1 | -8/+13
  So that we can pass pointers to individual array entries around.

  Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add Stream{X,Y,Z,W} fields to tgsi_declaration_semantic | Nicolai Hähnle | 2016-12-12 | 4 | -3/+81
  This is for geometry shader outputs. Without it, drivers have no way of knowing which stream each output is intended for, and have to conservatively write all outputs to all streams.

  Separate stream numbers for each component are required due to output packing.

  Reviewed-by: Marek Olšák <[email protected]>
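To illustrate why a per-component stream number is useful once outputs are packed, here is a minimal sketch that stores one stream index per X/Y/Z/W component in a single byte. It assumes four vertex streams, so each index fits in two bits. The helper names and the exact bit layout are assumptions made for illustration and are not claimed to match the TGSI definitions.

```c
#include <stdint.h>

/* Hypothetical helpers, not TGSI API.
 * Pack the stream index (0..3) of each component into one byte,
 * two bits per component, X in the lowest bits. */
static uint8_t pack_streams(unsigned x, unsigned y, unsigned z, unsigned w)
{
    return (uint8_t)((x & 3u) | ((y & 3u) << 2) | ((z & 3u) << 4) |
                     ((w & 3u) << 6));
}

/* Stream index of component chan (0 = X .. 3 = W). */
static unsigned stream_of_component(uint8_t packed, unsigned chan)
{
    return (packed >> (2u * chan)) & 3u;
}
```

A driver could then compare stream_of_component() against the stream being emitted and skip components that belong to a different stream, which is the kind of selective write the later radeonsi commits in this log describe.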
* virgl: Fix a strict-aliasing violation in the encoder | Edward O'Callaghan | 2016-12-12 | 1 | -1/+7
  As per the C spec, it is illegal to alias pointers to different types. This results in undefined behaviour after optimization passes, resulting in very subtle bugs that happen only on a full moon.

  Use a memcpy() as a well-defined coercion between the double and uint64_t interpretations of the memory.

  V.2: Use static_assert() instead of assert().
  V.3: Use C99 compat STATIC_ASSERT() over C11 static_assert().

  Signed-off-by: Edward O'Callaghan <[email protected]>
  Acked-by: Dave Airlie <[email protected]>
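The technique named in the commit message above can be sketched as follows. This is a generic illustration rather than the virgl encoder code: the function name is hypothetical, and the negative-array-size typedef merely stands in for the STATIC_ASSERT() macro the message mentions.

```c
#include <stdint.h>
#include <string.h>

/* C99-compatible compile-time check that both types have the same size
 * (stand-in for a STATIC_ASSERT-style macro). */
typedef char double_is_64_bits[(sizeof(double) == sizeof(uint64_t)) ? 1 : -1];

/* Hypothetical helper: reinterpret the bytes of a double as a uint64_t.
 * Copying through memcpy() is well defined, whereas dereferencing
 * *(uint64_t *)&d would violate the aliasing rules. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof(bits));
    return bits;
}
```

Compilers generally turn a fixed-size memcpy() like this into a plain register move, so the well-defined form does not usually cost anything at run time.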
* softpipe: fix release build unused variable warning | Grazvydas Ignotas | 2016-12-10 | 1 | -1/+1
  Signed-off-by: Grazvydas Ignotas <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: fix release build unused variable warnings | Grazvydas Ignotas | 2016-12-10 | 2 | -2/+2
  Signed-off-by: Grazvydas Ignotas <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* swr: [rasterizer common/core/jitter] fetch support for GL_FIXED | Tim Rowley | 2016-12-09 | 5 | -34/+188
  v2: use fmul(1/65536) instead of fdiv(65535)

  Reviewed-by: Bruce Cherniak <[email protected]>
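For context on the fmul(1/65536) mentioned above: GL_FIXED is a signed 16.16 fixed-point format, so converting to float means scaling by 2^-16. The sketch below shows that conversion in scalar C; the function name is hypothetical, and the real change lives in the SWR fetch JIT rather than in code like this.

```c
#include <stdint.h>

/* Hypothetical scalar helper: convert a GL_FIXED (signed 16.16) value to
 * float by multiplying with the reciprocal of 65536. */
static float fixed_to_float(int32_t fixed)
{
    return (float)fixed * (1.0f / 65536.0f);
}
```

Because 65536 is a power of two, its reciprocal is exactly representable in floating point, so the multiply gives the same result as dividing by 65536 while avoiding a division.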
* swr: [rasterizer core/memory] Finish R24_UNORM_X8_TYPELESS for AVX512 | Tim Rowley | 2016-12-09 | 2 | -26/+24
  This one-off specialization was missed.

  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] supply proper clip distances to point sprites | Ilia Mirkin | 2016-12-08 | 1 | -3/+9
  Large points become pairs of triangles when rasterized, so we must feed it three clip distances, one for each vertex. The clip distance is not subject to sprite coord replacement, so there's no interpolation of it. We just take its value and put it in the "z" component of the barycentric-ready plane equation.

  (We could also just cull it at an earlier point in time, but that would require larger changes.)

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] perform perspective division on clip distances | Ilia Mirkin | 2016-12-08 | 1 | -6/+8
  Clip distances need to be perspective-divided. This fixes all the interpolation-*-{distance,vertex} piglits.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* radeonsi: disable the constant engine (CE) on Carrizo and Stoney | Marek Olšák | 2016-12-08 | 1 | -1/+4
  It must be disabled until the kernel bug is fixed, and then we'll enable CE based on the DRM version.

  Cc: 12.0 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Fix typo: "llvm.fs.interp" => "llvm.SI.fs.interp" | Michel Dänzer | 2016-12-08 | 1 | -1/+1
  Fixes lots of pixel shaders failing to compile with LLVM 3.9 or older.

  Trivial.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99013#c4
* radeonsi: wait for outstanding LDS instructions in memory barriers if needed | Marek Olšák | 2016-12-07 | 1 | -1/+17
  Cc: 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi: fix the src type of TGSI_OPCODE_MEMBAR | Marek Olšák | 2016-12-07 | 1 | -0/+1
  It's a literal integer. The next commit will need this.

  Cc: 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: wait for outstanding memory instructions in TCS barriers | Marek Olšák | 2016-12-07 | 1 | -1/+5
  Cc: 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: allow specifying simm16 of emit_waitcnt at call sites | Marek Olšák | 2016-12-07 | 1 | -5/+7
  The next commit will use this.

  Cc: 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: write shader descriptors into hang reports | Marek Olšák | 2016-12-07 | 3 | -0/+117
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: check for sampler state CSO corruption | Marek Olšák | 2016-12-07 | 3 | -0/+17
  It really happens.

  v2: declare "magic" in debug builds only

  Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: properly declare context sampler states | Marek Olšák | 2016-12-07 | 3 | -4/+4
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix incorrect FMASK checking in bind_sampler_states | Marek Olšák | 2016-12-07 | 1 | -4/+4
  Cc: 12.0 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always restore sampler states when unbinding sampler views | Marek Olšák | 2016-12-07 | 1 | -3/+8
  Cc: 12.0 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: take LDS into account for compute shader occupancy stats | Marek Olšák | 2016-12-07 | 1 | -11/+18
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: decrease the size of pipe_sampler_state fields | Marek Olšák | 2016-12-07 | 1 | -3/+3
  We've had unused bits.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* cso: don't release sampler states that are bound | Marek Olšák | 2016-12-07 | 1 | -1/+3
  This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine) and probably many other games too.

  cso_cache deletes sampler states when the cache size is too big and doesn't check which sampler states are bound, causing use-after-free in drivers. Because of that, radeonsi uploaded garbage sampler states and the hardware went bananas. Other drivers may have experienced similar issues.

  Cc: 12.0 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
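The failure mode described above (a cache trimming pass that frees objects still in use) and the general shape of the fix can be modelled with a few lines of C. This is a simplified sketch under stated assumptions, not the cso_cache implementation: the struct, its flags, and the function are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry: "bound" means a context currently has this
 * sampler state bound, so a driver may still dereference it. */
struct cached_sampler {
    bool in_use;  /* slot holds a live object */
    bool bound;   /* object is currently bound */
};

/* Evict up to to_remove live entries, skipping any that are bound.
 * Returns the number of entries actually evicted, which may be smaller
 * than requested if the rest are bound. */
static size_t evict_unbound(struct cached_sampler *entries, size_t count,
                            size_t to_remove)
{
    size_t removed = 0;

    for (size_t i = 0; i < count && removed < to_remove; i++) {
        if (!entries[i].in_use || entries[i].bound)
            continue;            /* never free an object still bound */
        entries[i].in_use = false;
        removed++;
    }
    return removed;
}
```

The trade-off in a scheme like this is that the cache can temporarily stay above its size limit when many entries are bound, in exchange for never handing a driver a dangling pointer.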
* radeonsi: fix isolines tess factor writes to control ring | Nicolai Hähnle | 2016-12-07 | 1 | -4/+12
  Fixes piglit arb_tessellation_shader/execution/isoline{_no_tcs}.shader_test.

  Cc: [email protected]
* radeonsi: Use amdgcn intrinsics for fs interpolation | Tom Stellard | 2016-12-07 | 1 | -54/+142
  Reviewed-by: Nicolai Hähnle <[email protected]>