summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* treewide: s/comparitor/comparator/Ilia Mirkin2016-12-1231-85/+85
| | | | | | | | | | git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* nir: Only float and double types can be matricesIan Romanick2016-12-122-19/+24
| | | | | | | | | | | | | | In 19a541f (nir: Get rid of nir_constant_data) a number of places that operated on nir_constant::values were mechanically converted to operate on the whole array without regard for the base type. Only GLSL_TYPE_FLOAT and GLSL_TYPE_DOUBLE can be matrices, so only those types can have data in the non-0 array element. See also b870394. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: Iago Toral Quiroga <[email protected]>
* swr: [rasterizer core/memory] StoreTile: AVX512 progressTim Rowley2016-12-122-222/+138
| | | | | | Fixes to 128-bit formats. Reviwed-by: Bruce Cherniak <[email protected]>
* nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.Matt Turner2016-12-122-0/+25
| | | | | | | | | | instructions in affected programs: 550 -> 544 (-1.09%) helped: 6 cycles in affected programs: 6952 -> 6850 (-1.47%) helped: 6 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Reject copy propagation into SEL if not min/max.Matt Turner2016-12-122-1/+12
| | | | | | | | | | | | | | | | | | | | | We shouldn't ever see a SEL with conditional mod other than GE (for max) or L (for min), but we might see one with predication and no conditional mod. total instructions in shared programs: 8241806 -> 8241902 (0.00%) instructions in affected programs: 13284 -> 13380 (0.72%) HURT: 62 total cycles in shared programs: 84165104 -> 84166244 (0.00%) cycles in affected programs: 75364 -> 76504 (1.51%) helped: 10 HURT: 34 Fixes generated code in at least Sanctum 2, Borderlands 2, Goat Simulator, XCOM: Enemy Unknown, and Shogun 2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add unit tests for copy propagation pass.Matt Turner2016-12-122-0/+211
| | | | | | Pretty basic, but it's a start. Acked-by: Jason Ekstrand <[email protected]>
* i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.Matt Turner2016-12-123-15/+16
| | | | | | Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: shrink the GSVS ring to account for the reduced item sizesNicolai Hähnle2016-12-121-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: shrink each vertex stream to the actually required sizeNicolai Hähnle2016-12-122-25/+40
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use a single descriptor for the GSVS ringNicolai Hähnle2016-12-124-50/+67
| | | | | | | | | We can hardcode all of the fields for swizzling in the geometry shader. The advantage is that we use fewer descriptor slots and we no longer have to update any of the (ring) descriptors when the geometry shader changes. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pack GS output components for each vertex stream contiguouslyNicolai Hähnle2016-12-121-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that the memory layout of one vertex stream inside one "item" (= memory written by one GS wave) on the GSVS ring is: t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0 t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1 ... t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0 t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1 ... t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL ... t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0 t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1 ... t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL where tNN indicates the thread number, vNN the vertex number (in the order of EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and component, respectively). The vertex streams are laid out sequentially. The swizzling by 16 threads is hard-coded in the way the VGT generates the offset passed into the GS copy shader, and the jump every 16 threads is calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way that makes it difficult to deviate from this layout (at least that's what I've experimentally confirmed on VI after first trying to go the simpler route of just interleaving the vertex streams). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not write non-existent components through the GSVS ringNicolai Hähnle2016-12-121-2/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only write values belonging to the stream when emitting GS vertexNicolai Hähnle2016-12-121-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: generate an explicit switch instruction over vertex streamsNicolai Hähnle2016-12-121-8/+13
| | | | | | | | | | | | SimplifyCFG generates a switch instruction anyway when all four streams are present, but is simultaneously not smart enough to eliminate some redundant jumps that it generates. The generated assembly is still a bit silly, probably because the control flow annotation doesn't know how to handle a switch with uniform condition. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fetch only outputs of current vertex stream from the GSVS ringNicolai Hähnle2016-12-121-16/+25
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only export from GS copy shader for vertex stream 0Nicolai Hähnle2016-12-121-12/+19
| | | | | | | | When running the copy shader for vertex streams != 0, the SX does not need any data from us (there is no rasterization for the higher vertex streams, only streamout). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not export VS outputs from vertex streams != 0Nicolai Hähnle2016-12-121-0/+6
| | | | | | | | | | | | This affects for GS copy shaders. When an output is meant for vertex stream != 0, then we don't have to make it available to the pixel shader. There is a minor inefficiency here because the GLSL varying packing pass does not group varyings of the same vertex stream together, but it shouldn't be important in practice. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pull iteration over vertex streams into GS copy shader logicNicolai Hähnle2016-12-121-25/+37
| | | | | | The iteration is not needed for normal vertex shaders. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: group streamout writes by vertex streamNicolai Hähnle2016-12-121-10/+22
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: load the streamout buf descriptors closer to their useNicolai Hähnle2016-12-121-14/+11
| | | | | | LLVM can still decide to hoist the loads since they're marked invariant. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract writing of a single streamout outputNicolai Hähnle2016-12-121-39/+52
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: separate the call to si_llvm_emit_streamout from exportsNicolai Hähnle2016-12-121-4/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: plumb the output vertex_stream through to si_shader_output_valuesNicolai Hähnle2016-12-121-1/+9
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename members of si_shader_output_valuesNicolai Hähnle2016-12-121-8/+8
| | | | | | Be a bit more verbose and avoid confusion in future patches. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix an off-by-one error in the bounds check for max_verticesNicolai Hähnle2016-12-121-1/+1
| | | | | | | | | | | The spec actually says that calling EmitStreamVertex is undefined when you exceed max_vertices. But we do need to avoid trampling over memory outside the GSVS ring. Cc: [email protected] Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not kill GS with memory writesNicolai Hähnle2016-12-121-8/+22
| | | | | | | | | | | Vertex emits beyond the specified maximum number of vertices are supposed to have no effect, which is why we used to always kill GS that reached the limit. However, if the GS also writes to memory (SSBO, atomics, shader images), then we must keep going and only skip the vertex emit itself. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: update all GSVS ring descriptors for new buffer allocationsNicolai Hähnle2016-12-121-1/+6
| | | | | | | | Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced. Cc: [email protected] Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSINicolai Hähnle2016-12-123-1/+31
| | | | | | Allow drivers to emit GS outputs in a smarter way. Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output usagemasksNicolai Hähnle2016-12-122-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output vertex streamsNicolai Hähnle2016-12-122-0/+19
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: extract individual streamout output structureNicolai Hähnle2016-12-121-8/+13
| | | | | | So that we can pass pointers to individual array entries around. Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add Stream{X,Y,Z,W} fields to tgsi_declaration_semanticNicolai Hähnle2016-12-124-3/+81
| | | | | | | | | | | This is for geometry shader outputs. Without it, drivers have no way of knowing which stream each output is intended for, and have to conservatively write all outputs to all streams. Separate stream numbers for each component are required due to output packing. Reviewed-by: Marek Olšák <[email protected]>
* glsl: remember per-component vertex streams for packed varyingsNicolai Hähnle2016-12-123-2/+24
| | | | Reviewed-by: Marek Olšák <[email protected]>
* i965/blorp: fix release build unused variable warningGrazvydas Ignotas2016-12-121-3/+1
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* virgl: Fix a strict-aliasing violation in the encoderEdward O'Callaghan2016-12-121-1/+7
| | | | | | | | | | | | | | | | As per the C spec, it is illegal to alias pointers to different types. This results in undefined behaviour after optimization passes, resulting in very subtle bugs that happen only on a full moon.. Use a memcpy() as a well defined coercion between the double to uint64_t interpretations of the memory. V.2: Use static_assert() instead of assert(). V.3: Use C99 compat STATIC_ASSERT() over C11 static_assert(). Signed-off-by: Edward O'Callaghan <[email protected]> Acked-by: Dave Airlie <[email protected]>
* i965: Print out cycle estimates at the start of block annotations.Kenneth Graunke2016-12-111-1/+1
| | | | | | | | | | | | We now print START B15 <-B14 (42774 cycles) indicating that we estimate B15 will take 42,774 cycles. Printing this should make it easier where time is spent in the program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Return LINEAR encoding for winsys FBO depth/stencil.Kenneth Graunke2016-12-111-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | GetFramebufferAttachmentParameteriv should return GL_LINEAR for the window system default framebuffer's GL_DEPTH or GL_STENCIL attachments when there are zero depth or stencil bits. The GL 4.5 spec's GetFramebufferAttachmentParameteriv section says: "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is not NONE, these queries apply to all other framebuffer types: [...] If attachment is not a color attachment, or no data storage or texture image has been specified for the attachment, then params will contain the value LINEAR." Note that we already return LINEAR for the case where there is an actual depth or stencil renderbuffer attached. In the case modified by this patch, FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE returns FRAMEBUFFER_DEFAULT rather than NONE. Fixes a CTS test when run in a visual without depth / stencil buffers: GL45-CTS.gtf30.GL3Tests.framebuffer_srgb.framebuffer_srgb_default_encoding Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: fix 32bit shift overflow warningGrazvydas Ignotas2016-12-111-1/+1
| | | | | | | | Doesn't look like this can work on 32bit, just rids of annoying warning. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* anv: fix release build unused variable warningsGrazvydas Ignotas2016-12-112-2/+3
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* radv/ac: some fix maybe-uninitialized warningsGrazvydas Ignotas2016-12-101-1/+4
| | | | | | | | Mark some paths unreachable so that compiler knows variables are initialized in all valid paths. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/meta: use VK_NULL_HANDLE for handlesGrazvydas Ignotas2016-12-103-4/+4
| | | | | | | | Otherwise we get 32bit warnings because handle is plain uint64_t there and NULL is not suited to initialize that. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix release build unused variable warningsGrazvydas Ignotas2016-12-102-19/+21
| | | | | | | Just mark with MAYBE_UNUSED. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* softpipe: fix release build unused variable warningGrazvydas Ignotas2016-12-101-1/+1
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: fix release build unused variable warningsGrazvydas Ignotas2016-12-102-2/+2
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* i965/mt: Disable HiZ when sharing depth buffer externally (v2)Chad Versace2016-12-101-7/+22
| | | | | | | | | | | | | | | | | intel_miptree_make_shareable() discarded and disabled CCS. Fix it so that it discards and disables HiZ too. Fixes dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer on Skylake. v2: Actually do what the commit message says. Discard the HiZ buffer. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected] Cc: Haixia Shi <[email protected]> Cc: [email protected]
* i965/mt: Disable aux surfaces after making miptree shareableChad Versace2016-12-101-0/+2
| | | | | | | | | | | | | | The entire goal of intel_miptree_make_shareable() is to permanently disable the miptree's aux surfaces. So set intel_mipmap_tree:disable_aux_buffers after the function's done with discarding down the aux surfaces. References: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected] Cc: Haixia Shi <[email protected]> Cc: [email protected]
* spirv: Use a simpler and more correct implementaiton of tanh()Jason Ekstrand2016-12-091-9/+14
| | | | | | | | | | The new implementation is more correct because it clamps the incoming value to 10 to avoid floating-point overflow. It also uses a much reduced version of the formula which only requires 1 exp() rather than 2. This fixes all of the dEQP-VK.glsl.builtin.precision.tanh.* tests. Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0" <[email protected]>
* glsl: Use a simpler formula for tanhJason Ekstrand2016-12-091-8/+10
| | | | | | | | | | | | The formula we have used in the past is a trivial reduction from the definition by simply multiplying both the numerator and denominator of the formula by 2. However, multiplying by e^x, you can further reduce it. This allows us to get rid of one side of the clamp and two of exponential functions which should make it faster. The new formula still passes the dEQP precision tests for tanh so it should be fine. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Clean up some unused variablesEdward O'Callaghan2016-12-101-15/+0
| | | | | | | Following on from the spirit of commit 011e5570f. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* swr: [rasterizer common/core/jitter] fetch support for GL_FIXEDTim Rowley2016-12-095-34/+188
| | | | | | v2: use fmul(1/65536) instead of fdiv(65535) Reviewed-by: Bruce Cherniak <[email protected]>