summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nir/lower_tex: add lowering for texture gradient on shadow samplersIago Toral Quiroga2016-12-132-0/+67
| | | | | | | | | | | | | | | | | | | | This is ported from the Intel lowering pass that we use with GLSL IR. This takes care of lowering texture gradients on shadow samplers other than cube maps. Intel hardware requires this for gen < 8. v2 (Ken): - Use the helper function to retrieve ddx/ddy - Swizzle away size components we are not interested in v3: - Get rid of the ddx/ddy helper and use nir_tex_instr_src_index instead (Ken, Eric) v4: - Add a 'continue' statement if the lowering makes progress because it replaces the original texture instruction Reviewed-by: Kenneth Graunke <[email protected]> (v3)
* i965/nir: enable lowering of texture gradient for cube mapsIago Toral Quiroga2016-12-131-0/+1
| | | | | | | | | This gets the lowering on the Vulkan driver too. Fixes Vulkan CTS cube map texture gradient tests in: dEQP-VK.glsl.texture_functions.texturegrad.* Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_tex: add lowering for texture gradient on cube mapsIago Toral Quiroga2016-12-132-0/+213
| | | | | | | | | | | | | | | | | | | | | | | | This is ported from the Intel lowering pass that we use with GLSL IR. The NIR pass only handles cube maps, not shadow samplers, which are also lowered for gen < 8 on Intel hardware. We will add support for that in a later patch, at which point we should be able to remove the GLSL IR lowering pass. v2: - added a helper to retrieve ddx/ddy parameters (Ken) - No need to make size.z=1.0, we are only using component x anyway (Iago) v3: - Get rid of the ddx/ddy helper and use nir_tex_instr_src_index instead (Ken, Eric) v4: - When emitting the textureLod operation, copy all texture parameters from the original textureGrad() (except for ddx/ddy) using a loop - Add a 'continue' statement if the lowering makes progress because it replaces the original texture instruction Reviewed-by: Kenneth Graunke <[email protected]> (v3)
* nir/lower_tex: generalize get_texture_size()Iago Toral Quiroga2016-12-131-5/+10
| | | | | | | This was written specifically for RECT samplers. Make it more generic so we can call this from the gradient lowerings too. Reviewed-by: Kenneth Graunke <[email protected]>
* treewide: s/comparitor/comparator/Ilia Mirkin2016-12-1231-85/+85
| | | | | | | | | | git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* nir: Only float and double types can be matricesIan Romanick2016-12-122-19/+24
| | | | | | | | | | | | | | In 19a541f (nir: Get rid of nir_constant_data) a number of places that operated on nir_constant::values were mechanically converted to operate on the whole array without regard for the base type. Only GLSL_TYPE_FLOAT and GLSL_TYPE_DOUBLE can be matrices, so only those types can have data in the non-0 array element. See also b870394. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: Iago Toral Quiroga <[email protected]>
* swr: [rasterizer core/memory] StoreTile: AVX512 progressTim Rowley2016-12-122-222/+138
| | | | | | Fixes to 128-bit formats. Reviwed-by: Bruce Cherniak <[email protected]>
* nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.Matt Turner2016-12-122-0/+25
| | | | | | | | | | instructions in affected programs: 550 -> 544 (-1.09%) helped: 6 cycles in affected programs: 6952 -> 6850 (-1.47%) helped: 6 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Reject copy propagation into SEL if not min/max.Matt Turner2016-12-122-1/+12
| | | | | | | | | | | | | | | | | | | | | We shouldn't ever see a SEL with conditional mod other than GE (for max) or L (for min), but we might see one with predication and no conditional mod. total instructions in shared programs: 8241806 -> 8241902 (0.00%) instructions in affected programs: 13284 -> 13380 (0.72%) HURT: 62 total cycles in shared programs: 84165104 -> 84166244 (0.00%) cycles in affected programs: 75364 -> 76504 (1.51%) helped: 10 HURT: 34 Fixes generated code in at least Sanctum 2, Borderlands 2, Goat Simulator, XCOM: Enemy Unknown, and Shogun 2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add unit tests for copy propagation pass.Matt Turner2016-12-122-0/+211
| | | | | | Pretty basic, but it's a start. Acked-by: Jason Ekstrand <[email protected]>
* i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.Matt Turner2016-12-123-15/+16
| | | | | | Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: shrink the GSVS ring to account for the reduced item sizesNicolai Hähnle2016-12-121-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: shrink each vertex stream to the actually required sizeNicolai Hähnle2016-12-122-25/+40
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use a single descriptor for the GSVS ringNicolai Hähnle2016-12-124-50/+67
| | | | | | | | | We can hardcode all of the fields for swizzling in the geometry shader. The advantage is that we use fewer descriptor slots and we no longer have to update any of the (ring) descriptors when the geometry shader changes. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pack GS output components for each vertex stream contiguouslyNicolai Hähnle2016-12-121-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that the memory layout of one vertex stream inside one "item" (= memory written by one GS wave) on the GSVS ring is: t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0 t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1 ... t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0 t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1 ... t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL ... t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0 t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1 ... t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL where tNN indicates the thread number, vNN the vertex number (in the order of EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and component, respectively). The vertex streams are laid out sequentially. The swizzling by 16 threads is hard-coded in the way the VGT generates the offset passed into the GS copy shader, and the jump every 16 threads is calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way that makes it difficult to deviate from this layout (at least that's what I've experimentally confirmed on VI after first trying to go the simpler route of just interleaving the vertex streams). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not write non-existent components through the GSVS ringNicolai Hähnle2016-12-121-2/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only write values belonging to the stream when emitting GS vertexNicolai Hähnle2016-12-121-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: generate an explicit switch instruction over vertex streamsNicolai Hähnle2016-12-121-8/+13
| | | | | | | | | | | | SimplifyCFG generates a switch instruction anyway when all four streams are present, but is simultaneously not smart enough to eliminate some redundant jumps that it generates. The generated assembly is still a bit silly, probably because the control flow annotation doesn't know how to handle a switch with uniform condition. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fetch only outputs of current vertex stream from the GSVS ringNicolai Hähnle2016-12-121-16/+25
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only export from GS copy shader for vertex stream 0Nicolai Hähnle2016-12-121-12/+19
| | | | | | | | When running the copy shader for vertex streams != 0, the SX does not need any data from us (there is no rasterization for the higher vertex streams, only streamout). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not export VS outputs from vertex streams != 0Nicolai Hähnle2016-12-121-0/+6
| | | | | | | | | | | | This affects for GS copy shaders. When an output is meant for vertex stream != 0, then we don't have to make it available to the pixel shader. There is a minor inefficiency here because the GLSL varying packing pass does not group varyings of the same vertex stream together, but it shouldn't be important in practice. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pull iteration over vertex streams into GS copy shader logicNicolai Hähnle2016-12-121-25/+37
| | | | | | The iteration is not needed for normal vertex shaders. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: group streamout writes by vertex streamNicolai Hähnle2016-12-121-10/+22
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: load the streamout buf descriptors closer to their useNicolai Hähnle2016-12-121-14/+11
| | | | | | LLVM can still decide to hoist the loads since they're marked invariant. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract writing of a single streamout outputNicolai Hähnle2016-12-121-39/+52
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: separate the call to si_llvm_emit_streamout from exportsNicolai Hähnle2016-12-121-4/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: plumb the output vertex_stream through to si_shader_output_valuesNicolai Hähnle2016-12-121-1/+9
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename members of si_shader_output_valuesNicolai Hähnle2016-12-121-8/+8
| | | | | | Be a bit more verbose and avoid confusion in future patches. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix an off-by-one error in the bounds check for max_verticesNicolai Hähnle2016-12-121-1/+1
| | | | | | | | | | | The spec actually says that calling EmitStreamVertex is undefined when you exceed max_vertices. But we do need to avoid trampling over memory outside the GSVS ring. Cc: [email protected] Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not kill GS with memory writesNicolai Hähnle2016-12-121-8/+22
| | | | | | | | | | | Vertex emits beyond the specified maximum number of vertices are supposed to have no effect, which is why we used to always kill GS that reached the limit. However, if the GS also writes to memory (SSBO, atomics, shader images), then we must keep going and only skip the vertex emit itself. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: update all GSVS ring descriptors for new buffer allocationsNicolai Hähnle2016-12-121-1/+6
| | | | | | | | Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced. Cc: [email protected] Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSINicolai Hähnle2016-12-123-1/+31
| | | | | | Allow drivers to emit GS outputs in a smarter way. Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output usagemasksNicolai Hähnle2016-12-122-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output vertex streamsNicolai Hähnle2016-12-122-0/+19
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: extract individual streamout output structureNicolai Hähnle2016-12-121-8/+13
| | | | | | So that we can pass pointers to individual array entries around. Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add Stream{X,Y,Z,W} fields to tgsi_declaration_semanticNicolai Hähnle2016-12-124-3/+81
| | | | | | | | | | | This is for geometry shader outputs. Without it, drivers have no way of knowing which stream each output is intended for, and have to conservatively write all outputs to all streams. Separate stream numbers for each component are required due to output packing. Reviewed-by: Marek Olšák <[email protected]>
* glsl: remember per-component vertex streams for packed varyingsNicolai Hähnle2016-12-123-2/+24
| | | | Reviewed-by: Marek Olšák <[email protected]>
* i965/blorp: fix release build unused variable warningGrazvydas Ignotas2016-12-121-3/+1
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* virgl: Fix a strict-aliasing violation in the encoderEdward O'Callaghan2016-12-121-1/+7
| | | | | | | | | | | | | | | | As per the C spec, it is illegal to alias pointers to different types. This results in undefined behaviour after optimization passes, resulting in very subtle bugs that happen only on a full moon.. Use a memcpy() as a well defined coercion between the double to uint64_t interpretations of the memory. V.2: Use static_assert() instead of assert(). V.3: Use C99 compat STATIC_ASSERT() over C11 static_assert(). Signed-off-by: Edward O'Callaghan <[email protected]> Acked-by: Dave Airlie <[email protected]>
* i965: Print out cycle estimates at the start of block annotations.Kenneth Graunke2016-12-111-1/+1
| | | | | | | | | | | | We now print START B15 <-B14 (42774 cycles) indicating that we estimate B15 will take 42,774 cycles. Printing this should make it easier where time is spent in the program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Return LINEAR encoding for winsys FBO depth/stencil.Kenneth Graunke2016-12-111-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | GetFramebufferAttachmentParameteriv should return GL_LINEAR for the window system default framebuffer's GL_DEPTH or GL_STENCIL attachments when there are zero depth or stencil bits. The GL 4.5 spec's GetFramebufferAttachmentParameteriv section says: "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is not NONE, these queries apply to all other framebuffer types: [...] If attachment is not a color attachment, or no data storage or texture image has been specified for the attachment, then params will contain the value LINEAR." Note that we already return LINEAR for the case where there is an actual depth or stencil renderbuffer attached. In the case modified by this patch, FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE returns FRAMEBUFFER_DEFAULT rather than NONE. Fixes a CTS test when run in a visual without depth / stencil buffers: GL45-CTS.gtf30.GL3Tests.framebuffer_srgb.framebuffer_srgb_default_encoding Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: fix 32bit shift overflow warningGrazvydas Ignotas2016-12-111-1/+1
| | | | | | | | Doesn't look like this can work on 32bit, just rids of annoying warning. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* anv: fix release build unused variable warningsGrazvydas Ignotas2016-12-112-2/+3
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* radv/ac: some fix maybe-uninitialized warningsGrazvydas Ignotas2016-12-101-1/+4
| | | | | | | | Mark some paths unreachable so that compiler knows variables are initialized in all valid paths. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/meta: use VK_NULL_HANDLE for handlesGrazvydas Ignotas2016-12-103-4/+4
| | | | | | | | Otherwise we get 32bit warnings because handle is plain uint64_t there and NULL is not suited to initialize that. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix release build unused variable warningsGrazvydas Ignotas2016-12-102-19/+21
| | | | | | | Just mark with MAYBE_UNUSED. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* softpipe: fix release build unused variable warningGrazvydas Ignotas2016-12-101-1/+1
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: fix release build unused variable warningsGrazvydas Ignotas2016-12-102-2/+2
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* i965/mt: Disable HiZ when sharing depth buffer externally (v2)Chad Versace2016-12-101-7/+22
| | | | | | | | | | | | | | | | | intel_miptree_make_shareable() discarded and disabled CCS. Fix it so that it discards and disables HiZ too. Fixes dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer on Skylake. v2: Actually do what the commit message says. Discard the HiZ buffer. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected] Cc: Haixia Shi <[email protected]> Cc: [email protected]
* i965/mt: Disable aux surfaces after making miptree shareableChad Versace2016-12-101-0/+2
| | | | | | | | | | | | | | The entire goal of intel_miptree_make_shareable() is to permanently disable the miptree's aux surfaces. So set intel_mipmap_tree:disable_aux_buffers after the function's done with discarding down the aux surfaces. References: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected] Cc: Haixia Shi <[email protected]> Cc: [email protected]