summaryrefslogtreecommitdiffstats
path: root/src/compiler
Commit message (Collapse)AuthorAgeFilesLines
* mesa, glsl: add support for EXT_shader_image_load_formattedRhys Perry2019-04-153-0/+13
| | | | | | | | v3: rebase Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Marek Olšák <[email protected]> (v2) Signed-off-by: Marek Olšák <[email protected]>
* nir,ac/nir: fix cube_face_coordRhys Perry2019-04-151-6/+15
| | | | | | | | Seems it was missing the "/ ma + 0.5" and the order was swapped. Fixes: a1a2a8dfda7b9cac7e ('nir: add AMD_gcn_shader extended instructions') Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nir: fix packing components with arraysTimothy Arceri2019-04-151-1/+2
| | | | | | | | | | | | | When gathering info for unmovable types we need to handle arrays. While we dont support packing/moving arrays we do support packing scalar components with these arrays. Fixes piglit: tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test Fixes: 5eb17506e159 ("nir: do not pack varying with different types") Reviewed-by: Samuel Pitoiset <[email protected]>
* spirv: add SpvCapabilityFloat16 supportSamuel Pitoiset2019-04-152-1/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir/validate: Require unused bits of nir_const_value to be zeroJason Ekstrand2019-04-142-20/+41
| | | | Reviewed-by: Karol Herbst <[email protected]>
* nir/load_const_to_scalar: Get rid of a bit size switch statementJason Ekstrand2019-04-141-19/+1
| | | | | | | Now that nir_const_value is a scalar, we don't need the switch on bit size in order to pluck off components properly. Reviewed-by: Karol Herbst <[email protected]>
* spirv: Drop some unneeded bit size switch statementsJason Ekstrand2019-04-141-62/+4
| | | | | | | Now that nir_const_value is a scalar, we don't need the switch on bit size in order copy components around properly. Reviewed-by: Karol Herbst <[email protected]>
* nir/constant_folding: Get rid of a bit size switch statementJason Ekstrand2019-04-141-19/+1
| | | | | | | Now that nir_const_value is a scalar, we don't need the switch on bit size in order to swizzle them properly. Reviewed-by: Karol Herbst <[email protected]>
* nir: make nir_const_value scalarKarol Herbst2019-04-1427-353/+401
| | | | | | | | | v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* spirv: reduce array size in vtn_handle_constantKarol Herbst2019-04-141-1/+1
| | | | | | | | we already assert above that there are no more than 3 sources, so it doesn't make sense to use an array of 4 sources Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/loop_analyze: use nir_const_value.b for boolean results, not u32Karol Herbst2019-04-141-1/+1
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/print: Use nir_src_as_int for array indicesJason Ekstrand2019-04-141-3/+2
| | | | Reviewed-by: Karol Herbst <[email protected]>
* nir/builder: Add a nir_imm_zero helperJason Ekstrand2019-04-143-7/+17
| | | | | | v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst) Reviewed-by: Karol Herbst <[email protected]>
* nir/builder: Move nir_imm_vec2 from blorp into the builderKarol Herbst2019-04-141-0/+12
| | | | | | | | While we're here, fix a typo which caused it to actually return a vec4 with the third and fourth components zero. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_lower_viewport_transformAlyssa Rosenzweig2019-04-144-0/+105
| | | | | | | | | | | | | | | | | | | | | | | | | On Mali hardware (supported by Panfrost and Lima), the fixed-function transformation from world-space to screen-space coordinates is done in the vertex shader prior to writing out the gl_Position varying, rather than in dedicated hardware. This commit adds a shared NIR pass for implementing coordinate transformation and lowering gl_Position writes into screen-space gl_Position writes. v2: Run directly on derefs before io/vars are lowered to cleanup the code substantially. Thank you to Qiang for this suggestion! v3: Bikeshed continues. v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment. Ian and Qiang's reviews are from v3, but no real functional changes from v4. Rob's review is from v4. Signed-off-by: Alyssa Rosenzweig <[email protected]> Suggested-by: Qiang Yu <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir: add lower_ftruncChristian Gmeiner2019-04-132-0/+3
| | | | | | | Port TGSI TRUNC lowering to nir Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a pass for selectively lowering variables to scratch spaceJason Ekstrand2019-04-129-1/+216
| | | | | | | | | | This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <[email protected]>
* nir: Add a comment about how intrinsic definitions work.Eric Anholt2019-04-121-0/+11
| | | | | | I was thinking about a refactor, and needed to read this first. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Drop remaining references to const_index in favor of the call to use.Eric Anholt2019-04-121-5/+5
| | | | | | Please don't make me read a const_index[] expression ever again. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Drop comments about the constant_index slots for load/stores.Eric Anholt2019-04-121-21/+15
| | | | | | | | | | The constant_index slots are named right there in the intrinsic definition, and the comment is just a chance to get out of sync. Noticed while reviewing the lower_to_scratch changes that copy-and-pasted wrong comments, and load_ubo and load_per_vertex_output had incorrect comments currently. Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Set location on structure-split sampler uniform variablesKenneth Graunke2019-04-121-0/+1
| | | | | | | | | | | | | gl_nir_lower_samplers_as_deref splits structure uniform variables, creating new variables for individual fields. As part of that, it calculates a new location. It then never set this on the new variables. Thanks to Michael Fiano for finding this bug. Fixes crashes on i965 with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct test, which was reduced from the failing case in Michael's app. Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Timothy Arceri <[email protected]>
* glsl: allow the #extension directive within code blocks for the dri optionMarek Olšák2019-04-121-0/+9
| | | | | | for Viewperf 13 Acked-by: Timothy Arceri <[email protected]>
* glsl/nir: add support for lowering bindless images_derefsKarol Herbst2019-04-126-3/+105
| | | | | | | | | | | v2: handle atomics as well make use of nir_rewrite_image_intrinsic v3: remove call to nir_remove_dead_derefs v4: (Timothy Arceri) dont actually call lowering yet Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v3) Reviewed-by: Marek Olšák <[email protected]>
* glsl/nir: fetch the type for images from the deref instructionKarol Herbst2019-04-121-3/+3
| | | | | | | | fixes retrieving the sampler type for bindless images stored inside structs. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* glsl_to_nir: handle bindless texturesKarol Herbst2019-04-121-4/+15
| | | | | | | | | v2: add support for AMD Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v1) Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir/i965/freedreno/vc4: add a bindless bool to type size functionsTimothy Arceri2019-04-122-14/+22
| | | | | | | This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <[email protected]>
* nir: move brw_nir_rewrite_image_intrinsic into common codeKarol Herbst2019-04-122-0/+44
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir: initialise some variables in opt_if_loop_last_continue()Timothy Arceri2019-04-111-2/+2
| | | | | | | | Fixes a couple of Coverity warnings CID 1444626. Fixes: e30804c6024f ("nir/radv: remove restrictions on opt_if_loop_last_continue()") Reviewed-by: Tapani Pälli <[email protected]>
* nir/xfb: do not use bare interface typeJuan A. Suarez Romero2019-04-111-1/+1
| | | | | | | | | | | | | | In commit 3b3653c4cfb we decided not to use bare types; hence do not use bare type when comparing with interface type to find out if the xfb variable is an array block. This fixes dEQP-VK.transform_feedback.* tests. Fixes: 3b3653c4cfb ("nir/spirv: don't use bare types, remove assert in split vars for testing") CC: Dave Airlie <[email protected]> CC: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* Revert "glsl: fix shader_storage_blocks_write_access for SSBO block arrays"Marek Olšák2019-04-101-6/+2
| | | | | | This reverts commit b7ca074cc0df6101c428b2dfa53a59a0c6620af2. It broke a lot of tests.
* glsl/standalone: add GLES3.1 and GLES3.2 compatibilityKarol Herbst2019-04-101-0/+5
| | | | | | | | | | also set some constants for SSBOs. With that it can compile the shader from: dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18 Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir: Add access qualifiers on load_ubo intrinsic.Bas Nieuwenhuizen2019-04-102-2/+3
| | | | | | | | | Otherwise nir_lower_non_uniform_access crashes when it tries to get the access of a load_ubo. Fixes: 8ed583fe523 "spirv: Handle the NonUniformEXT decoration" Fixes: e50ab2c0f23 "nir: Add access flags to deref and SSBO atomics" Reviewed-by: Samuel Pitoiset <[email protected]>
* glsl: fix shader_storage_blocks_write_access for SSBO block arraysMarek Olšák2019-04-091-2/+6
| | | | | | | | CTS: GL45-CTS.compute_shader.resources-max Fixes: 4e1e8f684bf "glsl: remember which SSBOs are not read-only and pass it to gallium" Reviewed-by: Timothy Arceri <[email protected]>
* glsl/linker: location aliasing requires types to have the same widthAndres Gomez2019-04-091-39/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers, Page 67, (Location aliasing): " Further, when location aliasing, the aliases sharing the location must have the same underlying numerical type and bit width (floating-point or integer, 32-bit versus 64-bit, etc.) and the same auxiliary storage and interpolation qualification." Additionally, we have improved the linker error descriptions. Specifically, when taking structs into account we were producing a linker error because we assumed that all components in each location were used and that would cause component aliasing. This is not accurate of the actual problem. Now, the failure specifies that the underlying numerical type incompatibility is the cause for the failure. Fixes the following piglit test: tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test v2: - Do not assert if we see invalid numerical types. These come straight from shader code, so we should produce linker errors if shaders attempt to do location aliasing on variables that are not numerical such as records. - While we are at it, improve error reporting for the case of numerical type mismatch to include the shader stage. v3: - Allow location aliasing of images and samplers. If we get these it means bindless support is active and they should be handled as 64-bit integers (Ilia) - Make sure we produce link errors for any non-numerical type for which we attempt location aliasing, not just structs. v4: - Rebased with minor fixes (Andres). - Added fixing tag to the commit log (Andres). v5: - Remove the helper function and check individually for the underlying numerical type and bit width (Timothy). - Implicitly, assume that any non-treated type which is checked for its underlying numerical type is either integer or float and has a defined bit width (Timothy). - Implicitly, assume that structs are the only non-treated non-numerical type (Timothy). - Improve the linker error descriptions and commit log (Andres). Fixes: 13652e7516a ("glsl/linker: Fix type checks for location aliasing") Cc: Ilia Mirkin <[email protected]> Cc: Timothy Arceri <[email protected]> Cc: Iago Toral Quiroga <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir: Get rid of global registersJason Ekstrand2019-04-0912-198/+10
| | | | | | | | | We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Get rid of nir_register::is_packedJason Ekstrand2019-04-097-34/+11
| | | | | | | | All we ever do is initialize it to zero, clone it, print it, and validate it. No one ever sets or uses it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* spirv: Add support for DerivativeGroup capabilitiesCaio Marcelo de Oliveira Filho2019-04-082-0/+16
| | | | | | | | | | | | As defined in SPV_NV_compute_shader_derivatives. These control how the invocations are arranged in a CS when doing derivative and related operations (which are also enabled by the extension). Since we expect valid SPIR-V, we don't need to do more work at SPIR-V level to enable the derivative and related operations to be called. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Don't set LOD=0 for compute shader that has derivative groupCaio Marcelo de Oliveira Filho2019-04-081-2/+6
| | | | | | | | | | | When using NV_compute_shader_derivatives to set a derivative group, a compute shader supports texture with implicit LOD calculation, so don't set an explicit LOD. Note if the extension is used but the derivative group is not specified, it will default to LOD=0 as before. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/algebraic: Lower CS derivatives to zero when no group definedCaio Marcelo de Oliveira Filho2019-04-082-0/+14
| | | | | | | | | | | In compute shaders if no derivative group is defined, the derivatives will always be zero. Specified in NV_compute_shader_derivatives. To make the check more convenient, add a "info" local variable to the generated code so we can refer to it in the Python rules. (Jason) Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Parse and propagate derivative_group to shader_infoCaio Marcelo de Oliveira Filho2019-04-088-4/+187
| | | | | | | | | | | | NV_compute_shader_derivatives allow selecting between two possible arrangements (quads and linear) when calculating derivatives and certain subgroup operations in case of Vulkan. So parse and propagate those up to shader_info.h. v2: Do not fail when ARB_compute_variable_group_size is being used, since we are still clarifying what is the right thing to do here. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Enable texture builtins for NV_compute_shader_derivativesCaio Marcelo de Oliveira Filho2019-04-081-140/+153
| | | | | | | Renamed a few predicates from "fs_only" to be "derivative_only" (or similar pairs). Reviewed-by: Ian Romanick <[email protected]>
* glsl: Enable derivative builtins for NV_compute_shader_derivativesCaio Marcelo de Oliveira Filho2019-04-081-9/+25
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Remove redundant conditions when asserting in_qualifierCaio Marcelo de Oliveira Filho2019-04-081-5/+2
| | | | | | | As the code evolved, we ended up with a redundant conditions. Clean this up. Reviewed-by: Ian Romanick <[email protected]>
* mesa: Extension boilerplate for NV_compute_shader_derivativesCaio Marcelo de Oliveira Filho2019-04-082-0/+3
| | | | Reviewed-by: Ian Romanick <[email protected]>
* nir/radv: remove restrictions on opt_if_loop_last_continue()Timothy Arceri2019-04-092-35/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <[email protected]> (v1) Acked-by: Samuel Pitoiset <[email protected]>
* nir/search: Search for all combinations of commutative opsJason Ekstrand2019-04-083-29/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following search expression and NIR sequence: ('iadd', ('imul', a, b), b) ssa_2 = imul ssa_0, ssa_1 ssa_3 = iadd ssa_2, ssa_0 The current algorithm is greedy and, the moment the imul finds a match, it commits those variable names and returns success. In the above example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match the iadd, it sees that ssa_0 is not b and fails to match. The iadd match will attempt to flip itself and try again (which won't work) but it cannot ask the imul to try a flipped match. This commit instead counts the number of commutative ops in each expression and assigns an index to each. It then does a loop and loops over the full combinatorial matrix of commutative operations. In order to keep things sane, we limit it to at most 4 commutative operations (16 combinations). There is only one optimization in opt_algebraic that goes over this limit and it's the bitfieldReverse detection for some UE4 demo. Shader-db results on Kaby Lake: total instructions in shared programs: 15310125 -> 15302469 (-0.05%) instructions in affected programs: 1797123 -> 1789467 (-0.43%) helped: 6751 HURT: 2264 total cycles in shared programs: 357346617 -> 357202526 (-0.04%) cycles in affected programs: 15931005 -> 15786914 (-0.90%) helped: 6024 HURT: 3436 total loops in shared programs: 4360 -> 4360 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 23675 -> 23666 (-0.04%) spills in affected programs: 235 -> 226 (-3.83%) helped: 5 HURT: 1 total fills in shared programs: 32040 -> 32032 (-0.02%) fills in affected programs: 190 -> 182 (-4.21%) helped: 6 HURT: 2 LOST: 18 GAINED: 5 Reviewed-by: Thomas Helland <[email protected]>
* nir/algebraic: Add some logical OR and AND patternsJason Ekstrand2019-04-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new OR pattern has been seen in the wild and can end up being generated by GLSLang. Not sure about the other two new patterns but we may as well throw them in for completeness. While we're here, we can drop the '@bool' specifier from the one pattern because specifying True already implies 1-bit which basically implies boolean. Shader-db results on Kaby Lake: total instructions in shared programs: 15321227 -> 15321129 (<.01%) instructions in affected programs: 3594 -> 3496 (-2.73%) helped: 6 HURT: 0 total cycles in shared programs: 357481321 -> 357479725 (<.01%) cycles in affected programs: 44109 -> 42513 (-3.62%) helped: 6 HURT: 0 VkPipeline-DB results on Kaby Lake: total instructions in shared programs: 3770504 -> 3769734 (-0.02%) instructions in affected programs: 19058 -> 18288 (-4.04%) helped: 163 HURT: 0 total cycles in shared programs: 1417583701 -> 1417569727 (<.01%) cycles in affected programs: 750958 -> 736984 (-1.86%) helped: 158 HURT: 1 Reviewed-by: Timothy Arceri <[email protected]>
* nir/algebraic: Drop some @bool specifiersJason Ekstrand2019-04-051-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have one-bit booleans, we don't need to rely on looking at parent instructions in order to figure out if a value is a Boolean most of the time. We can drop these specifiers and now the optimizations will apply more generally. Shader-DB results on Kaby Lake: total instructions in shared programs: 15321168 -> 15321227 (<.01%) instructions in affected programs: 8836 -> 8895 (0.67%) helped: 1 HURT: 31 total cycles in shared programs: 357481781 -> 357481321 (<.01%) cycles in affected programs: 146524 -> 146064 (-0.31%) helped: 22 HURT: 10 total spills in shared programs: 23675 -> 23673 (<.01%) spills in affected programs: 11 -> 9 (-18.18%) helped: 1 HURT: 0 total fills in shared programs: 32040 -> 32036 (-0.01%) fills in affected programs: 27 -> 23 (-14.81%) helped: 1 HURT: 0 No change in VkPipeline-DB Looking at the instructions hurt, a bunch of them seem to be a case where doing exactly the right thing in NIR ends up doing the wrong-ish thing in the back-end because flags are dumb. In particular, there's a case where we have a MUL followed by a CMP followed by a SEL and when we turn that SEL into an OR, it uses the GRF result of the CMP rather than the flag result so the CMP can't be merged with the MUL. Those shaders appear to schedule better according to the cycle estimates so I guess it's a win? Also it helps spilling in one Car Chase compute shader. Reviewed-by: Timothy Arceri <[email protected]>
* nir: Take if_uses into account when repairing SSACaio Marcelo de Oliveira Filho2019-04-051-0/+18
| | | | | | | | | | | | | | | If a def is used as an condition before its definition, we should also consider this a case to repair. When repairing, make sure we rewrite any if conditions too. Found in while inspecting a SPIR-V conversion from a 'continue block' that contains a conditional branch. We pull the continue block up to the beggining of the loop, and the condition in the branch ends up defined afterwards. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Fixes: 364212f1ede4b "nir: Add a pass to repair SSA form"
* nir: do not pack varying with different typesSamuel Pitoiset2019-04-051-0/+13
| | | | | | | | | The current algorithm only supports packing 32-bit types. If a shader uses both 16-bit and 32-bit varyings, we shouldn't compact them together. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>