summaryrefslogtreecommitdiffstats
path: root/src/compiler
Commit message (Collapse)AuthorAgeFilesLines
* nir/lower_doubles: Inline functions directly in lower_doublesJason Ekstrand2019-03-062-17/+35
| | | | | | | | | | | | Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ouselves. This means that there's no more nasty functions lying around that the caller needs to worry about cleaning up. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/deref: Expose nir_opt_deref_implJason Ekstrand2019-03-062-1/+2
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/inline_functions: Break inlining into a builder helperJason Ekstrand2019-03-063-40/+60
| | | | | | | | | | | | | This pulls the guts of function inlining into a builder helper so that it can be used elsewhere. The rest of the infrastructure is still needed for most inlining cases to ensure that everything gets inlined and only ever once. However, there are use-cases where you just want to inline one little thing. This new helper also has a neat trick where it can seamlessly inline a function from one nir_shader into another. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: Inline functions in float64_funcs_to_nirJason Ekstrand2019-03-061-0/+5
| | | | | | | | | | This doesn't really change anything as the functions will all get inlined anyway. However it does let us do a bit of the work earlier and in a common place. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: Add a shared helper for building float64 shadersJason Ekstrand2019-03-064-0/+65
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Teach loop unrolling about 64-bit instruction loweringJason Ekstrand2019-03-063-13/+79
| | | | | | | | | | | | | | | | The lowering we do for 64-bit instructions can cause a single NIR ALU instruction to blow up into hundreds or thousands of instructions potentially with control flow. If loop unrolling isn't aware of this, it can unroll a loop 20 times which contains a nir_op_fsqrt which we then lower to a full software implementation based on integer math. Those 20 invocations suddenly get a lot more expensive than NIR loop unrolling currently expects. By giving it an approximate estimate function, we can prevent loop unrolling from going to town when it shouldn't. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Expose double and int64 op_to_options_mask helpersJason Ekstrand2019-03-063-51/+23
| | | | | | | | | We already have one internally for int64 but we don't have a similar one for doubles so we'll have to make one. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* compiler/nir: add an is_conversion field to nir_op_infoIago Toral Quiroga2019-03-063-33/+47
| | | | | | | | | This is set to True only for numeric conversion opcodes. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc()Timothy Arceri2019-03-0621-39/+39
| | | | | | | | | | Replace done using: find ./src -type f -exec sed -i -- \ 's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \; Acked-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename record_types -> struct_typesTimothy Arceri2019-03-062-10/+10
| | | | | | Acked-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename record_location_offset() -> struct_location_offset()Timothy Arceri2019-03-065-7/+7
| | | | | | | | | | Replace done using: find ./src -type f -exec sed -i -- \ 's/record_location_offset(/struct_location_offset(/g' {} \; Acked-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename get_record_instance() -> get_struct_instance()Timothy Arceri2019-03-065-7/+7
| | | | | | | | | | Replace done using: find ./src -type f -exec sed -i -- \ 's/get_record_instance(/get_struct_instance(/g' {} \; Acked-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename is_record() -> is_struct()Timothy Arceri2019-03-0618-80/+80
| | | | | | | | | | Replace was done using: find ./src -type f -exec sed -i -- \ 's/is_record(/is_struct(/g' {} \; Acked-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* nir/spirv: initial handling of OpenCL.std extension opcodesKarol Herbst2019-03-0512-3/+602
| | | | | | | | | | | | | | | | | | Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. v2: add hadd proof from Jason move some of the lowering into opt_algebraic and create new nir opcodes simplify nextafter lowering fix normalize lowering for inf rework upsample to use nir_pack_bits add missing files to build systems v3: split lines of iadd/sub_sat expressions Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/vtn: add support for SpvBuiltInGlobalLinearIdKarol Herbst2019-03-055-12/+45
| | | | | | | | v2: use formula with fewer operations Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: add support for address bit sized system valuesKarol Herbst2019-03-052-18/+29
| | | | | | | | | | | v2: add assert in else clause make local group intrinsics 32 bit wide v3: always use 32 bit constant for local_size v4: add comment by Jason Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv: improve parsing of the memory modelKarol Herbst2019-03-053-7/+45
| | | | | | | v2: add some vtn_fail_ifs Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: replace magic numbers with M_PIKarol Herbst2019-03-051-2/+2
| | | | | | | we define it inside 'include/c99_math.h' so it is safe to use. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add multiplier argument to nir_lower_uniforms_to_ubo.Timur Kristóf2019-03-052-9/+16
| | | | | | | | | | | | | Note that locations can be set in different units, and the multiplier argument caters to supporting these different units. For example, st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4, while tgsi_to_nir uses bytes, so the multiplier should be 16. Signed-Off-By: Timur Kristóf <[email protected]> Tested-by: Andre Heider <[email protected]> Tested-by: Rob Clark <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Move nir_lower_uniforms_to_ubo to compiler/nir.Timur Kristóf2019-03-054-0/+102
| | | | | | | | | | | | The nir_lower_uniforms_to_ubo function is useful outside of mesa/state_tracker, and in fact is needed to produce NIR for drivers that have the PIPE_CAP_PACKED_UNIFORMS capability. Signed-Off-By: Timur Kristóf <[email protected]> Tested-by: Andre Heider <[email protected]> Tested-by: Rob Clark <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Add ability for shaders to use window space coordinates.Timur Kristóf2019-03-051-0/+3
| | | | | | | | | | | | | This patch adds a shader_info field that tells the driver to use window space coordinates for a given vertex shader. It also enables this feature in radeonsi (the only NIR-capable driver that supported it in TGSI), and makes tgsi_to_nir aware of it. Signed-Off-By: Timur Kristóf <[email protected]> Tested-by: Andre Heider <[email protected]> Tested-by: Rob Clark <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* v3d: Move the stores for fixed function VS output reads into NIR.Eric Anholt2019-03-051-0/+9
| | | | | | | | | | | | | | | This lets us emit the VPM_WRITEs directly from nir_intrinsic_store_output() (useful once NIR scheduling is in place so that we can reduce register pressure), and lets future NIR scheduling schedule the math to generate them. Even in the meantime, it looks like this lets NIR DCE some more code and make better decisions. total instructions in shared programs: 6429246 -> 6412976 (-0.25%) total threads in shared programs: 153924 -> 153934 (<.01%) total loops in shared programs: 486 -> 483 (-0.62%) total uniforms in shared programs: 2385436 -> 2388195 (0.12%) Acked-by: Ian Romanick <[email protected]> (nir)
* nir: Improve printing of load_input/store_output variable names.Eric Anholt2019-03-051-2/+4
| | | | | | | We were printing only when the channel was exactly the start channel, so scalarized loads/stores would be missing the name on the rest. Reviewed-by: Ian Romanick <[email protected]>
* spirv: Use the same types for resource indices as pointersJason Ekstrand2019-03-052-26/+42
| | | | | | | | We need more space than just a 32-bit scalar and we have to burn all that space anyway so we may as well expose it to the driver. This also fixes a subtle bug when UBOs and SSBOs have different pointer types. Reviewed-by: Lionel Landwerlin <[email protected]>
* spirv: Use the generic dereference function for OpArrayLengthJason Ekstrand2019-03-051-1/+1
| | | | | | | | With the new deref changes, the old pointer_offset version may not be the right one to call. Just call the generic one and let it sort it out. Reviewed-by: Lionel Landwerlin <[email protected]>
* spirv: Pull offset/stride from the pointer for OpArrayLengthJason Ekstrand2019-03-051-2/+10
| | | | | | | | | We can't pull it from the variable type because it might be an array of blocks and not just the one block. While we're here, throw in some error checking. Reviewed-by: Lionel Landwerlin <[email protected]> Cc: [email protected]
* intel,nir: Lower TXD with min_lod when the sampler index is not < 16Jason Ekstrand2019-03-042-0/+27
| | | | | | | | | | | When we have a larger sampler index, we get into the "high sampler" scenario and need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Fixes: cb98e0755f8d "intel/fs: Support min_lod parameters on texture..." Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* spirv: OpImageQueryLod requires a samplerJason Ekstrand2019-03-041-1/+1
| | | | | | | | | | No idea how this fell through the cracks besides the fact that the sampler bound at 0 almost always works and the CTS isn't amazing. In any case, this appears to have been broken for almost forever. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Cc: [email protected]
* spirv: Allow [i/u]mulExtended to use new nir opcodeSagar Ghuge2019-03-041-6/+10
| | | | | | | | | | Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32 bits as needed instead of emitting mul and mul_high. v2: Surround the switch case with curly braces (Jason Ekstrand) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/algebraic: Optimize low 32 bit extractionSagar Ghuge2019-03-041-0/+2
| | | | | | | | | Optimize a situation where we only need lower 32 bits from 64 bit result. Signed-off-by: Sagar Ghuge <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: [u/i]mulExtended optimization for GLSLSagar Ghuge2019-03-045-4/+124
| | | | | | | | | | | | | | | Optimize mulExtended to use 32x32->64 multiplication. Drivers which are not based on NIR, they can set the MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old behavior. v2: Add missing condition check (Jason Ekstrand) Signed-off-by: Sagar Ghuge <[email protected]> Suggested-by: Matt Turner <Matt Turner <[email protected]> Suggested-by: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/glsl: Add another way of doing lower_imul64 for gen8+Sagar Ghuge2019-03-044-6/+35
| | | | | | | | | | | | | | | | | | | On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We can reduce our 64x64 int multiplication from 4 instructions to 3. Also instead of emitting two mul instructions, we can emit single mul instuction and extract low/high 32 bits from 64 bit result for [i/u]mulExtended v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand) 2) Add lower_mul_2x32_64 flag (Matt Turner) 3) Remove associative property as bit size is different (Connor Abbott) v3: Fix indentation and variable naming convention (Jason Ekstrand) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: fix recording of variables for XFB in TCS shadersIlia Mirkin2019-03-043-5/+44
| | | | | | | | | | | | | | | | | This is purely for conformance, since it's not actually possible to do XFB on TCS output varyings. However we do have to make sure we record the names correctly, and this removes an extra level of array-ness from the names in question. Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage v2: Add comment to the new program_resource_visitor::process function. (Ilia Mirkin) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457 Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: TCS outputs can not be transform feedback candidates on GLESJose Maria Casanova Crespo2019-03-041-1/+21
| | | | | | | | | | | | | | | | | Avoids regression on: KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage that is uncovered by the following patch. "glsl: fix recording of variables for XFB in TCS shaders" v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders v3: Move this patch before "glsl: fix recording of variables for XFB in TCS shaders" to avoid temporal regressions. (Illia Mirkin) Cc: 19.0 <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: fix typos in comments "transfor" -> "transform"Jose Maria Casanova Crespo2019-03-041-3/+3
| | | | | Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* mesa: Expose EXT_texture_query_lod and add support for its use shadersGert Wollny2019-03-033-1/+5
| | | | | | | | | | EXT_texture_query_lod provides the same functionality for GLES like the ARB extension with the same name for GL. v2: Set ES 3.0 as minimum GLES version as required by the extension Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* scons: Generate float64_glsl.h for glsl_to_nir fp64 loweringJordan Justen2019-03-021-0/+7
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add int64/doubles options into nir_shader_compiler_optionsJordan Justen2019-03-021-30/+33
| | | | | | | | | | | This will allow the options to be visible under nir_shader->options, which will allow the gallium state_tracker to use the driver preferred settings during glsl_to_nir. Suggested-by: Kenneth Graunke <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/algebraic: Optimize away an fsat of a b2fIan Romanick2019-03-021-0/+1
| | | | | | The b2f can only produce 0.0 or 1.0, so the fsat does nothing. Reviewed-by: Kenneth Graunke <[email protected]>
* nir/algebraic: Replace a-fract(a) with floor(a)Ian Romanick2019-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed this while looking at a shader that was affected by Tim's "more loop unrolling" series. In review, Tim Arceri asked: > Why the hurt on Gen6+ is this something that should be in the late > optimisations pass? As far as I can tell, it's just because our scheduler is terrible. In all the fragment shaders that I looked at (some hurt shaders were from other stages), only one of the SIMD8 or SIMD16 version would be hurt. In many of those case, the other SIMD width is improved (e.g., shaders/closed/steam/brutal-legend/3990.shader_test). Often it looks like the scheduler decides to differently schedule a SEND the occurs somewhere early in the shader. Once that happens, everything is different. I looked at one vertex shader that was hurt (from Goat Simulator). In that case, both the floor and fract are used. The optimization eliminates the add, and it should allow better scheduling. In the area of the FRC and RNDD instructions, the scheduler does the right thing. However, later in the shader a MAD and and ADD get scheduled differently, and that makes it slightly worse. In light of this, I tried adding some "is_used_once" mark-up, and that did not fix all the cycles regressions. It also did a lot more harm than good on SKL (helped 82 vs. hurt 241). All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437001 -> 15435259 (-0.01%) instructions in affected programs: 213651 -> 211909 (-0.82%) helped: 988 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59% 95% mean confidence interval for instructions value: -1.89 -1.63 95% mean confidence interval for instructions %-change: -1.23% -1.05% Instructions are helped. total cycles in shared programs: 383007378 -> 382997063 (<.01%) cycles in affected programs: 1650825 -> 1640510 (-0.62%) helped: 679 HURT: 302 helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14 helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98% HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7 HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53% 95% mean confidence interval for cycles value: -13.05 -7.98 95% mean confidence interval for cycles %-change: -0.86% -0.50% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5043616 -> 5043010 (-0.01%) instructions in affected programs: 119691 -> 119085 (-0.51%) helped: 432 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39% 95% mean confidence interval for instructions value: -1.58 -1.23 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 128139812 -> 128135762 (<.01%) cycles in affected programs: 3829724 -> 3825674 (-0.11%) helped: 602 HURT: 0 helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6 helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10% 95% mean confidence interval for cycles value: -8.40 -5.05 95% mean confidence interval for cycles %-change: -0.22% -0.16% Cycles are helped. Reviewed-by: Elie Tournier <[email protected]>
* nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))Ian Romanick2019-03-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have not investigated the result of doing this during code generation. That should be possible, but it would be a bit more effort. All Gen6+ platforms had nearly identical results. (Skylake shown) total cycles in shared programs: 370961508 -> 370961367 (<.01%) cycles in affected programs: 5174 -> 5033 (-2.73%) helped: 2 HURT: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8206587 -> 8206589 (<.01%) instructions in affected programs: 1325 -> 1327 (0.15%) helped: 0 HURT: 2 total cycles in shared programs: 187657422 -> 187657428 (<.01%) cycles in affected programs: 11566 -> 11572 (0.05%) helped: 0 HURT: 2 This change has almost no effect right now. However, removing this patch (but leaving the patch "intel/fs: Generate if instructions with inverted conditions") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071804 -> 15071806 (<.01%) instructions in affected programs: 640 -> 642 (0.31%) helped: 0 HURT: 2 total cycles in shared programs: 369914348 -> 369916569 (<.01%) cycles in affected programs: 27900 -> 30121 (7.96%) helped: 4 HURT: 15 helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3 helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40% HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81 HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91% 95% mean confidence interval for cycles value: 12.68 221.11 95% mean confidence interval for cycles %-change: 3.09% 21.23% Cycles are HURT. Reviewed-by: Kenneth Graunke <[email protected]>
* nir/algebraic: Replace i2b used by bcsel or if-statement with comparisonIan Romanick2019-03-012-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of the helped shaders are in Deus Ex. I looked at a couple shaders, and they have a pattern like: vec1 32 ssa_373 = i2b32 ssa_345.w vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0 ... vec1 32 ssa_377 = ine ssa_345.w, ssa_0 if ssa_377 { ... vec1 32 ssa_416 = i2b32 ssa_385.w vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374 ... } The massive help occurs because the i2b32 is removed, then other passes determine that ssa_374 must be ssa_20 inside the if-statement allowing the first bcsel to also be deleted. v2: Rebase on 1-bit Boolean changes. v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by Bas. Skylake total instructions in shared programs: 15241394 -> 15186287 (-0.36%) instructions in affected programs: 890583 -> 835476 (-6.19%) helped: 355 HURT: 0 helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149 helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59% 95% mean confidence interval for instructions value: -165.07 -145.39 95% mean confidence interval for instructions %-change: -6.42% -5.77% Instructions are helped. total cycles in shared programs: 373846583 -> 371023357 (-0.76%) cycles in affected programs: 118972102 -> 116148876 (-2.37%) helped: 343 HURT: 14 helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089 helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77% HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019 HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11% 95% mean confidence interval for cycles value: -8723.28 -7093.12 95% mean confidence interval for cycles %-change: -2.57% -2.02% Cycles are helped. total spills in shared programs: 32401 -> 23465 (-27.58%) spills in affected programs: 24457 -> 15521 (-36.54%) helped: 343 HURT: 0 total fills in shared programs: 37866 -> 31765 (-16.11%) fills in affected programs: 18889 -> 12788 (-32.30%) helped: 343 HURT: 0 Broadwell and Haswell had similar results. (Haswell shown) Haswell total instructions in shared programs: 13764783 -> 13750679 (-0.10%) instructions in affected programs: 1176256 -> 1162152 (-1.20%) helped: 334 HURT: 21 helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47 helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37% HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1 HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03% 95% mean confidence interval for instructions value: -43.99 -35.47 95% mean confidence interval for instructions %-change: -1.35% -1.08% Instructions are helped. total cycles in shared programs: 386511910 -> 385402528 (-0.29%) cycles in affected programs: 143831110 -> 142721728 (-0.77%) helped: 327 HURT: 39 helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570 helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96% HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997 HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24% 95% mean confidence interval for cycles value: -3375.59 -2686.60 95% mean confidence interval for cycles %-change: -0.92% -0.64% Cycles are helped. total spills in shared programs: 100480 -> 97846 (-2.62%) spills in affected programs: 84702 -> 82068 (-3.11%) helped: 316 HURT: 21 total fills in shared programs: 96877 -> 94369 (-2.59%) fills in affected programs: 69167 -> 66659 (-3.63%) helped: 316 HURT: 9 No changes on Ivy Bridge or earlier platforms. Reviewed-by: Kenneth Graunke <[email protected]>
* nir/copy_prop_vars: handle indirect vector elementsCaio Marcelo de Oliveira Filho2019-02-282-24/+25
| | | | | | | | | | | | | | | | | Differently than the direct case, the indirect array derefs of vector are handled like regular derefs, with the exception that we ignore any vector entry that has SSA values when performing a load. Such SSA values don't help loading of the indirect unless we emit an if-ladder. Copy_derefs are supported for indirects. Also enable two tests that now pass. v2: Remove unnecessary temporaries. Be clearer when identifying the case where copy_entry doesn't help when we are dealing with an indirect array_deref (of a vector). (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: prefer using entries from equal derefsCaio Marcelo de Oliveira Filho2019-02-281-4/+9
| | | | | | | | | | | When looking up an entry to use, always prefer an equal match, as it more likely to contain reusable SSA or derefs to propagate. This will be necessary when adding entries with array derefs of vectors, because we don't want the vector if the equal entry (an array deref of that vector) is present. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: add tests for indirect array derefCaio Marcelo de Oliveira Filho2019-02-281-7/+124
| | | | | | | | Both on an actual array and on a vector, and an extra test on a vector mixing direct and indirect access. The vector tests are disabled and will be enabled by a later commit. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: handle load/store of vector elementsCaio Marcelo de Oliveira Filho2019-02-282-34/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When direct array deref is used on a vector type (for loads and stores), copy_prop_vars is now smart to propagate values it knows about. Given a 'vec4 v', storing to v[3] will update the copy entry for v and it is equivalent to a write to v.w. Loading from v[1] will try first to see if there's a known value for v.y -- and drop the load in that case. The copy entries still always refer to the entire vectors, so the operations happen on the parent deref (the 'vector') and the values are fixed accordingly. It might be the case now that certain entries have not only different SSA defs in each element but also those come from different components than they are set to, because stores to individual elements always come from a SSA definition with a single component. Tests related to these cases are now enabled. v2: Instead of asserting on invalid indices, "load" an undef and remove the store. (Jason) v3: Merge code path for the cases of is_array_deref_of_vector into the regular code path. Add a base_index parameter to value_set_from_value. (code changes by Jason) v4: Removed the get_entry_for_deref helper, now being used only once. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTSCaio Marcelo de Oliveira Filho2019-02-281-10/+22
| | | | | | | | | | | Also replace uses of 0xf with the appropriate full mask created from the number of components. Note that an increase of MAX might make us change how the data is stored later on, but for now at least we make sure the pass is not hardcoded. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: rename/refactor store_to_entry helperCaio Marcelo de Oliveira Filho2019-02-281-22/+20
| | | | | | | | | The name reflected this function role back when the pass also did dead write elimination. So rename it to what it does now, which is setting a value using another value; and narrow the argument list. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv: return after emitting a branch in blockcros-mesa-19.0-r1-vanillachadv/cros-mesa-19.0-r1-vanillaJuan A. Suarez Romero2019-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | When emitting a branch in a block, it does not make sense to continue processing further instructions, as they will not be reachable. This fixes a nasty case with a loop with a branch that both then-part and else-part exits the loop: %1 = OpLabel OpLoopMerge %2 %3 None OpBranchConditional %false %2 %2 %3 = OpLabel OpBranch %1 %2 = OpLabel [...] We know that block %1 will branch always to block %2, which is the merge block for the loop. And thus a break is emitted. If we keep continuing processing further instructions, we will be processing the branch conditional and thus emitting the proper NIR conditional, which leads to instructions after the break. This fixes dEQP-VK.graphicsfuzz.continue-and-merge. CC: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: fix shader cache for packed param listTimothy Arceri2019-02-281-11/+4
| | | | | | | | | | | | | | | Some types of params such as some builtins are always padded. We need to keep track of this so we can restore the list correctly. Here we also remove a couple of cache entries that are not actually required as they get rebuilt by the _mesa_add_parameter() calls. This patch fixes a bunch of arb_texture_multisample and arb_sample_shading piglit tests for the radeonsi NIR backend. Fixes: edded1237607 ("mesa: rework ParameterList to allow packing") Reviewed-by: Marek Olšák <[email protected]>