aboutsummaryrefslogtreecommitdiffstats
path: root/src/freedreno/ir3
Commit message (Collapse)AuthorAgeFilesLines
* freedreno/ir3: fix for array/reg store vs meta instructionsRob Clark2019-07-291-1/+4
| | | | | | | | | | | | | | | fishgl.com has a shader which does roughly: foo = texture(...); if (bar) foo = texture(...); after lowering phi webs to regs we end up w/ a vec4 reg (array). But since it was not an indirect access, we try to skip the extra mov. This results that the per-component fanout (split) meta instructions store directly to the reg (array). Which doesn't work out in RA. Signed-off-by: Rob Clark <[email protected]>
* freedreno: Fix data race on making the shader's id.Eric Anholt2019-07-291-1/+2
| | | | | | | The value is only used for IR3_DBG_DISASM, but it cleans up the helgrind output. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Take a lock around shader variant creation.Eric Anholt2019-07-292-0/+7
| | | | | | | | | Shaders are shared across contexts in gallium (part of making it so that you get shader compile at link time, for shader-db and to reduce compiles at draw time). So, we need to protect from variant creation for a shader from multiple threads at the same time. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Fix data races with allocating/freeing struct ir3.Eric Anholt2019-07-291-1/+1
| | | | | | | | | | | | | | There is a single ir3_compiler in the screen, and each context may be compiling ir3 shaders, which call ir3_create. ralloc doesn't do any locking on its own, so eventually you can end up racing to break ralloc's linked lists. We really don't want struct ir3 to live as long as the compiler (maybe struct ir3_shader's lifetime, if anything), so you'd better be freeing it anyway. Fixes: 8fe20762433d ("freedreno/ir3: convert over to ralloc") Reviewed-by: Rob Clark <[email protected]>
* freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.Eric Anholt2019-07-181-88/+51
| | | | | | Cuts a bunch of boilerplate. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Convert load_barycentric_at_sample to the NIR lowering helper.Eric Anholt2019-07-181-48/+30
| | | | | | Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Convert load_barycentric_at_offset to the NIR lowering helper.Eric Anholt2019-07-181-39/+19
| | | | | | Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir: Add lower_rotate flag and set to true in all driversSagar Ghuge2019-07-011-0/+2
| | | | | | Signed-off-by: Sagar Ghuge <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* freedreno/ir3: small cleanupRob Clark2019-06-281-1/+1
| | | | | | `target` cannot be NULL here. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix missing (ss) in dummy bary.f caseRob Clark2019-06-281-0/+5
| | | | | | | | | | | | In case we need to insert a dummy bary.f for the (ei) flag, it also needs (ss) so we don't release varying storage to the next VS wave before the ldlv completed. Fixes random failures in: dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.* Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Only upload the used part of UBO0 to the constant buffer.Eric Anholt2019-06-242-5/+13
| | | | | | | | | | We were pessimistically uploading all of it in case of indirection, but we can just bump that when we encounter indirection. total constlen in shared programs: 2529623 -> 2485933 (-1.73%) Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: Stop treating UBO 0 specially in UBO uploading.Eric Anholt2019-06-242-7/+0
| | | | | | | | | ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we need to upload (all of it, since it will lower indirect UBO 0 accesses from load_ubo back to indirection on the constant buffer). Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.Daniel Schürmann2019-06-241-2/+0
| | | | | | | | | | | That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* freedreno: Only upload UBO pointers for UBOs that haven't been lowered.Eric Anholt2019-06-211-1/+7
| | | | | | | total constlen in shared programs: 2485933 -> 2462236 (-0.95%) Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Remove silly return from ir3_optimize_nir().Eric Anholt2019-06-214-12/+8
| | | | | | | | We only ever return the shader we were passed in (but internally modified). Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Fix up end range of unaligned UBO loads.Eric Anholt2019-06-211-2/+3
| | | | | | | | | | We need the constants uploaded to cover the NIR offset plus the size, not the aligned-down start of our upload range plus the size. Fixes mistaken UBO analysis with mat3 loads. Fixes: 893425a607a6 ("freedreno/ir3: Push UBOs to constant file") Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: Fix UBO load range detection on booleans.Eric Anholt2019-06-211-2/+1
| | | | | | | | NIR 1-bit bool dests will have a bit size of 1, and thus a calculated "bytes" of 0. load_ubo is always loading from dwords in the source. Fixes: 893425a607a6 ("freedreno/ir3: Push UBOs to constant file") Reviewed-by: Rob Clark <[email protected]>
* freedreno: Stop reporting max_const in shader-db.Eric Anholt2019-06-211-3/+1
| | | | | | | We end up uploading constlen regardless, so max_const would only get you slightly improved granularity in const usage in comparison. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Include binning shaders in shader-db.Eric Anholt2019-06-211-1/+3
| | | | | | | We want to see if we've improved our binning VS output, as well as the render VS. Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: fix typoHyunjun Ko2019-06-201-1/+1
| | | | | Fixes: a9b556d3a04 ("freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov") Reviewed-by: Rob Clark <[email protected]>
* ir3: initialize progress false before ir3_nir_lower_imulTapani Pälli2019-06-141-1/+1
| | | | | | | | | Removes a compiler warning about uninitialized variable. Fixes: c02ffd2700c "ir3: Use the new NIR lowering pass for integer multiplication" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eduardo Lima <[email protected]>
* ir3: Use the new NIR lowering pass for integer multiplicationEduardo Lima Mitev2019-06-072-17/+16
| | | | | | | | | | | | | | | | | | | Shader-db stats courtesy of Eric Anholt: total instructions in shared programs: 6480215 -> 6475457 (-0.07%) instructions in affected programs: 662105 -> 657347 (-0.72%) helped: 1209 HURT: 13 total constlen in shared programs: 1432704 -> 1427769 (-0.34%) constlen in affected programs: 100063 -> 95128 (-4.93%) helped: 512 HURT: 0 total max_sun in shared programs: 875561 -> 873387 (-0.25%) max_sun in affected programs: 46179 -> 44005 (-4.71%) helped: 1087 HURT: 0 Reviewed-by: Eric Anholt <[email protected]>
* ir3/nir: Add new NIR AlgebraicPass for lowering imulEduardo Lima Mitev2019-06-073-1/+64
| | | | | | | | | | | | | | | | | | | | | | | | Currently, ir3 backend compiler is lowering integer multiplication from: dst = a * b to: dst = (al * bl) + (ah * bl << 16) + (al * bh << 16) by emitting this code: mull.u tmp0, a, b ; mul low, i.e. al * bl madsh.m16 tmp1, a, b, tmp0 ; mul-add shift high mix, i.e. ah * bl << 16 madsh.m16 dst, b, a, tmp1 ; i.e. al * bh << 16 which at that point has very low chances of being optimized. This patch adds a new nir_algebraic.AlgebraicPass to performs this lowering during NIR algebraic optimization passes, giving it a better chance for optimizing the resulting code. Reviewed-by: Eric Anholt <[email protected]>
* ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'Eduardo Lima Mitev2019-06-071-0/+6
| | | | | | They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively. Reviewed-by: Eric Anholt <[email protected]>
* nir: Combine lower_fmod16/32 back into a single lower_fmod.Kenneth Graunke2019-06-051-2/+2
| | | | | | | | | | | | | | We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit ca31df6f. Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which need lowering for one bitsize and not the other. Reviewed-by: Marek Olšák <[email protected]>
* gallium: Drop lower_fmod64 from drivers that don't support doubles.Kenneth Graunke2019-06-051-2/+0
| | | | | | | Neither freedreno nor nv50 expose PIPE_CAP_DOUBLES, so there's no fmod64 to be lowered. Reviewed-by: Marek Olšák <[email protected]>
* freedreno/ir3: Extend debug helpers to support TCS/TES/GSKristian H. Kristensen2019-06-053-7/+19
| | | | Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Generalize ir3_shader_disasm()Kristian H. Kristensen2019-06-051-46/+42
| | | | | | | Use a helper function to get the sysval/attribute/varying/output name and make the disam debug output independent of shader stage. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Reuse glsl_get_sampler_coordinate_components().Eric Anholt2019-06-041-25/+5
| | | | | | | | We have the GLSL type, so we can just ask it how many coordinates there are. The GLSL function already has Vulkan cases that we'd probably want eventually. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Improve the pi approximations in trig lowering.Eric Anholt2019-06-041-2/+2
| | | | | | | | | | | | When comparing our sin/cos behavior to the closed source driver, I noticed that we were off by a bit (or, in the case of 1/2pi, 3 bits). Fixes: dEQP-GLES3.functional.shaders.random.trigonometric.vertex.52 dEQP-GLES3.functional.shaders.random.all_features.vertex.0 Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: fix counting and printing for half registers.Hyunjun Ko2019-06-032-7/+18
| | | | | v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix up the half reg source even when src instr==NULLNeil Roberts2019-06-031-3/+2
| | | | | | | | | | | | | | | | Previously the loop for assigning registers was bailing out early if the register had a null source. I think the intention is that in this case it isn’t necessary to assign a register. However it was also missing out the part to fix up the types. This can happen if the instruction is copy propagated to be a move from a constant half-float input register. In that case it still needs to fix up the types. Fixes assert in dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump when lowering the precision of the variables. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Add a 16-bit implementation of nir_op_imulNeil Roberts2019-06-031-9/+15
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set dst type of alu instructions correctly.Hyunjun Ko2019-06-031-5/+8
| | | | | | | | | | Though it should be fixed in RA pass, it needs to be set correctly from the beginning according to the bitsize of NIR dest. v2: Would be better for mad,fddx,fddy to fixup later in RA pass. [small cleanup of fallout from imov/fmov removal fallout] Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: adjust the bitsize of regs when an array loading.Hyunjun Ko2019-06-032-7/+16
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: convert back to 32-bit values for half constant registers.Hyunjun Ko2019-06-032-4/+54
| | | | | | | | | It seems to handle only 32-bit values for half constant registers within floating point opcodes according to the blob driver. So we need to convert back to 32-bit values from 16-bit values, when a lower precision pass is in effect. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.Hyunjun Ko2019-06-031-0/+16
| | | | | | | | | | If the type of dest reg and src reg of absneg opcode are different, it shouldn't be considered as same type mov. This patch becomes meaningful when we start to use mediump information for doing precision lowering to 16bit. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set proper dst type for uniform according to the type of nir ↵Hyunjun Ko2019-06-032-7/+14
| | | | | | | | | | | dest. eg. uniform mediump vec4 f; This patch means nothing since there's no mediump lowering pass for now, but will be meaningful when the pass land in the near future. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix loading half-float immediate vectorsNeil Roberts2019-06-031-3/+12
| | | | | | | | | Previously the code to load from a constant instruction was always using the u32 pointer. If the constant is actually a 16-bit source this would end up with the wrong values because the pointer would be offset by the wrong size. This fixes it to use the u16 pointer. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: immediately schedule meta instructionsRob Clark2019-06-031-0/+3
| | | | | | | | The aren't real instructions, and don't change # of live values, so no point in them competing with real instructions. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: scheduler improvementsRob Clark2019-06-032-13/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For instructions that increase the # of live values, apply a threshold to avoid scheduling them too early. And factor the net change of # of live values that would result from scheduling an instruction, to prioritize instructions that reduce number of live values as the number of live values increases. For manhattan: total instructions in shared programs: 27869 -> 28413 (1.95%) instructions in affected programs: 26756 -> 27300 (2.03%) helped: 102 HURT: 87 total full in shared programs: 1903 -> 1719 (-9.67%) full in affected programs: 1390 -> 1206 (-13.24%) helped: 124 HURT: 9 The reduction in register usage nets ~20% gain in manhattan. (So getting mediump support should be a huge win for gles gfxbench.) Also significantly helps some of the more complex shadertoy shaders, like IQ's Piano (32 to 18 regs, doubles fps). The effect is less pronounced on smaller shaders. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: sched should mark outputs usedRob Clark2019-06-031-19/+35
| | | | | | | | Account for shader outputs and values live in any direct/indirect successor block. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix constlen versus indirect UBORob Clark2019-05-311-1/+8
| | | | | | | | | | | | | | If we access the address of the UBO indirectly, and there is no higher const emitted w/ direct access (like an immediate lowered to uniform) the assembler won't figure out the correct constlen. Fixes: dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set more barrier bitsRob Clark2019-05-311-0/+1
| | | | | | | | | | | Blob is also setting the .l bit, and it seems to solve some intermittent failures with a couple of deqp's: dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f Signed-off-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]>
* freedreno/ir3: set (ss) on last_input if ldlvRob Clark2019-05-311-3/+12
| | | | | | | | | | | | | | | It seems like (ei) handling doesn't sync on (ss), so we could end up in a situation where we release varying storage before an ldlv for flat shaded varyings completes. Keep track if we've done an (ss) since the last ldlv, and if not add (ss) flag to last_input which gets (ei). Noticed with dEQP-GLES3.functional.fragment_out.random.24 and dEQP-GLES3.functional.fragment_out.random.27, which previously passed by luck because ir3_sched ordered instructions in a way that resulted in a lucky (ss). Signed-off-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]>
* freedreno/ir3: add assertRob Clark2019-05-311-0/+2
| | | | | | | | | The special handling for last_input assumes that all the varying loads are in the first block. Add an assert to catch if anyone breaks that assumption. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix input ncomp for vertex shadersJonathan Marek2019-05-311-0/+1
| | | | | | | | | ncomp is never set for vertex shaders, but a3xx and a4xx still use it. Fixes: 831f1a05c0d freedreno/ir3: rework varying packing Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir: Drop imov/fmov in favor of one mov instructionJason Ekstrand2019-05-241-2/+2
| | | | | | | | | | | | | | | | The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
* freedreno: Log the number of loops in the shader for shader-db.Eric Anholt2019-05-162-0/+2
| | | | | | | | | | shader-db's report.py will use this to see when we've changed loop unrolling behavior on a shader and skip including other stats like instruction count from being considered for that shader, since they won't be useful as a proxy for real world performance in that case. Reviewed-by: Rob Clark <[email protected]> Tested-by: Eduardo Lima Mitev <[email protected]>
* nir: allow specifying a set of opcodes in lower_alu_to_scalarJonathan Marek2019-05-101-1/+1
| | | | | | | | | This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>