path: root/src/freedreno
Commit message | Author | Age | Files | Lines
* freedreno/ir3: fix for array/reg store vs meta instructionsRob Clark2019-07-291-1/+4
fishgl.com has a shader which does roughly:

  foo = texture(...);
  if (bar)
    foo = texture(...);

After lowering phi webs to regs we end up w/ a vec4 reg (array). But since it was not an indirect access, we try to skip the extra mov. As a result, the per-component fanout (split) meta instructions store directly to the reg (array), which doesn't work out in RA.

Signed-off-by: Rob Clark <[email protected]>
* freedreno: Fix data race on making the shader's id.Eric Anholt2019-07-291-1/+2
| | | | | | | The value is only used for IR3_DBG_DISASM, but it cleans up the helgrind output. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Take a lock around shader variant creation.Eric Anholt2019-07-292-0/+7
| | | | | | | | | Shaders are shared across contexts in gallium (part of making it so that you get shader compile at link time, for shader-db and to reduce compiles at draw time). So, we need to protect from variant creation for a shader from multiple threads at the same time. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Fix data races with allocating/freeing struct ir3.Eric Anholt2019-07-291-1/+1
| | | | | | | | | | | | | | There is a single ir3_compiler in the screen, and each context may be compiling ir3 shaders, which call ir3_create. ralloc doesn't do any locking on its own, so eventually you can end up racing to break ralloc's linked lists. We really don't want struct ir3 to live as long as the compiler (maybe struct ir3_shader's lifetime, if anything), so you'd better be freeing it anyway. Fixes: 8fe20762433d ("freedreno/ir3: convert over to ralloc") Reviewed-by: Rob Clark <[email protected]>
* anv+tu+radv: delete unusable dev_icd.jsonEric Engestrom2019-07-261-13/+0
As per previous commit, Meson doesn't support using uninstalled libs; they're simply not ready until `ninja install` is run, so delete them.

Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> # for anv
Reviewed-by: Eric Anholt <[email protected]> # for tu
Reviewed-by: Bas Nieuwenhuizen <[email protected]> # for radv
* freedreno: Add support for drm-shim.Eric Anholt2019-07-254-0/+224
| | | | | | I'm using this for shader-db analysis on x86_64 systems. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.Eric Anholt2019-07-181-88/+51
| | | | | | Cuts a bunch of boilerplate. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Convert load_barycentric_at_sample to the NIR lowering helper.Eric Anholt2019-07-181-48/+30
| | | | | | Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Convert load_barycentric_at_offset to the NIR lowering helper.Eric Anholt2019-07-181-39/+19
| | | | | | Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Generate headers from xml filesKristian H. Kristensen2019-07-1024-23892/+14106
| | | | | Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Rob Clark <[email protected]>
* tu: add exported symbols checkEric Engestrom2019-07-101-0/+13
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nir: Add lower_rotate flag and set to true in all driversSagar Ghuge2019-07-011-0/+2
| | | | | | Signed-off-by: Sagar Ghuge <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* freedreno: update generated registersRob Clark2019-07-017-16/+23
| | | | | | Corrects the a3xx texconst state for TILE_MODE. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: small cleanupRob Clark2019-06-281-1/+1
| | | | | | `target` cannot be NULL here. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix missing (ss) in dummy bary.f caseRob Clark2019-06-281-0/+5
| | | | | | | | | | | | In case we need to insert a dummy bary.f for the (ei) flag, it also needs (ss) so we don't release varying storage to the next VS wave before the ldlv completed. Fixes random failures in: dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.* Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Only upload the used part of UBO0 to the constant buffer.Eric Anholt2019-06-242-5/+13
| | | | | | | | | | We were pessimistically uploading all of it in case of indirection, but we can just bump that when we encounter indirection. total constlen in shared programs: 2529623 -> 2485933 (-1.73%) Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: Stop treating UBO 0 specially in UBO uploading.Eric Anholt2019-06-242-7/+0
| | | | | | | | | ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we need to upload (all of it, since it will lower indirect UBO 0 accesses from load_ubo back to indirection on the constant buffer). Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.Daniel Schürmann2019-06-241-2/+0
| | | | | | | | | | | That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
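The rule stated above (only the five least significant bits of 'bits' and 'offset' are used) can be illustrated with a small C model. This is a sketch under my reading of that description, not the NIR source; the function names simply mirror the opcode names, and the branch structure of `ubfe` is an assumption about how the masked operands are combined.

```c
#include <stdint.h>

/* Bitfield mask: `bits` ones starting at bit `offset`.
 * Only the five least significant bits of each operand are used. */
static uint32_t bfm(uint32_t bits, uint32_t offset)
{
    bits &= 0x1f;
    offset &= 0x1f;
    return ((1u << bits) - 1u) << offset;
}

/* Unsigned bitfield extract of `bits` bits starting at bit `offset`.
 * Operands are masked the same way, so a width of 32 behaves like 0. */
static uint32_t ubfe(uint32_t base, uint32_t offset, uint32_t bits)
{
    offset &= 0x1f;
    bits &= 0x1f;
    if (bits == 0)
        return 0;
    if (offset + bits < 32)
        return (base << (32 - bits - offset)) >> (32 - bits);
    /* Field runs past bit 31: only the available high bits remain. */
    return base >> offset;
}
```

Because both operands are masked before use, no shift count ever reaches 32, so the model stays within defined C behavior for every input.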
* freedreno: Only upload UBO pointers for UBOs that haven't been lowered.Eric Anholt2019-06-211-1/+7
| | | | | | | total constlen in shared programs: 2485933 -> 2462236 (-0.95%) Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Remove silly return from ir3_optimize_nir().Eric Anholt2019-06-214-12/+8
| | | | | | | | We only ever return the shader we were passed in (but internally modified). Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Fix up end range of unaligned UBO loads.Eric Anholt2019-06-211-2/+3
| | | | | | | | | | We need the constants uploaded to cover the NIR offset plus the size, not the aligned-down start of our upload range plus the size. Fixes mistaken UBO analysis with mat3 loads. Fixes: 893425a607a6 ("freedreno/ir3: Push UBOs to constant file") Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
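A minimal C sketch of the difference this commit message describes. The 16-byte alignment, the struct, and the function names are all assumptions for illustration; the actual analysis code is not reproduced here.

```c
#include <stdint.h>

#define UPLOAD_ALIGN 16u  /* assumed granularity: one vec4 of 32-bit values */

static uint32_t align_down(uint32_t v) { return v & ~(UPLOAD_ALIGN - 1u); }
static uint32_t align_up(uint32_t v)   { return (v + UPLOAD_ALIGN - 1u) & ~(UPLOAD_ALIGN - 1u); }

struct range { uint32_t start, end; };

/* End derived from the aligned-down start: can stop short of the load. */
static struct range range_from_aligned_start(uint32_t offset, uint32_t size)
{
    struct range r;
    r.start = align_down(offset);
    r.end = align_up(r.start + size);
    return r;
}

/* End derived from the original offset: always covers offset + size. */
static struct range range_from_offset(uint32_t offset, uint32_t size)
{
    struct range r;
    r.start = align_down(offset);
    r.end = align_up(offset + size);
    return r;
}
```

For a 12-byte load at byte offset 24, the first variant yields the range [16, 32) even though the load reads up to byte 36, while the second yields [16, 48).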
* freedreno: Fix UBO load range detection on booleans.Eric Anholt2019-06-211-2/+1
| | | | | | | | NIR 1-bit bool dests will have a bit size of 1, and thus a calculated "bytes" of 0. load_ubo is always loading from dwords in the source. Fixes: 893425a607a6 ("freedreno/ir3: Push UBOs to constant file") Reviewed-by: Rob Clark <[email protected]>
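The arithmetic behind this fix can be shown in a few lines of C. Both helpers are hypothetical and only model the two ways of computing the load size; the exact expression used in the driver is an assumption.

```c
#include <stdint.h>

/* Size derived from the destination bit size: a 1-bit bool yields 0 bytes. */
static uint32_t bytes_from_bit_size(uint32_t num_components, uint32_t bit_size)
{
    return num_components * (bit_size / 8);
}

/* Size derived from the source: every component is one 32-bit dword. */
static uint32_t bytes_from_dwords(uint32_t num_components)
{
    return num_components * 4;
}
```

With a single 1-bit boolean destination the first helper reports a zero-length load, so the range analysis would miss it; the second reports 4 bytes regardless of the destination's bit size.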
* freedreno: Stop reporting max_const in shader-db.Eric Anholt2019-06-211-3/+1
| | | | | | | We end up uploading constlen regardless, so max_const would only get you slightly improved granularity in const usage in comparison. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Include binning shaders in shader-db.Eric Anholt2019-06-211-1/+3
| | | | | | | We want to see if we've improved our binning VS output, as well as the render VS. Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: fix typoHyunjun Ko2019-06-201-1/+1
| | | | | Fixes: a9b556d3a04 ("freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov") Reviewed-by: Rob Clark <[email protected]>
* ir3: initialize progress false before ir3_nir_lower_imulTapani Pälli2019-06-141-1/+1
| | | | | | | | | Removes a compiler warning about uninitialized variable. Fixes: c02ffd2700c "ir3: Use the new NIR lowering pass for integer multiplication" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eduardo Lima <[email protected]>
* freedreno: update generated headersRob Clark2019-06-117-53/+305
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* ir3: Use the new NIR lowering pass for integer multiplicationEduardo Lima Mitev2019-06-072-17/+16
Shader-db stats courtesy of Eric Anholt:

total instructions in shared programs: 6480215 -> 6475457 (-0.07%)
instructions in affected programs: 662105 -> 657347 (-0.72%)
helped: 1209
HURT: 13

total constlen in shared programs: 1432704 -> 1427769 (-0.34%)
constlen in affected programs: 100063 -> 95128 (-4.93%)
helped: 512
HURT: 0

total max_sun in shared programs: 875561 -> 873387 (-0.25%)
max_sun in affected programs: 46179 -> 44005 (-4.71%)
helped: 1087
HURT: 0

Reviewed-by: Eric Anholt <[email protected]>
* ir3/nir: Add new NIR AlgebraicPass for lowering imulEduardo Lima Mitev2019-06-073-1/+64
Currently, the ir3 backend compiler lowers integer multiplication from:

  dst = a * b

to:

  dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)

by emitting this code:

  mull.u tmp0, a, b           ; mul low, i.e. al * bl
  madsh.m16 tmp1, a, b, tmp0  ; mul-add shift high mix, i.e. ah * bl << 16
  madsh.m16 dst, b, a, tmp1   ; i.e. al * bh << 16

which at that point has very low chances of being optimized.

This patch adds a new nir_algebraic.AlgebraicPass to perform this lowering during NIR algebraic optimization passes, giving it a better chance of optimizing the resulting code.

Reviewed-by: Eric Anholt <[email protected]>
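The decomposition in this commit message can be checked with a small C model. The helper names follow the instruction mnemonics, but their bodies are only inferred from the comments in the message (al/ah are the low/high 16 bits of a); treat them as assumptions rather than a description of the hardware.

```c
#include <stdint.h>

/* Model of mull.u: multiply the low 16 bits of each operand (al * bl). */
static uint32_t mull_u(uint32_t a, uint32_t b)
{
    return (a & 0xffffu) * (b & 0xffffu);
}

/* Model of madsh.m16: (high 16 bits of a) * (low 16 bits of b),
 * shifted left by 16, then added to c. Wraps modulo 2^32. */
static uint32_t madsh_m16(uint32_t a, uint32_t b, uint32_t c)
{
    return (((a >> 16) * (b & 0xffffu)) << 16) + c;
}

/* The three-instruction sequence from the commit message. */
static uint32_t lowered_imul(uint32_t a, uint32_t b)
{
    uint32_t tmp0 = mull_u(a, b);          /* al * bl */
    uint32_t tmp1 = madsh_m16(a, b, tmp0); /* + (ah * bl << 16) */
    return madsh_m16(b, a, tmp1);          /* + (al * bh << 16) */
}
```

The ah * bh term never appears because it would be shifted left by 32 bits, which falls entirely outside a 32-bit result.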
* ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'Eduardo Lima Mitev2019-06-071-0/+6
| | | | | | They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively. Reviewed-by: Eric Anholt <[email protected]>
* nir: Combine lower_fmod16/32 back into a single lower_fmod.Kenneth Graunke2019-06-051-2/+2
We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which needs lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <[email protected]>
* gallium: Drop lower_fmod64 from drivers that don't support doubles.Kenneth Graunke2019-06-051-2/+0
| | | | | | | Neither freedreno nor nv50 expose PIPE_CAP_DOUBLES, so there's no fmod64 to be lowered. Reviewed-by: Marek Olšák <[email protected]>
* freedreno/ir3: Extend debug helpers to support TCS/TES/GSKristian H. Kristensen2019-06-053-7/+19
| | | | Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Generalize ir3_shader_disasm()Kristian H. Kristensen2019-06-051-46/+42
Use a helper function to get the sysval/attribute/varying/output name and make the disasm debug output independent of shader stage.

Reviewed-by: Rob Clark <[email protected]>
* freedreno: Reuse glsl_get_sampler_coordinate_components().Eric Anholt2019-06-041-25/+5
| | | | | | | | We have the GLSL type, so we can just ask it how many coordinates there are. The GLSL function already has Vulkan cases that we'd probably want eventually. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Improve the pi approximations in trig lowering.Eric Anholt2019-06-041-2/+2
When comparing our sin/cos behavior to the closed source driver, I noticed that we were off by a bit (or, in the case of 1/2pi, 3 bits).

Fixes:
  dEQP-GLES3.functional.shaders.random.trigonometric.vertex.52
  dEQP-GLES3.functional.shaders.random.all_features.vertex.0

Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
* freedreno: Fix GCC build error.Vinson Lee2019-06-031-1/+1
../src/freedreno/vulkan/tu_device.c:900:4: error: initializer element is not constant
    .minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 },
    ^

Suggested-by: Kristian Høgsberg <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110698
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
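A reduced C sketch of the pattern behind this error. The types below are hypothetical stand-ins for the Vulkan structs, and the exact change made in tu_device.c is an assumption; the point is only that ISO C does not treat a compound literal as a constant expression, so an object with static storage duration is safer with a plain brace initializer.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Vulkan types involved. */
typedef struct { uint32_t width, height, depth; } extent3d;
struct queue_props { extent3d minImageTransferGranularity; };

/* A plain brace initializer is accepted in a static initializer.
 * Writing (extent3d) { 1, 1, 1 } here instead would be a compound
 * literal, which is the form the reported GCC build rejected. */
static const struct queue_props props = {
    .minImageTransferGranularity = { 1, 1, 1 },
};
```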
* freedreno/ir3: fix counting and printing for half registers.Hyunjun Ko2019-06-032-7/+18
| | | | | v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix up the half reg source even when src instr==NULLNeil Roberts2019-06-031-3/+2
| | | | | | | | | | | | | | | | Previously the loop for assigning registers was bailing out early if the register had a null source. I think the intention is that in this case it isn’t necessary to assign a register. However it was also missing out the part to fix up the types. This can happen if the instruction is copy propagated to be a move from a constant half-float input register. In that case it still needs to fix up the types. Fixes assert in dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump when lowering the precision of the variables. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Add a 16-bit implementation of nir_op_imulNeil Roberts2019-06-031-9/+15
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set dst type of alu instructions correctly.Hyunjun Ko2019-06-031-5/+8
Though it should be fixed in RA pass, it needs to be set correctly from the beginning according to the bitsize of NIR dest.

v2: Would be better for mad,fddx,fddy to fixup later in RA pass.

[small cleanup of fallout from imov/fmov removal]
Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: adjust the bitsize of regs when an array loading.Hyunjun Ko2019-06-032-7/+16
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: convert back to 32-bit values for half constant registers.Hyunjun Ko2019-06-032-4/+54
According to the blob driver, the hardware seems to handle only 32-bit values for half constant registers within floating point opcodes. So we need to convert 16-bit values back to 32-bit values when a lower precision pass is in effect.

Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.Hyunjun Ko2019-06-031-0/+16
| | | | | | | | | | If the type of dest reg and src reg of absneg opcode are different, it shouldn't be considered as same type mov. This patch becomes meaningful when we start to use mediump information for doing precision lowering to 16bit. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set proper dst type for uniform according to the type of nir dest.Hyunjun Ko2019-06-032-7/+14
eg. uniform mediump vec4 f;

This patch means nothing since there's no mediump lowering pass for now, but will be meaningful when the pass lands in the near future.

Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix loading half-float immediate vectorsNeil Roberts2019-06-031-3/+12
| | | | | | | | | Previously the code to load from a constant instruction was always using the u32 pointer. If the constant is actually a 16-bit source this would end up with the wrong values because the pointer would be offset by the wrong size. This fixes it to use the u16 pointer. Signed-off-by: Rob Clark <[email protected]>
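The failure mode described here (indexing packed 16-bit constants with a 32-bit stride) can be sketched in C. The function name and the flat byte-buffer layout are hypothetical and do not reflect the NIR constant representation; they only show why the element size has to follow the constant's bit size.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read component `i` of a constant stored as
 * tightly packed values of `bit_size` bits (16 or 32 only). */
static uint32_t load_const_component(const void *data, unsigned bit_size, unsigned i)
{
    const uint8_t *bytes = data;

    if (bit_size == 16) {
        /* 16-bit source: step by two bytes per component. */
        uint16_t half;
        memcpy(&half, bytes + i * sizeof(half), sizeof(half));
        return half;
    }

    /* 32-bit source: step by four bytes per component. */
    uint32_t word;
    memcpy(&word, bytes + i * sizeof(word), sizeof(word));
    return word;
}
```

Reading component 1 of a 16-bit vector with the 32-bit stride would instead land on bytes 4..7, which hold components 2 and 3.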
* freedreno/ir3: immediately schedule meta instructionsRob Clark2019-06-031-0/+3
They aren't real instructions, and don't change # of live values, so no point in them competing with real instructions.

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: scheduler improvementsRob Clark2019-06-032-13/+115
For instructions that increase the # of live values, apply a threshold to avoid scheduling them too early. And factor the net change of # of live values that would result from scheduling an instruction, to prioritize instructions that reduce number of live values as the number of live values increases.

For manhattan:

total instructions in shared programs: 27869 -> 28413 (1.95%)
instructions in affected programs: 26756 -> 27300 (2.03%)
helped: 102
HURT: 87

total full in shared programs: 1903 -> 1719 (-9.67%)
full in affected programs: 1390 -> 1206 (-13.24%)
helped: 124
HURT: 9

The reduction in register usage nets ~20% gain in manhattan. (So getting mediump support should be a huge win for gles gfxbench.)

Also significantly helps some of the more complex shadertoy shaders, like IQ's Piano (32 to 18 regs, doubles fps). The effect is less pronounced on smaller shaders.

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: sched should mark outputs usedRob Clark2019-06-031-19/+35
| | | | | | | | Account for shader outputs and values live in any direct/indirect successor block. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix constlen versus indirect UBORob Clark2019-05-311-1/+8
If we access the address of the UBO indirectly, and there is no higher const emitted w/ direct access (like an immediate lowered to uniform) the assembler won't figure out the correct constlen.

Fixes:
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment

Signed-off-by: Rob Clark <[email protected]>