path: root/src/freedreno/ir3
Commit message (Author, Age, Files, Lines)
* turnip: Gather information for transform feedback (Hyunjun Ko, 2020-03-12, 1 file, -0/+1)

  - Add one member to the existing ir3_stream_output so that we could assign location information from nir_xfb_info, rather than defining a new struct.
  - Redefine the maximum number of SO buffers, streams and outputs, which will be used for turnip.
  - Also enable caps for transform feedback for spirv_to_nir.

  v2. Remove redefined maximums and use IR3_MAX_SO_* and add IR3_MAX_SO_STREAMS.
  v3. Remove the newly added location field so that we stay aligned to 32 bytes. Instead we create an array mapping between the location and a consecutive index, which is what the GL driver does.

  Signed-off-by: Hyunjun Ko <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
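The location-to-index array mentioned in v3 could be sketched roughly as below. This is a minimal illustration in C, not the turnip code: `loc_map`, `MAX_LOCATIONS` and `NO_INDEX` are hypothetical names, and handing out indices in first-seen order is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LOCATIONS 64   /* hypothetical upper bound on varying locations */
#define NO_INDEX      0xff /* marks a location with no index assigned yet */

/* Hypothetical table: varying location -> consecutive output index. */
struct loc_map {
	uint8_t  index[MAX_LOCATIONS];
	unsigned count;
};

static void
loc_map_init(struct loc_map *map)
{
	memset(map->index, NO_INDEX, sizeof(map->index));
	map->count = 0;
}

/* Return the consecutive index for 'location', assigning the next free
 * index the first time a location is seen. */
static unsigned
loc_map_get(struct loc_map *map, unsigned location)
{
	assert(location < MAX_LOCATIONS);
	if (map->index[location] == NO_INDEX)
		map->index[location] = (uint8_t)map->count++;
	return map->index[location];
}
```

Keeping the mapping in a side table like this leaves the per-output struct unchanged, which is the motivation the v3 note gives for dropping the extra field.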
* freedreno/ir3: try to avoid syncs (Rob Clark, 2020-03-10, 1 file, -1/+55)

  Update postsched to be better aware of where costly (ss) syncs would result. Sometimes it is better to allow a nop or two, to avoid a sync quickly after an SFU.

  Signed-off-by: Rob Clark <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
* freedreno/ir3: round-robin RA (Rob Clark, 2020-03-10, 1 file, -4/+163)

  In the second (scalar) pass, use the information about # of registers used in the first pass as the target max, and round-robin within that range. This generally gives the post-RA sched pass more opportunities to re-order instructions to remove nop's.

  Also, we can be a bit clever when assigning dest registers for SFU instructions, by picking the register used for its src (if available and already assigned). This avoids some (ss) syncs caused by write after read hazards. (I.e. the SFU instruction will read its own src before writing dest.)

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
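The round-robin choice within the first pass's register budget can be illustrated with a small C sketch. It is not the ir3 RA code: `rr_alloc`, `rr_pick`, `rr_release` and `MAX_REGS` are hypothetical, and registers are treated as plain scalars.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REGS 64 /* hypothetical register file size */

struct rr_alloc {
	bool     in_use[MAX_REGS];
	unsigned target; /* # of registers the first pass needed */
	unsigned next;   /* where the next search starts */
};

/* Pick a free register, starting just after the previous pick and
 * wrapping within [0, target).  Returns -1 if nothing is free. */
static int
rr_pick(struct rr_alloc *ra)
{
	assert(ra->target > 0 && ra->target <= MAX_REGS);
	for (unsigned i = 0; i < ra->target; i++) {
		unsigned r = (ra->next + i) % ra->target;
		if (!ra->in_use[r]) {
			ra->in_use[r] = true;
			ra->next = (r + 1) % ra->target;
			return (int)r;
		}
	}
	return -1;
}

/* Mark a register as free again once its value is dead. */
static void
rr_release(struct rr_alloc *ra, unsigned r)
{
	assert(r < ra->target);
	ra->in_use[r] = false;
}
```

Compared with always taking the lowest free register, this spreads values across the budget, so a just-freed register is not immediately rewritten. That leaves fewer false dependencies for a later scheduling pass to work around.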
* freedreno/ir3: track register usage in first RA pass (Rob Clark, 2020-03-10, 1 file, -0/+41)

  We'll use the feedback from the first pass to select a target register usage in the second pass.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
* freedreno/ir3: fix has_latency_to_hide (Rob Clark, 2020-03-10, 1 file, -1/+8)

  Also count tex-prefetch instructions. And only let the no-latency rule kick in for frag shaders.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
* freedreno/ir3: split out has_latency_to_hide() (Rob Clark, 2020-03-10, 2 files, -25/+25)

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
* freedreno/ir3: add simplified stall estimation (Rob Clark, 2020-03-10, 2 files, -1/+14)

  Doesn't take into account stalls that result from a register written in a different block, etc. But this should be more useful than just using number of (ss)'s by trying to estimate how costly a given sync is.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
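One way to picture such an estimate is the single-block C sketch below. It assumes an SFU result takes `SFU_DELAY` instruction slots to become available and charges each (ss) sync with whatever delay is still outstanding. The struct, the function name and the exact accounting are assumptions for illustration, not the code from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SFU_DELAY 10 /* assumed SFU latency, in instruction slots */

struct est_instr {
	bool is_sfu; /* instruction issues to the SFU */
	bool ss;     /* instruction carries an (ss) sync flag */
};

/* Walk one block and add up the estimated stall of every (ss) sync. */
static unsigned
estimate_ss_stall(const struct est_instr *instrs, size_t n)
{
	unsigned stall = 0;
	unsigned sfu_delay = 0; /* remaining delay of the latest SFU op */

	assert(instrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (instrs[i].ss) {
			stall += sfu_delay;
			sfu_delay = 0; /* the sync waited the delay out */
		}
		if (instrs[i].is_sfu)
			sfu_delay = SFU_DELAY;
		else if (sfu_delay > 0)
			sfu_delay--;
	}
	return stall;
}
```

Under this model a sync placed right after an SFU instruction costs the full delay, while one placed several instructions later costs less, which is why counting (ss) flags alone is a weaker metric.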
* freedreno/ir3: remove extra nops inserted in scheduler (Rob Clark, 2020-03-10, 2 files, -25/+0)

  They were inserting a nop between back-to-back SFU instructions. But that doesn't actually appear to be required. And they get stripped out later anyways before legalize.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
* freedreno/ir3: also lower lowp frag outputs (Rob Clark, 2020-03-10, 1 file, -1/+2)

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
* freedreno/ir3: Don't fold conversions into sign (Kristian H. Kristensen, 2020-03-09, 1 file, -0/+1)

  Not supported.

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3929>
* freedreno/ir3: add assert (Rob Clark, 2020-02-28, 1 file, -0/+1)

  Catch problems earlier when inputs are not setup correctly.

  Signed-off-by: Rob Clark <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: fix assert with getinfo (Rob Clark, 2020-02-28, 1 file, -2/+3)

  Fixes: dEQP-VK.glsl.texture_functions.query.texturesamples.sampler2dms_fixed_vertex

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: don't precolor unassigned inputs (Rob Clark, 2020-02-28, 1 file, -0/+3)

  Fixes crash seen in: dEQP-VK.glsl.conversions.matrix_to_matrix.mat4_to_mat3x4_vertex

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: fix crash with samgq workaround (Rob Clark, 2020-02-28, 1 file, -1/+2)

  Need to list_delinit() before we clone the instruction to split it into individual samgpN instructions, otherwise we get list corruption.

  Tested-by: Eduardo Lima Mitev <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: update SFU delay (Rob Clark, 2020-02-28, 4 files, -13/+19)

  1) empirically, 10 seems like a more accurate # than 4
  2) push "soft" delay handling into ir3_delayslots(), as we should also be using it to calculate the costs that the schedulers use

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: track half-precision live values (Rob Clark, 2020-02-28, 3 files, -26/+43)

  In schedule live value tracking, differentiate between half vs full precision. Half-precision live values are less costly than full precision.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: don't hide latency when there is none to hide (Rob Clark, 2020-02-28, 1 file, -5/+52)

  Current scheduler thresholds try to ensure there are warps available to switch to when hiding texture fetch latency. But if there is none to hide, we should allow the scheduler to use more registers to reduce nops.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: rewrite regmask to better support a6xx+ (Rob Clark, 2020-02-28, 1 file, -23/+53)

  To avoid spurious sync flags, we want to, for a6xx+, operate in terms of half-regs, with a full precision register testing the corresponding two half-regs that it conflicts with.

  And while we are at it, stop open-coding BITSET.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
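The half-reg view of the register mask can be sketched in C as follows. It assumes full register n overlaps half registers 2n and 2n+1. `half_regmask`, `NUM_HALF_REGS` and the helper names are hypothetical, and the bit handling is written out by hand here rather than using Mesa's BITSET macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_HALF_REGS 128 /* hypothetical number of half registers */

/* One bit per half register. */
struct half_regmask {
	uint32_t bits[NUM_HALF_REGS / 32];
};

static void
mask_set_bit(struct half_regmask *m, unsigned n)
{
	assert(n < NUM_HALF_REGS);
	m->bits[n / 32] |= UINT32_C(1) << (n % 32);
}

static bool
mask_test_bit(const struct half_regmask *m, unsigned n)
{
	assert(n < NUM_HALF_REGS);
	return (m->bits[n / 32] >> (n % 32)) & 1u;
}

/* Record a write: a full register covers both half registers it overlaps. */
static void
regmask_set(struct half_regmask *m, unsigned num, bool half)
{
	if (half) {
		mask_set_bit(m, num);
	} else {
		mask_set_bit(m, 2 * num);
		mask_set_bit(m, 2 * num + 1);
	}
}

/* Test for a conflict: a full register conflicts if either half is set. */
static bool
regmask_get(const struct half_regmask *m, unsigned num, bool half)
{
	if (half)
		return mask_test_bit(m, num);
	return mask_test_bit(m, 2 * num) || mask_test_bit(m, 2 * num + 1);
}
```

With a single mask keyed on half registers, a write to one half register only flags the full register that actually overlaps it, so unrelated registers do not pick up a sync flag.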
* freedreno/ir3: remove regmask_set_if_not() (Rob Clark, 2020-02-28, 1 file, -21/+0)

  No longer used.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: remove from_tgsi (Rob Clark, 2020-02-28, 1 file, -3/+0)

  No longer used, other than in ir3 cmdline compiler, where it can be replaced with a local variable.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
* freedreno/ir3: allow block->predecessors to be null (Rob Clark, 2020-02-24, 1 file, -1/+4)

  This way we can also use ir3_print from computerator, which mostly bypasses the ir3_block construct (since it doesn't need to do scheduling, etc).

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3926>
* freedreno/computerator: polish out some of the rust (Rob Clark, 2020-02-24, 1 file, -0/+3)

  Updates for differences between fdre-a3xx's early version of ir3, and what we have now in mesa. And updates for instruction name and syntax changes.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3926>
* freedreno: Switch to using lowered image intrinsics. (Eric Anholt, 2020-02-24, 6 files, -137/+92)

  This cuts out a bunch of deref chain walking that the compiler can do for us.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
* freedreno/ir3: Fix the arg to ir3_get_num_components_for_image_format() (Eric Anholt, 2020-02-24, 2 files, -2/+2)

  GLuint worked fine for storing our enum, but it should be an enum pipe_format since the image-formats merge.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
* freedreno/ir3: Reuse glsl_get_sampler_dim_coordinate_components() in tex_info. (Eric Anholt, 2020-02-24, 1 file, -21/+3)

  Now that we have access to the interior switch statement not going through the txs special case for coord_components, we can just use it.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
* freedreno/ir3: Lower output precision (Kristian H. Kristensen, 2020-02-24, 3 files, -0/+54)

  This lowers mediump FS outputs to fp16 in the ir3 backend. For now this is a modest improvement, which mostly helps us whittle down the full mediump work. Once the GLSL level support lands, the right-hand side of the store output intrinsics will be fp16 expressions and we'll cancel out the fp16 -> fp32 -> fp16 round trip here.

  We've had different attempts at implementing this: rewriting stores in the GLSL IR, lowering GLSL IR outputs to temporaries and inserting conversions when writing the temporaries to the outputs. In the end, GLSL ends up getting in the way a lot and doing it at the nir level is easier and still possible since we have the output var precisions.

  This part of the fp16 work is more of a step on the way towards full fp16 support and will add a few extra conversion instructions:

  total instructions in shared programs: 8151 -> 8163 (0.15%)
  instructions in affected programs: 1187 -> 1199 (1.01%)
  helped: 4
  HURT: 10

  total nops in shared programs: 3146 -> 3152 (0.19%)
  nops in affected programs: 563 -> 569 (1.07%)
  helped: 5
  HURT: 10

  total non-nops in shared programs: 5005 -> 5011 (0.12%)
  non-nops in affected programs: 92 -> 98 (6.52%)
  helped: 0
  HURT: 3

  total dwords in shared programs: 12832 -> 12800 (-0.25%)
  dwords in affected programs: 96 -> 64 (-33.33%)
  helped: 1
  HURT: 0

  total last-baryf in shared programs: 118 -> 115 (-2.54%)
  last-baryf in affected programs: 21 -> 18 (-14.29%)
  helped: 1
  HURT: 0

  total full in shared programs: 424 -> 417 (-1.65%)
  full in affected programs: 15 -> 8 (-46.67%)
  helped: 7
  HURT: 0

  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
* freedreno/ir3: handle half registers for arrays during register allocation. (Hyunjun Ko, 2020-02-24, 3 files, -9/+38)

  So far we only handle full regs of arrays during pre-allocation. This patch handles half regs of arrays and also considers the size of half regs when finding conflicts.

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
* freedreno/ir3: Add new ir3 pass to fold out fp16 conversions (Hyunjun Ko, 2020-02-24, 4 files, -0/+131)

  This pass tries to fold f2f16 conversion into alu instructions. This will be useful to help reduce the number of instructions once mesa starts supporting precision lowering.

  For example:

      add.f r0.w, r0.w, c0.x
      cov.f32f16 hr2.x, r0.w

  to

      add.f hr2.x, r0.w, c0.x

  Additionally this pass also tries to fold f2f16 conversion into load_input instruction:

      bary.f r0.x, 3, r0.w
      cov.f32f16 hr0.x, r0.x

  to

      bary.f hr1.x, 3, r0.x

  v2: Edit to not fold OPC_MAX_F and OPC_MIN_F, since that's not valid.
  v3: Add OPC_ABSNEG_F to the blacklist as well.
  v4: Don't remove dead cov instructions, DCE will do that later; don't iterate through sources when a cov only has one; remove special handling of IR3_REG_ARRAY and IR3_REG_RELATIV.
  v5: Handle folding into u32.u32 movs of floats correctly, don't bail out on IR3_REG_RELATIV or IR3_REG_ARRAY movs.

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
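The core of such a fold can be sketched on a toy IR in C. Everything here is hypothetical (`fold_instr`, the `TOY_*` opcodes, `use_count`) and much simpler than the real pass. It assumes the conversion is the only consumer of the producer's result and mirrors the v2 to v4 notes: min/max and absneg are excluded, and the dead conversion is left for DCE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy opcodes standing in for a few ALU ops and the f32->f16 conversion. */
enum toy_opc {
	TOY_ADD_F, TOY_MUL_F, TOY_MAX_F, TOY_MIN_F, TOY_ABSNEG_F, TOY_COV_F32F16,
};

struct fold_instr {
	enum toy_opc opc;
	bool     dst_half;  /* destination is a half register */
	int      dst;       /* destination register number */
	int      src0;      /* first source register number */
	unsigned use_count; /* how many instructions read dst */
};

/* Opcodes the conversion must not be folded into. */
static bool
can_fold_into(enum toy_opc opc)
{
	switch (opc) {
	case TOY_MAX_F:
	case TOY_MIN_F:
	case TOY_ABSNEG_F:
	case TOY_COV_F32F16:
		return false;
	default:
		return true;
	}
}

/* If 'cov' narrows the full-precision result of 'producer', and nothing
 * else reads that result, make 'producer' write the half destination
 * directly.  'cov' itself is left in place for a later DCE pass. */
static bool
try_fold_cov(struct fold_instr *producer, const struct fold_instr *cov)
{
	assert(producer != NULL && cov != NULL);
	if (cov->opc != TOY_COV_F32F16 || !cov->dst_half)
		return false;
	if (producer->dst_half || cov->src0 != producer->dst)
		return false;
	if (producer->use_count != 1 || !can_fold_into(producer->opc))
		return false;
	producer->dst = cov->dst;
	producer->dst_half = true;
	return true;
}
```

Applied to the first example above, the add.f would take over hr2.x as its destination, leaving the cov with no live consumer.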
* freedreno/ir3: Fold const only when the type is float (Hyunjun Ko, 2020-02-07, 1 file, -0/+11)

  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
* freedreno/ir3: put the conversion back for half const to the right place. (Hyunjun Ko, 2020-02-07, 1 file, -6/+6)

  The previous commit leads to immed values being matched unexpectedly. This makes constlen wrong for each shader, including bvert.

  Also fixes atan2 for mediump deqp tests.

  Fixes: cbd1f47433b ("freedreno/ir3: convert back to 32-bit values for half constant registers.")

  v2: Move conversion up above fabs/fneg modifier handling as well.

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
* freedreno/ir3: Add cat4 mediump opcodes (Hyunjun Ko, 2020-02-07, 2 files, -0/+18)

  v2: Reworked to assign half-opcodes in ir3_ra.c (krh).

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
* freedreno/ir3: fold const conversion into consumer (Rob Clark, 2020-02-07, 2 files, -1/+20)

  A sequence like:

      (nop3)cov.f32f16 hr0.x, c0.x
      mul.f hr4.y, hr1.z, hr0.x

  can be turned into:

      mul.f hr4.y, hr1.z, hc0.x

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
* freedreno/ir3: fix printing half constant registers. (Hyunjun Ko, 2020-02-07, 1 file, -3/+4)

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
* freedreno/ir3: Set IR3_REG_HALF flag on src as well in immediate MOV (Kristian H. Kristensen, 2020-02-07, 1 file, -1/+1)

  This lets is_same_type_reg() recognize that the dst and src of the immediate MOV are the same and unblocks fp16 constant propagation.

  Signed-off-by: Kristian H. Kristensen <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
* glsl,nir: Switch the enum representing shader image formats to PIPE_FORMAT. (Eric Anholt, 2020-02-05, 4 files, -66/+7)

  This means you can directly use format utils on it without having to have your own GL enum to number-of-components switch statement (or whatever) in your vulkan backend.

  Thanks to imirkin for fixing up the nouveau driver (and a couple of core details).

  This fixes the computed qualifiers for EXT_shader_image_load_store's non-integer sizeNxM qualifiers, which we don't have tests for.

  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Iago Toral Quiroga <[email protected]> (v3d)
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
* freedreno/ir3: fix a dirty lie (Rob Clark, 2020-02-01, 1 file, -7/+4)

  Lies, damn lies, and leftover hacks!

  We no longer hard-code these two, so fix the disasm to print the correct values.

  Signed-off-by: Rob Clark <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: simplify split from collect (Rob Clark, 2020-02-01, 1 file, -0/+10)

  In some cases we need to split components out from what was already a collect. That was making it hard to DCE unused components of the collect. (I.e. unused components of fragcoord, etc.)

  So just detect this case and skip the chained collect+split.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: create fragcoord instructions in input block (Rob Clark, 2020-02-01, 1 file, -2/+2)

  This was somehow working to create the instructions in a random block, and use the value in other blocks, by dumb luck. But two-pass-RA's better choice of register assignment causes a couple dEQPs to start failing without this fix:

      dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1
      dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_2

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: remove unused tex arg harder (Rob Clark, 2020-02-01, 3 files, -19/+12)

  Just killing the SSA link isn't enough. It confuses RA, legalize, and postsched to see a bogus unused reg.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: add RA sanity check (Rob Clark, 2020-02-01, 1 file, -0/+33)

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: two pass register allocation (Rob Clark, 2020-02-01, 2 files, -60/+297)

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: don't precolor unused inputs (Rob Clark, 2020-02-01, 1 file, -1/+2)

  This apparently can happen with gs/tess. And will cause problems with two-pass-ra, so let's just skip them.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: add is_tex_or_prefetch() (Rob Clark, 2020-02-01, 3 files, -2/+7)

  Some of the aspects of tex prefetch are in common with normal tex instructions, such as having a wrmask to control which components are written. Add a helper for this.

  This should result in actually using the prefetch wrmask to avoid fetching unneeded components.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: number instructions from one (Rob Clark, 2020-02-01, 1 file, -1/+1)

  ra_block_compute_live_ranges() treats zero as "not yet defined", so probably best to not let this be a valid instruction #.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: post-RA sched pass (Rob Clark, 2020-02-01, 5 files, -5/+678)

  After RA, we can schedule to increase parallelism (reduce nop's) without worrying about increasing register pressure. This pass lets us cut down the instruction count ~10%, and prioritize bary.f, kill, etc, which would tend to increase register pressure if we tried to do that before RA.

  It should be more useful if RA round-robin'd register choices.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: fix kill scheduling (Rob Clark, 2020-02-01, 2 files, -1/+2)

  kill (and other cat0/flow instructions) do not have a dst register. Treating them as if they did was mostly harmless before, other than RA thinking it would need a free register to write. (But nothing consumed it, so the value would be immediately dead.) But this would cause more problems with postsched, which would see a bogus dependency.

  Also, post-RA sched *does* need to see the dependency on the predicate register.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3/ra: make use()/def() functions instead of macros (Rob Clark, 2020-02-01, 1 file, -15/+24)

  Originally these were nested functions, which worked nicely, giving us the function of a local macro that was actual 'c' syntax (i.e. not a token-pasted macro). But these were converted to macros because clang doesn't let us have nice gcc extensions.

  Extract these back out into functions, before adding more things and making the macros even more cumbersome.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: a bit more optmsgs debug (Rob Clark, 2020-02-01, 1 file, -0/+10)

  Also dump where arrays are allocated. This was useful for debugging.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: move atomic fixup after RA (Rob Clark, 2020-02-01, 3 files, -28/+38)

  A post-RA sched pass will move the extra mov's to the wrong place, so rework the fixup so it can run after RA (and therefore after postsched).

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
* freedreno/ir3: move block-scheduling into legalize (Rob Clark, 2020-02-01, 4 files, -49/+45)

  We want to do this only once. If we have post-RA sched pass, then we don't want to do it pre-RA. Since legalize is where we resolve the branch/jumps, we might as well move this into legalize.

  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>