path: root/src/freedreno
Commit message | Author | Age | Files | Lines
* freedreno/ir3: also track # of nops for shader-dbRob Clark2019-11-092-0/+4
| | | | | | | | | | | | | The instruction count is (mostly) a measure of what optimization passes can do, while # of nops is more an indication of how effectively the scheduler is balancing register pressure vs instruction count. So track these independently. (There could be opportunities to rematerialize values to reduce register pressure, swapping some nops with other alu instructions, so nothing is truly independent, but it is still useful to break these stats out.) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: sync disasm changes from envytoolsRob Clark2019-11-092-24/+94
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove obsolete commentRob Clark2019-11-091-4/+0
| | | | | | | | The meta PHI instruction was removed long ago. And fanin/fanout themselves do not contribute actual instructions (at least not by the time you get to sched; they may prevent copy-propagating away a mov) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/ra: remove ir print after livein/outRob Clark2019-11-091-1/+0
| | | | | | | The IR hasn't changed at this point, so it isn't really adding any value. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/ra: move regs_count==0 checkRob Clark2019-11-091-9/+2
| | | | | | | | Fold it into writes_gpr() (since an instruction that does not reference any registers by definition does not write a register). This lets us avoid having to handle this case in a few other places. Signed-off-by: Rob Clark <[email protected]>
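The folding described above can be sketched as follows. This is a simplified illustration, not the ir3 code: `struct example_instr`, its fields, and `example_writes_gpr()` are hypothetical stand-ins for the real instruction type and `writes_gpr()`.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction record; not the real ir3 struct. */
struct example_instr {
   unsigned regs_count;   /* number of register operands, dst included */
   bool dst_is_gpr;       /* whether the destination is a general register */
};

static bool
example_writes_gpr(const struct example_instr *instr)
{
   /* No registers referenced at all, so nothing can be written. */
   if (instr->regs_count == 0)
      return false;

   return instr->dst_is_gpr;
}
```

With the check inside the predicate, callers no longer need their own `regs_count == 0` guard before asking whether an instruction writes a register.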
* freedreno/ir3: ir3_print tweaksRob Clark2019-11-092-47/+102
| | | | | | Handle HALF/HIGH flags in all cases, and colorize SSA src notation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: use SSA flag on dest register tooRob Clark2019-11-094-45/+48
| | | | | | | | | | | | We did this in some places before, but not consistently. But it will be useful for two-pass RA, to identify which registers have already been assigned. While we are cleaning this up, use __ssa_src() and new __ssa_dst() helper more consistently. (If nothing else, this reduces the # of callers of ir3_reg_create() to audit that we didn't miss something) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split pre-coloring to its own functionRob Clark2019-11-091-3/+12
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Use regid() helper when setting up precolor regsKristian H. Kristensen2019-11-071-4/+4
| | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Program state for tessellation stagesKristian H. Kristensen2019-11-071-0/+5
| | | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Allocate const space for tessellation parametersKristian H. Kristensen2019-11-071-0/+7
| | | | | | | | | | The tessellation stages need size and stride of the patch layout as well as locations of attributes in the patch. The tessellation stages also use two system memory BOs and need the iovas of those. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Pre-color TCS header and primitive ID inputsKristian H. Kristensen2019-11-071-2/+12
| | | | | | | | | | Similar to GS, the registers are shared and not reinitialized between VS and TCS, so we need to make sure to allocate the same registers for the system values between stages. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Don't assume binning shader is always VSKristian H. Kristensen2019-11-071-2/+2
| | | | | | | | In tessellation mode, the TES is (probably) the binning shader. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Setup inputs and outputs for tessellation stagesKristian H. Kristensen2019-11-071-7/+52
| | | | | | | | | | Similar to GS, some inputs are reused when we chsh from VS to TCS or TES to GS, so we need to make sure we set up the right inputs and make the shared system values outputs so they don't get clobbered. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Implement TCS synchronization intrinsicsKristian H. Kristensen2019-11-071-0/+33
| | | | | | | | | We add two new IR3 specific nir intrinsics that map to the new condend and endpatch instructions. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Implement tess coord intrinsicKristian H. Kristensen2019-11-071-0/+12
| | | | | | | | | | Our lowering pass made the z component unused by replacing its uses by 1 - x - y. The intrinsic implementation then just needs to return the x and y components. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
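The identity the lowering relies on can be written out in plain C. This is only an illustration of the arithmetic, assuming a domain where the three coordinates are barycentric and sum to 1; `tess_coord_z()` is a hypothetical name, not a Mesa function.

```c
/* Sketch only: when the three tess coordinates sum to 1, z carries no
 * extra information and can be recomputed from x and y.
 * tess_coord_z() is a hypothetical helper, not part of ir3. */
static inline float
tess_coord_z(float x, float y)
{
   return 1.0f - x - y;
}
```

For example, `tess_coord_z(0.25f, 0.25f)` yields `0.5f`, which is why the intrinsic only has to supply x and y.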
* freedreno/ir3: End TES with chsh when using GSKristian H. Kristensen2019-11-071-1/+3
| | | | | | | | | When we have both TES and GS, the TES needs to chain to the GS with chmask and chsh, just like the VS does to either TCS or GS. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Add new synchronization opcodesKristian H. Kristensen2019-11-075-1/+15
| | | | | | | | | | | | | There are two new opcodes in use in tessellation control shaders: category 0, opcodes 13 and 15. unk13 is a kill type of instruction that terminates threads where !p0.x and is used to narrow down a patch wavefront to just thread 0. Then, once thread 0 has written the tess levels, it issues unk15, which might signal the TE that another patch has been fully written. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Extend geometry lowering pass to handle tessellationKristian H. Kristensen2019-11-073-8/+520
| | | | | | | | | | | | | | | | VS and TCS pass varyings the same way as VS and GS do. TCS then writes the entire patch to a system memory BO and TES eventually reads back from the BO once the TE starts generating vertices. TES outputs vertices the same way as VS and GS, except when there's a GS as well, in which case TES passes varyings to GS the same way the VS would. In addition, the TCS needs a little bit of control flow massaging so that it only runs for valid invocations, and needs a couple of unknown instructions to synchronize with the TE. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Add tessellation field to shader keyKristian H. Kristensen2019-11-072-1/+34
| | | | | | | | | | Whether we're tessellating and which primitives the TES outputs affects the entire pipeline so let's add a field to the key to track that. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Use imul24 in offset calculationsKristian H. Kristensen2019-11-071-2/+2
| | | | | | | | | With the imul24 opcode in place, we can now use it for computing local offsets (ie for ldlw/stlw). Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
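As a reference for what a 24-bit multiply computes, here is a minimal C model. It assumes imul24 has the usual signed semantics (sign-extend the low 24 bits of each operand, multiply, keep the low 32 bits); the commit itself does not spell these out, and `imul24_ref()` is a hypothetical name.

```c
#include <stdint.h>

/* Sketch of a signed 24-bit multiply. Relies on arithmetic right shift
 * of negative values and two's-complement conversion, which are
 * implementation-defined in C but hold on common compilers. */
static inline int32_t
imul24_ref(int32_t a, int32_t b)
{
   /* Drop the top 8 bits, then sign-extend from bit 23. */
   int32_t a24 = (int32_t)((uint32_t)a << 8) >> 8;
   int32_t b24 = (int32_t)((uint32_t)b << 8) >> 8;

   /* Multiply as unsigned to avoid signed-overflow UB; keep low 32 bits. */
   return (int32_t)((uint32_t)a24 * (uint32_t)b24);
}
```

Local offsets are small, so the ignored upper 8 bits do not matter for this use, which is presumably what makes the cheaper opcode applicable.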
* freedreno/ir3: Add ir3 intrinsics for tessellationKristian H. Kristensen2019-11-074-0/+26
| | | | | | | | | These provide the iovas for system memory buffers used for tessellation as well as a new HW specific system value. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Add load and store intrinsics for global ioKristian H. Kristensen2019-11-071-0/+49
| | | | | | | | | These intrinsics take an ivec2 for the 64 bit base address and an integer offset. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
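The addressing these intrinsics describe can be modelled in a few lines of C. This is a sketch under assumptions the commit does not state: that the first ivec2 component holds the low 32 bits and the second the high 32 bits, and that the offset is added in the same units as the address. `global_addr()` is a hypothetical helper.

```c
#include <stdint.h>

/* Sketch only: combine a 64-bit base address passed as two 32-bit
 * halves with an integer offset. The lo/hi ordering and offset units
 * are assumptions made for this example. */
static inline uint64_t
global_addr(uint32_t base_lo, uint32_t base_hi, uint32_t offset)
{
   uint64_t base = ((uint64_t)base_hi << 32) | base_lo;
   return base + offset;
}
```

Doing the addition in 64 bits matters: a carry out of the low half has to propagate into the high half.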
* freedreno/a6xx: Add register offset for STG/LDGKristian H. Kristensen2019-11-075-9/+64
| | | | | | | | | | These instructions take a 64 bit iova as two consecutive registers and an immediate offset. This patch adds support for the offset to be a single register, which is added to the 64 bit iova. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6x: Rename z/s formatsKristian H. Kristensen2019-11-073-10/+10
| | | | | | | | | | What we call eRB6_Z24_UNORM_S8_UINT now is actually RB6_Z24_UNORM_S8_UINT_AS_R8G8B8A8 and RB6_X8Z24_UNORM is actually RB6_Z24_UNORM_S8_UINT. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Fix layered texture type enumKristian H. Kristensen2019-11-071-3/+4
| | | | | | | | 2D array textures and 3D textures are different enum values after all. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Clear sysmem with CP_BLITKristian H. Kristensen2019-11-071-0/+4
| | | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/registers: Add comments about primitive countersKristian H. Kristensen2019-11-071-12/+10
| | | | | | | | Adding comments about best guess at what the counters count. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/registers: Move SP_PRIMITIVE_CNTL and SP_VS_VPC_DSTKristian H. Kristensen2019-11-071-28/+28
| | | | | | | | Move these two to be in order with the other VS regs. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/registers: Fix typoKristian H. Kristensen2019-11-071-1/+1
| | | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* meson: move the generic symbols check arguments to a common variableEric Engestrom2019-11-051-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <dylan@pnwbakers>
* meson: add variable to control the symbols checksEric Engestrom2019-11-051-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <dylan@pnwbakers>
* util: rename PIPE_ARCH_*_ENDIAN to UTIL_ARCH_*_ENDIANDylan Baker2019-11-051-1/+1
| | | | | | | | | | | As requested by Tim. This was generated with: grep 'PIPE_ARCH_.*_ENDIAN' -rIl | xargs sed -ie 's@PIPE_ARCH_\(.*\)_ENDIAN@UTIL_ARCH_\1_ENDIAN@'g v2: - add this patch Reviewed-by: Eric Engestrom <[email protected]>
* util/u_endian: set PIPE_ARCH_*_ENDIAN to 1Dylan Baker2019-11-051-1/+1
| | | | | | | | | | | | This will allow it to be used as a drop in replacement for _mesa_little_endian in a number of cases. v2: - Always define PIPE_ARCH_LITTLE_ENDIAN and PIPE_ARCH_BIG_ENDIAN, define the one that reflects the host system to 1 and the other to 0 - replace all uses of #ifdef, #ifndef, and #if defined() with #if and #if ! with PIPE_ARCH_*_ENDIAN Reviewed-by: Eric Engestrom <[email protected]>
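The consequence of always defining both macros can be shown with a small sketch. The `EXAMPLE_*` names are hypothetical stand-ins hard-coded for a little-endian host; they are not the Mesa macros.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: both macros are always defined, and the one
 * matching the host is 1 while the other is 0. */
#define EXAMPLE_ARCH_LITTLE_ENDIAN 1
#define EXAMPLE_ARCH_BIG_ENDIAN 0

/* #ifdef EXAMPLE_ARCH_BIG_ENDIAN would now always be true, so the
 * value has to be tested with #if instead. */
#if EXAMPLE_ARCH_BIG_ENDIAN
#define EXAMPLE_BYTE_ORDER_NAME "big"
#else
#define EXAMPLE_BYTE_ORDER_NAME "little"
#endif

/* Because the value is 0 or 1, it also works in ordinary expressions,
 * which is what allows it to replace a runtime helper. */
static inline bool
example_is_little_endian(void)
{
   return EXAMPLE_ARCH_LITTLE_ENDIAN;
}
```

This is why the v2 note replaces every `#ifdef`/`#ifndef`/`#if defined()` use: a leftover `#ifdef` would silently select the wrong branch rather than fail to compile.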
* turnip: Remove _mesa_locale_init/fini calls.Bas Nieuwenhuizen2019-10-311-3/+0
| | | | | | | | | The resulting locale is not used for Vulkan, and it is not reference counted, giving issues when multiple instances are created. CC: 19.2 19.3 <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* freedreno/a2xx: add missing vertex formats (SSCALE/USCALE/FIXED)Jonathan Marek2019-10-301-1/+1
| | | | | | | | Mostly for vertex formats, but they are supported as texture formats too (untested however). Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* util: remove LIST_IS_EMPTY macroTimothy Arceri2019-10-281-2/+2
| | | | | | | Just use the inlined function directly. The new function was introduced in addcf410. Reviewed-by: Eric Engestrom <[email protected]>
* util: rename list_empty() to list_is_empty()Timothy Arceri2019-10-284-6/+6
| | | | | | | This makes it clear that it's a boolean test and not an action (eg. "empty the list"). Reviewed-by: Eric Engestrom <[email protected]>
* tu: fix empty-body instructionEric Engestrom2019-10-271-1/+1
| | | | | | | Fixes: 8d43e2b2ded0fe3c82d4 ("meson: add -Werror=empty-body to disallow `if(x);`") Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
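For context, the pattern that `-Werror=empty-body` rejects looks like the commented-out snippet below. The function is a hypothetical example, not the turnip code touched by this commit.

```c
/* Bug pattern rejected by -Werror=empty-body: the stray semicolon is
 * the whole body of the if, so the assignment always runs.
 *
 *    if (count > limit);
 *       count = limit;
 *
 * Corrected form (clamp_count() is a hypothetical example): */
static inline int
clamp_count(int count, int limit)
{
   if (count > limit)
      count = limit;
   return count;
}
```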
* freedreno/ir3: handle the progress caseRob Clark2019-10-241-26/+35
| | | | | | | | | | In some cases, in particular when you have things that can be src modifiers ((abs)/(neg)), once eliminating one mov, there is a possibility to remove another. Handle this by re-visiting an instruction after eliminating a copy on one of its srcs. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
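The "revisit on progress" idea can be illustrated with a toy model. This is not the ir3 copy-propagation pass: `struct example_cp_instr` and `example_propagate_src()` are hypothetical, and the model ignores src modifiers and every legality check the real pass performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction: either a mov of another instruction's result, or
 * some other producer (is_mov == false). Hypothetical, not ir3. */
struct example_cp_instr {
   bool is_mov;
   struct example_cp_instr *src;
};

/* Look through copies feeding a source. Each time one mov is
 * eliminated, the source is examined again, since the new producer
 * may itself be a mov that can now be removed. */
static struct example_cp_instr *
example_propagate_src(struct example_cp_instr *src)
{
   bool progress;

   do {
      progress = false;
      if (src->is_mov && src->src != NULL) {
         src = src->src;
         progress = true;
      }
   } while (progress);

   return src;
}
```

A single pass would stop after the first mov in a chain; looping until no progress is made is what picks up the second one.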
* freedreno/ir3: remove restrictions on const + (abs)/(neg)Rob Clark2019-10-242-14/+6
| | | | | | | | | These date back to relatively early days of ir3, when a lot was still not well understood. But according to CI (and what I've seen blob driver do), these are not actually real restrictions. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: allow copy-propagate out of fanoutRob Clark2019-10-241-7/+27
| | | | | | | | | Now that we fixed the sharp edges that this was papering over, we can relax the restriction about eliminating a mov coming out of a fanout (for example from result of texture fetch). Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: treat high vs low reg as conversionRob Clark2019-10-241-1/+7
| | | | | | | | This avoids copy-propagating a high register into an instruction which cannot consume it. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: propagate dest flags for collect/faninRob Clark2019-10-241-3/+9
| | | | | | | | | | | We did this properly already for split/fanout. But collect was missed. Extract out a helper to share. This way we avoid copy propagating a mov from high or half reg into an instruction which cannot consume a high/half reg. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: make high regs easier to see in IR dumpsRob Clark2019-10-241-0/+2
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: debug cleanupRob Clark2019-10-244-42/+29
| | | | | | | | | 1) deduplicate IR3_SHADER_DEBUG=disasm versus fs/vs/etc handling 2) standardize shader stage name prints, in particular VERT vs BVERT 3) don't mix stderr and stdout Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: fixup register footprint fixupRob Clark2019-10-221-1/+1
| | | | | | | | Small typo resulted in not converting footprint to vec4, meaning that we could potentially ask for quite a few more registers than required Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
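The kind of conversion involved can be sketched as below, assuming the footprint is first counted in scalar components and then rounded up to whole vec4 registers. That rounding rule and the name `scalar_to_vec4_count()` are assumptions for illustration; the commit does not show the actual expression.

```c
/* Sketch only: convert a count of scalar components into the number
 * of vec4 registers needed to hold them, rounding up. */
static inline unsigned
scalar_to_vec4_count(unsigned scalar_regs)
{
   return (scalar_regs + 3) / 4;
}
```

Skipping this step would report roughly four times as many registers as needed, which matches the over-allocation the message describes.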
* freedreno/ir3: handle scalarized varying inputsRob Clark2019-10-221-9/+12
| If the load_interpolated_input is scalarized, we would be too
| conservative about deciding the tex instruction wasn't a candidate
| to pre-fetch:
|
|    vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
|    vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (0) /* interp_mode=0 */
|    vec1 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* packed:v_uv,v_uv1 */
|    vec1 32 ssa_3 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 1) /* base=0 */ /* component=1 */ /* packed:v_uv,v_uv1 */
|    vec2 32 ssa_8 = vec2 ssa_2, ssa_3
|    vec4 32 ssa_9 = tex ssa_8 (coord), 0 (texture), 0 (sampler)
|
| Really we don't care that the texcoord components come from different
| load_interpolated_input instructions, just that they have consecutive
| varying offsets.
|
| Reported-by: Eduardo Lima Mitev <[email protected]>
| Reviewed-by: Eduardo Lima Mitev <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
| Reviewed-by: Kristian H. Kristensen <[email protected]>
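The relaxed condition, consecutive varying offsets regardless of which load produced each component, might be checked along these lines. This is a sketch, not the ir3 pass: `struct example_coord_src`, the `base * 4 + component` flattening, and `coords_are_consecutive()` are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical description of where one texcoord component comes from,
 * mirroring the base/component indices in the NIR dump. */
struct example_coord_src {
   unsigned base;
   unsigned component;
};

/* Returns true if every component's flat varying offset directly
 * follows the previous one, whichever load instruction produced it. */
static bool
coords_are_consecutive(const struct example_coord_src *src, unsigned n)
{
   for (unsigned i = 1; i < n; i++) {
      unsigned prev = src[i - 1].base * 4 + src[i - 1].component;
      unsigned cur = src[i].base * 4 + src[i].component;

      if (cur != prev + 1)
         return false;
   }
   return true;
}
```

For the dump above, the two loads have (base, component) of (0, 0) and (0, 1), so the check passes even though they are separate instructions.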
* freedreno/ir3: Add missing ir3_nir_lower_tex_prefetch.c to Android.mkMarijn Suijten2019-10-211-0/+1
| | | | | | | | | This file was created in 2a0d45ae6cf09d60c048d7854e3d082bf15e374f but its addition to the Android makefiles was omitted. This breaks the build with missing references to symbols defined in this file. List the file in ir3_SOURCES to make the build succeed. Signed-off-by: Marijn Suijten <[email protected]>
* nir/lower_idiv: add new llvm-based pathRhys Perry2019-10-211-1/+1
| | | | | | | | | | | | | | | | | v2: make variable names snake_case v2: minor cleanups in emit_udiv() v2: fix Panfrost build failure v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature v4: remove nir_op_urcp v5: drop nv50 path v5: rebase v6: add back nv50 path v6: add comment for nir_lower_idiv_path enum v7: rename _nv50/_llvm to _fast/_precise v8: fix etnaviv build failure Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>