path: root/src/freedreno
Commit message | Author | Age | Files | Lines
* freedreno/ir3: handle imad24_ir3 case in UBO lowering | Rob Clark | 2019-10-18 | 1 | -2/+27
  Similar to iadd, we can fold an added constant value from an imad24_ir3 into the load_uniform's constant offset. This avoids some cases where the addition of imad24_ir3 could otherwise be a regression in instr count.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
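The folding described in this commit message can be illustrated with a small, self-contained sketch. The types and the function name below are hypothetical stand-ins invented for illustration; they are not the real NIR or ir3 data structures, and the actual pass may differ in detail. The sketch assumes the offset has the form a * b + c, and that a constant c can move into the constant offset carried by the load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an offset expression: offset = a * b + c. */
struct offset_expr {
    bool    is_imad24;   /* expression is a 24-bit multiply-add */
    bool    c_is_const;  /* the added term c is a compile-time constant */
    int32_t c;           /* value of c when it is constant */
};

/* Hypothetical model of a uniform load with a constant base offset. */
struct uniform_load {
    struct offset_expr off;  /* dynamic part of the offset */
    int32_t            base; /* constant part encoded in the load itself */
};

/*
 * If the dynamic offset is an imad24 whose added term is constant, move
 * that constant into the load's base offset. Afterwards the dynamic part
 * is a * b + 0, so the addition no longer costs anything.
 * Returns true when a fold happened.
 */
static bool fold_imad24_const(struct uniform_load *ld)
{
    assert(ld != NULL);

    if (!ld->off.is_imad24 || !ld->off.c_is_const)
        return false;

    ld->base += ld->off.c;
    ld->off.c = 0;
    return true;
}
```

With a load whose base is 4 and whose offset is a * b + 16, the fold leaves the base at 20 and the added term at 0. A non-constant added term is left untouched.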
* freedreno/ir3: add imul24 opcode | Rob Clark | 2019-10-18 | 1 | -0/+3
  This maps to mul.s24
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
* freedreno/ir3: optimize immed 2nd src to mad | Rob Clark | 2019-10-18 | 1 | -2/+11
  We can't encode immed sources for cat3 (mad) instructions, but we can use const in first or third src. We handled this case already, but we weren't considering that we could lower immed to const.
  For manhattan:
    total instructions in shared programs: 35202 -> 34718 (-1.37%)
    instructions in affected programs: 14931 -> 14447 (-3.24%)
    helped: 90
    HURT: 0
    total full in shared programs: 2451 -> 2359 (-3.75%)
    full in affected programs: 653 -> 561 (-14.09%)
    helped: 69
    HURT: 2
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: add rule to generate imad24 | Rob Clark | 2019-10-18 | 1 | -0/+5
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
* nir: add nir_lower_amul pass | Rob Clark | 2019-10-18 | 2 | -0/+4
  Lower amul to either imul or imul24, depending on whether 24b is enough bits to calculate an offset within the thing being dereferenced.
  Signed-off-by: Rob Clark <[email protected]>
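The choice this pass makes can be sketched as a simple range check. Everything named below is hypothetical and written only for illustration. In particular, the threshold is an assumption: it treats imul24 operands as signed 24-bit values, so it conservatively requires the largest offset to fit in 23 magnitude bits. The real pass may derive its bound differently.

```c
#include <assert.h>
#include <stdint.h>

/* Which multiply an address multiply (amul) is lowered to. */
enum mul_op {
    MUL_IMUL,    /* full 32-bit multiply */
    MUL_IMUL24,  /* cheaper 24-bit multiply */
};

/* Largest value a signed 24-bit operand can hold (assumed bound). */
#define MAX_S24 INT64_C(0x7fffff)

/*
 * Pick the multiply for an offset calculation, given the largest offset
 * (hypothetical input) that can occur within the object being
 * dereferenced. If every offset fits in 24 bits, the 24-bit multiply
 * yields the same result as the 32-bit one.
 */
static enum mul_op choose_mul_for_amul(int64_t max_offset)
{
    assert(max_offset >= 0);

    return (max_offset <= MAX_S24) ? MUL_IMUL24 : MUL_IMUL;
}
```

Under this assumption a 4 KiB buffer selects MUL_IMUL24, while any object whose offsets can reach 0x800000 or more falls back to MUL_IMUL.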
* freedreno/ir3: Handle newly added opcode nir_op_imad24_ir3 | Eduardo Lima Mitev | 2019-10-18 | 1 | -0/+3
  Simply emit an ir3_MAD_S24 instruction in the backend.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
* freedreno/ir3: rename mul.s/mul.u | Rob Clark | 2019-10-18 | 5 | -12/+12
  to mul.s24/mul.u24, to better reflect that these are 24b multiply.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
* freedreno/ir3: enable pre-fs texture fetch for a6xx | Rob Clark | 2019-10-18 | 1 | -0/+6
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: add support for pre-fs texture fetch | Rob Clark | 2019-10-18 | 1 | -3/+21
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add support for texture sampling pre-dispatch | Hyunjun Ko | 2019-10-18 | 1 | -2/+73
  Signed-off-by: Eduardo Lima Mitev <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch | Eduardo Lima Mitev | 2019-10-18 | 3 | -0/+199
  The pass should run once at the end of shader compilation, for a4xx onwards. It iterates over texture sampling instructions and marks those eligible for pre-dispatch by changing the tex op from 'tex' to 'tex_prefetch'.
  An instruction is eligible if:
  * The coordinate is a vector where all its components come from a shader input.
  * The order of the components matches exactly that of the input (no swizzles).
  * The instruction is in the 'main' function, and in the outermost block.
  The first two restrictions were arrived at empirically, so more testing could tighten or loosen them. The 3rd restriction is there to allow moving the instructions eligible for pre-dispatch to the beginning of the shader, so that we don't block the registers holding the result for too long.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
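The three eligibility rules listed in this commit message can be written as a predicate. This is a minimal sketch over hypothetical, simplified types, not the real NIR instruction structures. As a simplification, it does not check whether all components come from the same input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COORD_COMPS 4

/* Hypothetical description of one component of the texture coordinate. */
struct coord_comp {
    bool     from_shader_input; /* value is read straight from a shader input */
    unsigned input_comp;        /* which component of that input is read */
};

/* Hypothetical, simplified texture sampling instruction. */
struct tex_instr {
    unsigned          num_coord_comps;
    struct coord_comp coord[MAX_COORD_COMPS];
    bool              in_main;            /* lives in the 'main' function */
    bool              in_outermost_block; /* not nested in control flow */
};

/* Returns true if the instruction could be changed to 'tex_prefetch'. */
static bool tex_prefetch_eligible(const struct tex_instr *tex)
{
    assert(tex != NULL && tex->num_coord_comps <= MAX_COORD_COMPS);

    /* Rule 3: only instructions in main's outermost block can be hoisted. */
    if (!tex->in_main || !tex->in_outermost_block)
        return false;

    for (unsigned i = 0; i < tex->num_coord_comps; i++) {
        const struct coord_comp *c = &tex->coord[i];

        /* Rule 1: every coordinate component comes from a shader input. */
        if (!c->from_shader_input)
            return false;

        /* Rule 2: component i reads input component i (no swizzle). */
        if (c->input_comp != i)
            return false;
    }

    return true;
}
```

A two-component coordinate that reads input components 0 and 1 in order passes. The same coordinate read as (1, 0), or the same instruction placed inside a nested block, does not.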
* freedreno/ir3: force i/j pixel to r0.x | Rob Clark | 2019-10-18 | 1 | -0/+22
  It seems that pre-fs texture fetch only works if ij_pix ends up in r0.x. I've tried unknown zero bits, to no avail, and blob also seems to force r0.x when this feature is used.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: add pre-dispatch tex fetch to disasm | Rob Clark | 2019-10-18 | 1 | -0/+10
  Useful to see in disassembly listing texture fetches that were moved to pre-dispatch.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: add dummy bary.f(ei) for pre-fs-fetch | Rob Clark | 2019-10-18 | 1 | -0/+19
  If the only use of varyings is a pre-shader texture-fetch, we still need to issue a bary.f with the end-input flag, otherwise we'll block further VS invocations, as the hw will think varying storage is still busy.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: fixup register footprint to account for prefetch | Rob Clark | 2019-10-18 | 1 | -0/+14
  It is possible that the result of a pre-fs texture fetch is an output (or partially an output) of the FS. Since the meta:tex_prefetch instructions are dropped before the assembler, we need to account for this when we fixup the register footprint.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: add meta instruction for pre-fs texture fetch | Rob Clark | 2019-10-18 | 6 | -3/+33
  Add a placeholder instruction to track texture fetches made prior to FS shader dispatch. These, like meta:input instructions, are scheduled before any real instructions, so that RA realizes their result values are live before the first real instruction. They also give legalize a way to track usage of fetched samples requiring (sy) sync flags.
  There is some related special handling for varying texcoord inputs used for pre-fs-fetch, so that they are not DCE'd and remain in linkage between FS and previous stage. Note that we could almost avoid this special handling by giving meta:tex_prefetch real src arguments, except that in the FS stage, inputs are actual bary.f/ldlv instructions.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: don't DCE ij_pix if used for pre-fs-texture-fetch | Rob Clark | 2019-10-18 | 3 | -6/+14
  When we enable pre-dispatch texture fetch, we could have a scenario where the barycentric i/j coord sysval is not used in the shader, but only used for the varying fetch for the pre-dispatch texture fetch. In this case we need to take care not to DCE this sysval.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: track sysval slot for inputs | Rob Clark | 2019-10-18 | 3 | -0/+12
  Will be needed for special handling of SYSTEM_VALUE_BARYCENTRIC_PIXEL (ij_pix) when pre-fs texture fetch is enabled.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: remove unused ir3_instruction::inout | Rob Clark | 2019-10-18 | 3 | -5/+0
  Not sure I remember how long this has been unused for. But it's unused now.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add data structures to support texture pre-fetch | Hyunjun Ko | 2019-10-18 | 1 | -0/+37
  Signed-off-by: Eduardo Lima Mitev <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: update registers | Rob Clark | 2019-10-18 | 2 | -2/+23
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Implement PIPE_QUERY_PRIMITIVES_GENERATED for GS | Kristian H. Kristensen | 2019-10-17 | 2 | -0/+34
  When we don't have streamout enabled, we have to read this register to get the number of primitives emitted.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: End VS with CHMASK and CHSH in GS pipelines | Kristian H. Kristensen | 2019-10-17 | 1 | -1/+18
  When used in a GS pipeline, the VS doesn't end with the END instruction. Instead it chains to the GS, which continues running with the same register allocation.
  The intended use case seems to be that you can compile a regular VS (i.e. outputs in registers and ending with END) but then tack on link-time generated code past the END to write the outputs using STLW, in case the VS is used with GS.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Start GS with (ss) and (sy) | Kristian H. Kristensen | 2019-10-17 | 1 | -0/+13
  We don't know what kind of loads we might have to wait on when coming in from chsh in the VS so set both sync flags.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Pre-color GS header and primitive ID | Kristian H. Kristensen | 2019-10-17 | 1 | -0/+9
  These sysvals have to be unclobbered by VS and in the same registers in both VS and GS, since the chsh from VS to GS doesn't reload the values. We use the pre-color argument to ir3_ra() to always place these values in r0.x and r0.y.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Setup ir3 inputs and outputs for GS | Kristian H. Kristensen | 2019-10-17 | 1 | -3/+64
  Inputs are the GS header, which contains vertex ID, local primitive ID and thread ID as well as primitive ID. The setup is a little different from other sysvals, since we always have to receive them in the VS so that it can pass them on into the GS.
  The vertex flag output from GS is set up as a proper nir output in the lowering pass and doesn't need special handling here.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Implement primitive layout intrinsics | Kristian H. Kristensen | 2019-10-17 | 3 | -0/+31
  This implements the load_vs_primitive_stride_ir3, load_vs_vertex_stride_ir3 and load_primitive_location_ir3 intrinsics, used for getting the primitive layout strides and locations.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Implement lowering passes for VS and GS | Kristian H. Kristensen | 2019-10-17 | 8 | -2/+496
  This introduces two new lowering passes. One to lower VS to explicit outputs using STLW and one to lower GS to load input using LDLW and implement the GS specific functionality.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add has_gs flag to shader key | Kristian H. Kristensen | 2019-10-17 | 1 | -0/+4
  Since the presence of GS changes how the VS operates we need to track that in the shader key.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add intrinsics that map to LDLW/STLW | Kristian H. Kristensen | 2019-10-17 | 1 | -0/+75
  These intrinsics will let us do all the offset calculations in nir, which is nicer to work with and lets nir_opt_algebraic eat it all up.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add new LDLW/STLW instructions | Kristian H. Kristensen | 2019-10-17 | 4 | -3/+8
  These access memory used for passing data between geometry stages.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Extend RA with mechanism for pre-coloring registers | Kristian H. Kristensen | 2019-10-17 | 3 | -50/+60
  We'll need to pre-color certain input registers between VS and GS shaders.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Use third register for offset for LDL and LDLV | Kristian H. Kristensen | 2019-10-17 | 4 | -12/+18
  Before, offset held the offset, which can be either immediate or a register. Use a third register to hold the offset so that we can use a register.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add support for CHSH and CHMASK instructions | Kristian H. Kristensen | 2019-10-17 | 2 | -1/+3
  Just add the constructors for now and special case similar to END so we don't remove them.
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/registers: Update with GS, HS and DS registers | Kristian H. Kristensen | 2019-10-17 | 4 | -9/+105
  Signed-off-by: Kristian H. Kristensen <[email protected]>
* nir: support feeding state to nir_lower_clip_[vg]s | Erik Faye-Lund | 2019-10-17 | 1 | -1/+1
  Reviewed-by: Marek Olšák <[email protected]>
* nir: support lowering clipdist to arrays | Erik Faye-Lund | 2019-10-17 | 1 | -2/+2
  This allows us to make sure clipdist is emitted as a scalar array rather than two vec4s. This matches SPIR-V semantics, and will be useful for Zink.
  Reviewed-by: Marek Olšák <[email protected]>
* turnip: more descriptor sets | Jonathan Marek | 2019-10-15 | 6 | -50/+250
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: push constants | Jonathan Marek | 2019-10-15 | 3 | -11/+50
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: depth/stencil | Jonathan Marek | 2019-10-15 | 3 | -20/+95
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: basic msaa working | Jonathan Marek | 2019-10-15 | 4 | -20/+67
  Not perfect but gets through some tests.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: improve CmdCopyImage and implement CmdBlitImage | Jonathan Marek | 2019-10-15 | 8 | -590/+526
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: use nir_assign_io_var_locations instead of nir_assign_var_locations | Jonathan Marek | 2019-10-15 | 1 | -6/+2
  Variables with same location should use the same driver_location.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: add missing nir passes | Jonathan Marek | 2019-10-15 | 1 | -5/+50
  Avoids assert fails in ir3.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: add code to lower indirect samplers | Jonathan Marek | 2019-10-15 | 1 | -14/+63
  Taken from nir_lower_samplers. Sampler arrays don't work though; this is just to avoid an assert fail in ir3.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: fixup consts | Jonathan Marek | 2019-10-15 | 2 | -5/+6
  Fix some mistakes in previous series.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: update some shader state bits from GL driver | Jonathan Marek | 2019-10-15 | 1 | -68/+80
  Notably includes centroid varying bits that were missing.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: Emit clears of gmem using linear. | Eric Anholt | 2019-10-15 | 1 | -1/+1
  This is what we do in freedreno.
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: Set up the correct tiling mode for small attachments. | Eric Anholt | 2019-10-15 | 3 | -3/+17
  Noticed while debugging a tiling-looking issue by comparing our gmem blit setup to freedreno's.
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* turnip: Tell spirv_to_nir that we want fragcoord as a sysval. | Eric Anholt | 2019-10-15 | 1 | -0/+1
  Fixes ir3 compiler failure in dEQP-VK.renderpass.dedicated_allocation.formats.r8g8b8a8_unorm.clear.clear_draw (now just a rendering failure where the subpass clear isn't happening)
  Reviewed-by: Kristian H. Kristensen <[email protected]>