aboutsummaryrefslogtreecommitdiffstats
path: root/src/broadcom/compiler
Commit message (Collapse)AuthorAgeFilesLines
* broadcom/vc5: Add support for centroid varyings.Eric Anholt2018-04-263-0/+44
| | | | | | | | | It would be nice to share the flags packet emit logic with flat shade flags, but I couldn't come up with a good way while still using our pack macros. We need to refactor this to shader record setup at compile time, anyway. Fixes ext_framebuffer_multisample-interpolation * centroid-*
* broadcom/vc5: Add validation that we don't violate GFXH-1633 requirements.Eric Anholt2018-04-261-0/+13
| | | | We don't use ldunifa yet, but we will eventually for UBOs.
* broadcom/vc5: Add validation that we don't violate GFXH-1625 requirements.Eric Anholt2018-04-261-0/+5
| | | | We don't use TMUWT yet, but we will once we do SSBOs.
* broadcom/vc5: Add QPU validation for register writes after thrend.Eric Anholt2018-04-261-3/+31
| | | | | | | The next shader gets to start writing the register file during these slots, so make sure we don't stomp over them. The only case of hitting this that I could imagine would be dead writes.
* broadcom/vc5: Remove leftover vc4 MSAA lowering setup in the FS key.Eric Anholt2018-04-251-12/+5
|
* util: Move util_is_power_of_two to bitscan.h and rename to ↵Ian Romanick2018-03-291-2/+2
| | | | | | | | | | | util_is_power_of_two_or_zero The new name make the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* broadcom/vc5: Start using nir_opt_move_load_ubo().Eric Anholt2018-03-281-0/+2
| | | | | | In the absence of a general NIR or VIR-level scheduler, this at least avoids spilling in GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts
* broadcom/vc5: Fix extraneous register index in QIR dumping of TLBU writes.Eric Anholt2018-03-261-0/+1
| | | | Just like TLB without a config uniform, we don't have a register index.
* broadcom/vc5: Account for InstanceID/VertexID in VPM segment size.Eric Anholt2018-03-221-4/+9
| | | | | Fixes failure in GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size
* broadcom/vc5: Set up a vertex position if the shader doesn't.Eric Anholt2018-03-221-0/+22
| | | | | | Our backend needs some sort of vertex position value to emit the scaled viewport values and such. Fixes potential segfaults in KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx
* broadcom/vc5: Fix up the NIR types of FS outputs generated by NIR-to-TGSI.Eric Anholt2018-03-212-0/+38
| | | | | | | | | | Unfortunately TGSI doesn't record the type of the FS output like GLSL does, but VC5's TLB writes depend on the output's base type. Just record the type in the key at variant compile time when we've got a TGSI input and then fix it up. Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a GPU hang that breaks most tests that come after it.
* broadcom/vc5: Don't annotate dumps with stale live intervals.Eric Anholt2018-03-194-2/+8
| | | | | As you're debugging register allocation, you may have changed the intervals and not recomputed yet. Just skip the dump in that case.
* broadcom/vc5: Add support for register spilling.Eric Anholt2018-03-194-11/+276
| | | | | | | | | | | | | | | Our register spilling support is nice to have since vc4 couldn't at all, but we're still very restricted due to needing to not spill during a TMU operation, or during the last segment of the program (which would be nice to spill a value of, when there's a long-lived value being passed through with little modification from the start to the end). We could do better by emitting unspills for the last-segment values just before the last thrsw, since the last segment is probably not the maximum interference area. Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3 others.
* broadcom/vc5: Remove redundant last_inst lookup.Eric Anholt2018-03-191-1/+0
| | | | The point was to get the MOV, which the MOV_dest already returned.
* broadcom/vc5: On QPU pack error, dump the instruction and return cleanly.Eric Anholt2018-03-191-1/+7
| | | | This is nice for debugging when you've made a bad instruction.
* broadcom/vc5: Add cursors to the compiler infrastructure, like NIR's.Eric Anholt2018-03-193-8/+73
| | | | | This will let me do lowering late in compilation using the same instruction builder as we use in nir_to_vir.
* broadcom/vc5: Move the umul macro to a header.Eric Anholt2018-03-192-8/+8
| | | | Anywhere we want to multiply, we probably want this.
* broadcom/vc5: Correct the arg count of TIDX/EIDX.Eric Anholt2018-03-191-2/+2
|
* broadcom/vc5: Re-do live variables after removing thrsws.Eric Anholt2018-03-192-3/+14
| | | | Otherwise our start/ends ips won't line up with the actual instructions.
* broadcom/vc5: Extract v3d_qpu_writes_tmu() helper.Eric Anholt2018-03-191-6/+1
| | | | This will be reused in register spilling.
* nir: add lower_ldexp to nir compiler optionsTimothy Arceri2018-02-281-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc5: Try to merge more than 2 QPU instructions together.Eric Anholt2018-02-051-5/+13
| | | | | | | | Obviously it would be good to have an ADD and a MUL and a signal together, but we can even potentially have multiple signals merged, as well. total instructions in shared programs: 100423 -> 97874 (-2.54%) instructions in affected programs: 78812 -> 76263 (-3.23%)
* broadcom/vc5: Remove no-op MOVs after register allocation.Eric Anholt2018-02-051-1/+60
| | | | | | | | We emit some MOVs to track lifetimes of payload registers, but we don't need there to be actual MOV instructions for them. total instructions in shared programs: 101045 -> 100423 (-0.62%) instructions in affected programs: 37083 -> 36461 (-1.68%)
* broadcom/vc5: Add missing shader-db instruction counting.Eric Anholt2018-02-051-0/+7
| | | | I must have misplaced it in the instruction packing rework.
* broadcom/vc5: Fix a segfault on mix of booleans.Eric Anholt2018-02-011-1/+3
| | | | We don't have a src1 to look up if the compare instruction is "i2b".
* nir: add lower_all_io_to_temps flagTimothy Arceri2018-01-311-0/+1
| | | | | | | This will be used for freedreno and vc4 which require all inputs and outputs to be copied to temps. Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc5: Update the compiler for V3D 4.2.Eric Anholt2018-01-271-2/+6
|
* broadcom/vc5: Use MSF to ignore discards/non-dispatched channels in loops.Eric Anholt2018-01-121-1/+5
| | | | | Prevents potential infinite loops when a non-dispatched or discarded channel never triggers the loop break condition.
* broadcom/vc5: Use XOR instead of SUB for execute flags comparisons.Eric Anholt2018-01-121-3/+3
| | | | | I think this should be equivalent other than power, and it's the kind of comparison we use for nir_op_ieq.
* broadcom/vc5: Also check the update flags for avoiding DCE.Eric Anholt2018-01-121-1/+5
| | | | I was trying to do a NULL-destination UF, and it got removed.
* broadcom/vc5: Add support for loading varyings in V3D 4.1.Eric Anholt2018-01-126-17/+13
| | | | | | | The LDVARY signal now writes an arbitrary register, so I took out the magic src register file and replaced it with an instruction with LDVARY set so we have somewhere to hang a QFILE_TEMP destination for register allocation.
* broadcom/vc5: Add compiler support for V3D 4.x texturing.Eric Anholt2018-01-126-6/+282
|
* broadcom/vc5: Move V3D 3.3 texturing to a separate file.Eric Anholt2018-01-124-229/+266
| | | | | V3D 4.x texturing changes enough that #ifdefs would just make a mess of it.
* broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file.Eric Anholt2018-01-124-34/+81
| | | | | For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to stop including V3.3 XML.
* broadcom/vc5: Use THRSW to enable multi-threaded shaders.Eric Anholt2018-01-127-76/+279
| | | | | This is a major performance boost on all of V3D, but is required on V3D 4.x where shaders are always either 2- or 4-threaded.
* broadcom/vc5: Properly schedule the thread-end THRSW.Eric Anholt2018-01-122-39/+137
| | | | | | | | | | This fills in the delay slots of thread end as much as we can (other than being cautious about potential TLBZ writes). In the process, I moved the thread end THRSW instruction creation to the scheduler. Once we start emitting THRSWs in the shader, we need to schedule the thread-end one differently from other THRSWs, so having it in there makes that easy.
* broadcom/vc5: Implement GFXH-1684 workaround.Eric Anholt2018-01-124-0/+20
| | | | Apparently the VPM writes need to be flushed out before we end the shader.
* broadcom/vc5: Use a physical-reg-only register class for LDVPM.Eric Anholt2018-01-122-8/+21
| | | | | This is needed for LDVPM on V3D 4.x, but will also be needed for keeping values out of the accumulators across THRSW.
* broadcom/vc5: Use the new LDVPM/STVPM opcodes on V3D 4.1.Eric Anholt2018-01-125-39/+87
| | | | | | | | | | | Now, instead of a magic write register for VPM stores we have an instruction to do them (which means no packing of other ALU ops into it), with the ability to reorder the VPM stores due to the offset being baked into the instruction. VPM loads also gain the ability to be reordered by packing the row into the A argument. They also no longer write to the r3 accumulator, and instead must be stored to a physical register.
* broadcom/vc5: Add support for V3Dv4 signal bits.Eric Anholt2018-01-127-21/+102
| | | | | | | The WRTMUC replaces the implicit uniform loads in the first two texture instructions. LDVPM disappears in favor of an ALU op. LDVARY, LDTMU, LDTLB, and LDUNIF*RF now write to arbitrary registers, which required passing the devinfo through to a few more functions.
* meson: Use dependencies for nirDylan Baker2018-01-111-2/+2
| | | | | | | | | | | | | | | | | This creates two new internal dependencies, idep_nir_headers and idep_nir. The former encapsulates the generation of nir_opcodes.h and nir_builder_opcodes.h and adding src/compiler/nir as an include path. This ensures that any target that needs nir headers will have the includes and that the generated headers will be generated before the target is build. The second, idep_nir, includes the first and additionally links to libnir. This is intended to make it easier to avoid race conditions in the build when using nir, since the number of consumers for libnir and it's headers are quite high. Acked-by: Eric Engestrom <[email protected]> Signed-off-by: Dylan Baker <[email protected]>
* broadcom/vc5: Fix discard_if during control flow.Eric Anholt2018-01-031-1/+1
| | | | | | | I want to do the SETMSF.IFA to discard only if execute == 0 and cond, so our dest of the PUSHZ needs to be nonzero if execute or !cond are nonzero. Fixes dEQP-GLES3.functional.shaders.discard.dynamic_loop_dynamic.
* broadcom/vc5: Don't emit component 3/4 F16 TLB writes for float/vec2.Eric Anholt2018-01-031-1/+2
| | | | | Fixes a simulator assertion failure on dEQP-GLES3.functional.fragment_out.array.fixed.r8_highp_float.
* broadcom/vc5: Emit flat shade flags for varying components > 24.Eric Anholt2018-01-032-5/+12
| | | | | | | | | | This means that with no flatshading we'll emit the single-byte ZERO_ALL_FLAT_SHADE_FLAGS, and otherwise emit a set of FLAT_SHADE_FLAGS to get all the bits we need set. There's a _SET enum in the packet we could use to possibly set entire ranges of the bitfield without using another packet, but this at least fixes the conformance failure.
* broadcom/vc5: Emit proper flatshading code for glShadeModel(GL_FLAT).Eric Anholt2018-01-033-20/+14
| | | | | | | | In updating the simulator, behavior changed slightly so that our old code wasn't getting glxgears's flatshading interpolated right. Emit flat shading code just like we would for a normal flat-shaded varying, by passing a flag in the shader key for glShadeModel(GL_FLAT) state and customizing the color inputs based on that.
* broadcom/vc5: Move texture return channel setup into the compiler.Eric Anholt2018-01-032-14/+40
| | | | | | | | The compiler decides how many LDTMUs we're going to emit, and that must match the P1 flags. This brings the return channel counting to a single place (so all that's passed into the compiler is "how many return channels you may request from this texture's format), and was a necessary step for shadow samplers once we stop using OVRTMUOUT=0.
* broadcom/vc5: Enable NIR txd lowering on all txd instructions.Eric Anholt2017-12-141-0/+1
| | | | | | | | Fixes almost all of piglit's arb_shader_texture_lod grad tests, except for the base -texgrad/texgradcube ones which fail on what appear to be precision problems. Reviewed-by: Ian Romanick <[email protected]>
* broadcom/vc5: Fix shader input/outputs for gallium's new NIR linking.Eric Anholt2017-12-141-4/+8
|
* broadcom/vc5: Fix BASE_LEVEL handling with txl.Eric Anholt2017-11-221-2/+4
| | | | | | | The HW doesn't add the base level anywhere (the min/max lod clamping is what does base level), so we need to add it manually in this case. Fixes piglit tex-miplevel-selection *Lod 2D.
* broadcom/vc5: Ensure that there is always a TLB write.Eric Anholt2017-11-171-1/+17
| | | | | This should fix some GPU hangs in our (currently always single-threaded) fragment shaders, and definitely fixes assertion failures in simulation.