path: root/src/broadcom/compiler
Commit message Author Age Files Lines
* v3d: Fix setup of the VCM cache size. Eric Anholt 2018-09-07 1 -1/+2
  There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8.
  Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures.
  Fixes: 1561e4984eb0 ("v3d: Emit the VCM_CACHE_SIZE packet.")
* v3d: Emit the VCM_CACHE_SIZE packet. Eric Anholt 2018-08-06 2 -1/+22
  This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping.
  Cc: "18.2" <[email protected]>
* v3d: Avoid spilling that breaks the r5 usage after a ldvary. Eric Anholt 2018-08-06 1 -0/+9
  Fixes bad rendering when forcing 2 spills in glxgears.
  Cc: "18.2" <[email protected]>
* v3d: Make sure that QPU instruction-has-a-dest matches VIR. Eric Anholt 2018-08-06 2 -1/+11
  Found when debugging register spilling -- we would try to spill the dest of a STVPMV, inserting spill code after entering the last segment. In fact, we were likely to choose to do this, given that the STVPMV "dest" temp was never read from, making it cheap to spill.
  Cc: "18.2" <[email protected]>
* v3d: Wait for TMU writes to complete before continuing after a spill. Eric Anholt 2018-08-06 1 -1/+6
  The simulator complained that we had write responses outstanding at shader end. It seems that a TMU read does not guarantee that previous TMU writes by the thread have completed, which surprised me.
  Cc: "18.2" <[email protected]>
* v3d: Make sure we don't emit a thrsw before the last one finished. Eric Anholt 2018-08-06 1 -2/+13
  Found while forcing some spilling, which creates a lot of short tmua->thrsw->ldtmu sequences.
  Cc: "18.2" <[email protected]>
* v3d: Add some debug code for forcing register spilling. Eric Anholt 2018-08-06 1 -0/+14
  This is useful for periodically testing out register spilling to see how it goes on simple shaders, rather than only failing on insanely complicated ones.
* v3d: Add support for the TMUWT instruction. Eric Anholt 2018-07-31 3 -3/+13
  This instruction is used to ensure that TMU stores have been processed before moving on. In particular, you need any TMU ops to be done by the time the shader ends.
* vc4: Fix meson build when enabled without v3d. Eric Anholt 2018-07-29 1 -0/+2
  Reported-by: Rob Clark <[email protected]>
  Fixes: e92959c4e03c ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
* nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform. Eric Anholt 2018-07-26 1 -0/+1
  This is controlled by a new nir_shader_compiler_options flag, and fixes dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
  Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: Implement a small immediates optimization, based on VC4's. Eric Anholt 2018-07-23 7 -19/+142
  We can do one per instruction, and we have to be careful not to overwrite raddr_b, but this greatly reduces the pressure on uniform loads (particularly around ldvpm/stvpm instructions).
  total instructions in shared programs: 90768 -> 88220 (-2.81%)
  instructions in affected programs: 82711 -> 80163 (-3.08%)
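The percentages in shader-db lines such as the ones above are plain relative changes between the two instruction counts. A minimal sketch of that arithmetic follows; the function name `shaderdb_percent_change` is made up for illustration and is not part of shader-db or Mesa.

```c
#include <assert.h>

/* Hypothetical helper: relative change, in percent, between two
 * instruction counts as reported in a "before -> after" line. */
double shaderdb_percent_change(unsigned before, unsigned after)
{
    /* A percentage of zero instructions is undefined. */
    assert(before != 0);
    return ((double)after - (double)before) * 100.0 / (double)before;
}
```

Calling `shaderdb_percent_change(90768, 88220)` gives roughly -2.81, matching the total quoted in the commit message.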
* v3d: Return an invalid src number if asked for a missing implicit uniform. Eric Anholt 2018-07-23 2 -3/+3
  Sometimes when iterating over sources, we might want to check if it's the implicit one. We wouldn't want to match on a non-implicit src using this function.
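The idea in this message can be sketched as a lookup that returns a sentinel no real source index can equal. This is only an illustration under stated assumptions: the struct, the `EXAMPLE_INVALID_SRC` sentinel, and the "implicit uniform is the last source" layout are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: larger than any valid source index. */
#define EXAMPLE_INVALID_SRC (~0u)

/* Hypothetical, simplified instruction description. */
struct example_inst {
    unsigned nsrc;             /* total sources, including any implicit one */
    bool has_implicit_uniform; /* whether an implicit uniform is attached */
};

/* Index of the implicit uniform source, or EXAMPLE_INVALID_SRC if the
 * instruction has none. Assumes the implicit uniform is the last source. */
unsigned example_implicit_uniform_src(const struct example_inst *inst)
{
    if (!inst->has_implicit_uniform)
        return EXAMPLE_INVALID_SRC;
    assert(inst->nsrc > 0);
    return inst->nsrc - 1;
}
```

With the sentinel, a loop that compares each source index `i` against the returned value never matches for an instruction without an implicit uniform, which is the behavior the commit message asks for.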
* v3d: Skip emitting texture config parameter 2 if it's just the defaults. Eric Anholt 2018-07-23 1 -1/+5
  shader-db:
  total instructions in shared programs: 91275 -> 90768 (-0.56%)
  instructions in affected programs: 20702 -> 20195 (-2.45%)
* v3d: Update an XXX comment for a path we handled in HW on V3D 4.x. Eric Anholt 2018-07-23 1 -1/+1
* v3d: Switch to using the new SFU instructions on V3D 4.x. Eric Anholt 2018-07-23 6 -24/+87
  These instructions let us write directly to the phys regfile, instead of just R4. That lets us avoid moving out of R4 to avoid conflicting with other SFU results, and to avoid conflicting with thread switches.
  There is still an extra instruction of latency, which is not represented in the scheduler at the moment. If you use the result before it's ready, the QPU will just stall, unlike the magic R4 mode where you'd read the previous value. That means that the following shader-db results aren't quite representative (since we now cause some stalls instead of emitting nops), but they're impressive enough that I'm happy with the change.
  total instructions in shared programs: 95669 -> 91275 (-4.59%)
  instructions in affected programs: 82590 -> 78196 (-5.32%)
* v3d: Fix the name of the "flpop" operation. Eric Anholt 2018-07-23 2 -2/+2
  Noticed while trying to sort a new op into the appropriate place to match the documentation.
* v3d: Drop unused vir_SAT() operation. Eric Anholt 2018-07-23 1 -8/+0
  We lower saturates in NIR.
* v3d: Rotate through registers to improve post-RA scheduling options. Eric Anholt 2018-07-23 1 -0/+45
  Similarly to VC4's implementation, by not picking r0 immediately upon freeing it, we give the scheduler more of a chance to fit later writes in earlier. I'm not clear on whether there's any real cost to picking phys over accumulators, so keep that behavior for now.
  shader-db:
  total instructions in shared programs: 96831 -> 95669 (-1.20%)
  instructions in affected programs: 77254 -> 76092 (-1.50%)
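The rotation described here can be illustrated with a round-robin pick: instead of always scanning from register 0, the search starts just past the last register handed out, so a register that was just freed is not reused right away. The sketch below is not the driver's code; the register count, struct, and function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical register count, chosen only for the example. */
#define EXAMPLE_NUM_REGS 4

struct example_rotor {
    unsigned next; /* first register to consider on the next pick */
};

/* Returns the index of a free register, or -1 if all are in use.
 * The search starts at rotor->next and wraps around, so consecutive
 * picks walk through the register file rather than reusing r0. */
int example_pick_reg(struct example_rotor *rotor,
                     const bool in_use[EXAMPLE_NUM_REGS])
{
    for (unsigned i = 0; i < EXAMPLE_NUM_REGS; i++) {
        unsigned reg = (rotor->next + i) % EXAMPLE_NUM_REGS;
        if (!in_use[reg]) {
            rotor->next = (reg + 1) % EXAMPLE_NUM_REGS;
            return (int)reg;
        }
    }
    return -1;
}
```

Two back-to-back picks with every register free return 0 and then 1, even though register 0 is still available. Spreading writes over distinct registers is what leaves a post-RA scheduler more room to reorder them.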
* v3d: Allow reading from physical regs written in the previous instruction. Eric Anholt 2018-07-23 1 -24/+0
  This restriction existed in V3D 2.x, but lifting it was a major change in 3.x.
  shader-db results:
  total instructions in shared programs: 98117 -> 96831 (-1.31%)
  instructions in affected programs: 48520 -> 47234 (-2.65%)
* v3d: Disable shader-db cycle estimates until we sort out TMU estimates. Eric Anholt 2018-07-16 1 -1/+4
  I keep having to ignore these shader-db changes since I don't trust them, so just disable the reports entirely.
* v3d: Emit the lowered uniform just before its first use in a block. Eric Anholt 2018-07-16 1 -20/+18
  total instructions in shared programs: 98578 -> 98119 (-0.47%)
  instructions in affected programs: 27571 -> 27112 (-1.66%)
  and it also eliminates most spills/fills on the CTS's randomized uniform usage testcases.
* v3d: Add an assert that we don't provide an invalid texture return words. Eric Anholt 2018-07-16 1 -0/+8
  The docs had an update noting this restriction, so reflect it in the code.
* v3d: Apply GFXH-1625 restriction on TMUWT in the end of the shader. Eric Anholt 2018-07-16 1 -0/+4
  This doesn't affect us yet since we're not doing TMUWTs, but I think we will for GLES 3.1.
* v3d: Implement noperspective varyings on V3D 4.x. Eric Anholt 2018-07-09 3 -3/+8
  Fixes a bunch of piglit interpolation tests, and reduces my concern about some MSAA blit shaders with noperspective varyings.
* v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE. Eric Anholt 2018-07-05 1 -0/+3
  Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
* v3d: Respect swap_color_rb for the f32_color_rb case. Eric Anholt 2018-07-05 1 -5/+7
  We don't actually set the two flags together, but I want to use the r/g/b/a reordered fields in the next commit.
* v3d: Implement ALPHA_TO_COVERAGE. Eric Anholt 2018-06-20 2 -2/+15
  There's a convenient "FTOC" instruction for generating the coverage now, unlike vc4.
  This fixes dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
* v3d: Limit shader threading according to our maximum TMU fifo usage. Eric Anholt 2018-06-15 1 -10/+24
  Fixes simulator assertion failures in dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment and similar complicated cases.
* v3d: Fix shaders using pixel center W but no varyings. Eric Anholt 2018-06-15 3 -15/+8
  The docs called this field "uses both center W and centroid W", but actually it's "do you need center W even if varyings don't obviously call for it?"
  Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
* v3d: Fix configuration setup of mixed f32 and f16 render targets. Eric Anholt 2018-06-14 1 -1/+1
  Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
* v3d: Remove unused QUNIFORM_STENCIL left over from vc4. Eric Anholt 2018-06-14 1 -2/+0
* v3d: Fix undefined results for a swap_color_rb RT from a float shader output. Eric Anholt 2018-06-14 1 -1/+4
  Fixes segfaults and undefined behavior in dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
* v3d: Enable the new NIR bitfield operation lowering paths. Eric Anholt 2018-06-06 1 -2/+19
  These together get the GLSL 3.00 unorm/snorm pack functions and MESA_shader_integer operations working.
  v2: Fix commit message typo.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* broadcom/vc5: Add support for centroid varyings. Eric Anholt 2018-04-26 3 -0/+44
  It would be nice to share the flags packet emit logic with flat shade flags, but I couldn't come up with a good way while still using our pack macros. We need to refactor this to shader record setup at compile time, anyway.
  Fixes ext_framebuffer_multisample-interpolation * centroid-*
* broadcom/vc5: Add validation that we don't violate GFXH-1633 requirements. Eric Anholt 2018-04-26 1 -0/+13
  We don't use ldunifa yet, but we will eventually for UBOs.
* broadcom/vc5: Add validation that we don't violate GFXH-1625 requirements. Eric Anholt 2018-04-26 1 -0/+5
  We don't use TMUWT yet, but we will once we do SSBOs.
* broadcom/vc5: Add QPU validation for register writes after thrend. Eric Anholt 2018-04-26 1 -3/+31
  The next shader gets to start writing the register file during these slots, so make sure we don't stomp over them. The only case of hitting this that I could imagine would be dead writes.
* broadcom/vc5: Remove leftover vc4 MSAA lowering setup in the FS key. Eric Anholt 2018-04-25 1 -12/+5
* util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero Ian Romanick 2018-03-29 1 -2/+2
  The new name makes the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior.
  Signed-off-by: Ian Romanick <[email protected]>
  Suggested-by: Matt Turner <[email protected]>
  Reviewed-by: Alejandro Piñeiro <[email protected]>
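The zero-input distinction is easy to see with the usual bit trick, where `v & (v - 1)` clears the lowest set bit. The two functions below are a standalone sketch with made-up `example_` names, not copies of the helpers in bitscan.h; the second one is an assumption about what a variant with "different zero-input behavior" could look like.

```c
#include <stdbool.h>

/* True for 0, 1, 2, 4, 8, ...: clearing the lowest set bit leaves nothing.
 * For v == 0 the unsigned subtraction wraps, and 0 & anything is 0. */
bool example_is_power_of_two_or_zero(unsigned v)
{
    return (v & (v - 1)) == 0;
}

/* Hypothetical counterpart that rejects zero: true for 1, 2, 4, 8, ... */
bool example_is_power_of_two_nonzero(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

A caller that sizes or aligns something by the value generally wants the second form, since a zero that passes the check would otherwise slip through.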
* broadcom/vc5: Start using nir_opt_move_load_ubo(). Eric Anholt 2018-03-28 1 -0/+2
  In the absence of a general NIR or VIR-level scheduler, this at least avoids spilling in GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts
* broadcom/vc5: Fix extraneous register index in QIR dumping of TLBU writes. Eric Anholt 2018-03-26 1 -0/+1
  Just like TLB without a config uniform, we don't have a register index.
* broadcom/vc5: Account for InstanceID/VertexID in VPM segment size. Eric Anholt 2018-03-22 1 -4/+9
  Fixes failure in GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size
* broadcom/vc5: Set up a vertex position if the shader doesn't. Eric Anholt 2018-03-22 1 -0/+22
  Our backend needs some sort of vertex position value to emit the scaled viewport values and such.
  Fixes potential segfaults in KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx
* broadcom/vc5: Fix up the NIR types of FS outputs generated by NIR-to-TGSI. Eric Anholt 2018-03-21 2 -0/+38
  Unfortunately TGSI doesn't record the type of the FS output like GLSL does, but VC5's TLB writes depend on the output's base type. Just record the type in the key at variant compile time when we've got a TGSI input and then fix it up.
  Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a GPU hang that breaks most tests that come after it.
* broadcom/vc5: Don't annotate dumps with stale live intervals. Eric Anholt 2018-03-19 4 -2/+8
  As you're debugging register allocation, you may have changed the intervals and not recomputed yet. Just skip the dump in that case.
* broadcom/vc5: Add support for register spilling. Eric Anholt 2018-03-19 4 -11/+276
  Our register spilling support is nice to have since vc4 couldn't at all, but we're still very restricted due to needing to not spill during a TMU operation, or during the last segment of the program (which would be nice to spill a value of, when there's a long-lived value being passed through with little modification from the start to the end).
  We could do better by emitting unspills for the last-segment values just before the last thrsw, since the last segment is probably not the maximum interference area.
  Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3 others.
* broadcom/vc5: Remove redundant last_inst lookup. Eric Anholt 2018-03-19 1 -1/+0
  The point was to get the MOV, which the MOV_dest already returned.
* broadcom/vc5: On QPU pack error, dump the instruction and return cleanly. Eric Anholt 2018-03-19 1 -1/+7
  This is nice for debugging when you've made a bad instruction.
* broadcom/vc5: Add cursors to the compiler infrastructure, like NIR's. Eric Anholt 2018-03-19 3 -8/+73
  This will let me do lowering late in compilation using the same instruction builder as we use in nir_to_vir.
* broadcom/vc5: Move the umul macro to a header. Eric Anholt 2018-03-19 2 -8/+8
  Anywhere we want to multiply, we probably want this.