mesa.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE	Eric Anholt	2018-10-30	2	-45/+13
	We were doing this late after nir_lower_io, but we can just reuse the core code. By doing it at this stage, we won't even set up the VS attributes as inputs, reducing our VPM size.
*	v3d: Only add output slot tracking for the current varying slot.	Eric Anholt	2018-10-30	1	-1/+1
	We always emit 4 slots per slot because things like color output and position processing in the epilogue will potentially look up more values than the variable declaration had. However, when we get a .location_frac != 0, we don't want to overwrite components of the following .driver_location.
*	v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components.	Eric Anholt	2018-10-30	1	-0/+16
	This lets us trim unused trailing components in the vertex attributes, reducing the size of our VPM allocations.
*	v3d: Don't rely on sorting input vars for VPM read setup.	Eric Anholt	2018-10-30	1	-28/+20
	For supporting scalar VPM i/o at the NIR level, we need to do a pass over the vars to figure out how big each attribute is after DCE. Once we've done that, we can just walk over c->vattr_sizes[] instead of bothering with vars.
*	v3d: Split out NIR input setup between FS and VPM.	Eric Anholt	2018-10-30	1	-47/+80
	They don't share much code, and I'm about to rewrite the remaining shared code for the VPM case.
*	util: use C99 declaration in the for-loop hash_table_foreach() macro	Eric Engestrom	2018-10-25	2	-3/+0
	Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
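A minimal sketch of the idea behind this change, assuming a toy table type: declaring the iterator inside the `for` statement (C99) means callers no longer need their own entry declaration before the loop. The `toy_*` names are hypothetical stand-ins and are not Mesa's hash table API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table: a flat array of entries. */
struct toy_entry { int key; int value; };
struct toy_table { struct toy_entry *entries; size_t count; };

/* Hypothetical iterator: returns the entry after `prev`, or the first
 * entry when `prev` is NULL; returns NULL once the table is exhausted. */
static struct toy_entry *
toy_table_next_entry(struct toy_table *t, struct toy_entry *prev)
{
   assert(t != NULL);
   size_t next = prev ? (size_t)(prev - t->entries) + 1 : 0;
   return next < t->count ? &t->entries[next] : NULL;
}

/* The loop variable is declared by the macro itself (C99), so it is
 * scoped to the loop and the caller declares nothing up front. */
#define toy_table_foreach(t, entry)                                \
   for (struct toy_entry *entry = toy_table_next_entry(t, NULL);   \
        entry != NULL;                                             \
        entry = toy_table_next_entry(t, entry))

/* Usage: sum all values without declaring `e` beforehand. */
static int
toy_table_sum(struct toy_table *t)
{
   int sum = 0;
   toy_table_foreach(t, e)
      sum += e->value;
   return sum;
}
```

Scoping the variable to the loop also keeps it from leaking into the rest of the calling function.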
*	v3d: Add support for hardware pack/unpack of half floats.	Eric Anholt	2018-10-15	1	-0/+16
	Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.
*	v3d: Fix setup of the VCM cache size.	Eric Anholt	2018-09-07	1	-1/+2
	There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: 1561e4984eb0 ("v3d: Emit the VCM_CACHE_SIZE packet.")
*	v3d: Emit the VCM_CACHE_SIZE packet.	Eric Anholt	2018-08-06	2	-1/+22
	This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <[email protected]>
*	v3d: Avoid spilling that breaks the r5 usage after a ldvary.	Eric Anholt	2018-08-06	1	-0/+9
	Fixes bad rendering when forcing 2 spills in glxgears. Cc: "18.2" <[email protected]>
*	v3d: Make sure that QPU instruction-has-a-dest matches VIR.	Eric Anholt	2018-08-06	2	-1/+11
	Found when debugging register spilling -- we would try to spill the dest of a STVPMV, inserting spill code after entering the last segment. In fact, we were likely to choose to do this, given that the STVPMV "dest" temp was never read from, making it cheap to spill. Cc: "18.2" <[email protected]>
*	v3d: Wait for TMU writes to complete before continuing after a spill.	Eric Anholt	2018-08-06	1	-1/+6
	The simulator complained that we had write responses outstanding at shader end. It seems that a TMU read does not guarantee that previous TMU writes by the thread have completed, which surprised me. Cc: "18.2" <[email protected]>
*	v3d: Make sure we don't emit a thrsw before the last one finished.	Eric Anholt	2018-08-06	1	-2/+13
	Found while forcing some spilling, which creates a lot of short tmua->thrsw->ldtmu sequences. Cc: "18.2" <[email protected]>
*	v3d: Add some debug code for forcing register spilling.	Eric Anholt	2018-08-06	1	-0/+14
	This is useful for periodically testing out register spilling to see how it goes on simple shaders, rather than only failing on insanely complicated ones.
*	v3d: Add support for the TMUWT instruction.	Eric Anholt	2018-07-31	3	-3/+13
	This instruction is used to ensure that TMU stores have been processed before moving on. In particular, you need any TMU ops to be done by the time the shader ends.
*	vc4: Fix meson build when enabled without v3d.	Eric Anholt	2018-07-29	1	-0/+2
	Reported-by: Rob Clark <[email protected]> Fixes: e92959c4e03c ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
*	nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.	Eric Anholt	2018-07-26	1	-0/+1
	This is controlled by a new nir_shader_compiler_options flag, and fixes dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D. Reviewed-by: Kenneth Graunke <[email protected]>
*	v3d: Implement a small immediates optimization, based on VC4's.	Eric Anholt	2018-07-23	7	-19/+142
	We can do one per instruction, and we have to be careful not to overwrite raddr_b, but this greatly reduces the pressure on uniform loads (particularly around ldvpm/stvpm instructions). total instructions in shared programs: 90768 -> 88220 (-2.81%) instructions in affected programs: 82711 -> 80163 (-3.08%)
*	v3d: Return an invalid src number if asked for a missing implicit uniform.	Eric Anholt	2018-07-23	2	-3/+3
	Sometimes when iterating over sources, we might want to check if it's the implicit one. We wouldn't want to match on a non-implicit src using this function.
*	v3d: Skip emitting texture config parameter 2 if it's just the defaults.	Eric Anholt	2018-07-23	1	-1/+5
	shader-db: total instructions in shared programs: 91275 -> 90768 (-0.56%) instructions in affected programs: 20702 -> 20195 (-2.45%)
*	v3d: Update an XXX comment for a path we handled in HW on V3D 4.x.	Eric Anholt	2018-07-23	1	-1/+1
*	v3d: Switch to using the new SFU instructions on V3D 4.x.	Eric Anholt	2018-07-23	6	-24/+87
	These instructions let us write directly to the phys regfile, instead of just R4. That lets us avoid moving out of R4 to avoid conflicting with other SFU results, and to avoid conflicting with thread switches. There is still an extra instruction of latency, which is not represented in the scheduler at the moment. If you use the result before it's ready, the QPU will just stall, unlike the magic R4 mode where you'd read the previous value. That means that the following shader-db results aren't quite representative (since we now cause some stalls instead of emitting nops), but they're impressive enough that I'm happy with the change. total instructions in shared programs: 95669 -> 91275 (-4.59%) instructions in affected programs: 82590 -> 78196 (-5.32%)
*	v3d: Fix the name of the "flpop" operation.	Eric Anholt	2018-07-23	2	-2/+2
	Noticed while trying to sort a new op into the appropriate place to match the documentation.
*	v3d: Drop unused vir_SAT() operation.	Eric Anholt	2018-07-23	1	-8/+0
	We lower saturates in NIR.
*	v3d: Rotate through registers to improve post-RA scheduling options.	Eric Anholt	2018-07-23	1	-0/+45
	Similarly to VC4's implementation, by not picking r0 immediately upon freeing it, we give the scheduler more of a chance to fit later writes in earlier. I'm not clear on whether there's any real cost to picking phys over accumulators, so keep that behavior for now. shader-db: total instructions in shared programs: 96831 -> 95669 (-1.20%) instructions in affected programs: 77254 -> 76092 (-1.50%)
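The rotation described here can be illustrated with a round-robin picker: instead of always returning the lowest free register, the search starts just past the last register handed out. This is only a sketch under assumed names (`reg_picker`, `pick_reg_round_robin`, `NUM_REGS`); it is not the driver's register allocation code.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 6 /* hypothetical register count for the sketch */

struct reg_picker {
   int next_start; /* index at which the next search begins */
};

/* Returns the first free register at or after next_start, wrapping
 * around once; returns -1 if no register is free. */
static int
pick_reg_round_robin(struct reg_picker *p, const bool free_regs[NUM_REGS])
{
   assert(p->next_start >= 0 && p->next_start < NUM_REGS);

   for (int i = 0; i < NUM_REGS; i++) {
      int reg = (p->next_start + i) % NUM_REGS;
      if (free_regs[reg]) {
         /* Advance the start so a just-freed low register is not
          * immediately reused by the next allocation. */
         p->next_start = (reg + 1) % NUM_REGS;
         return reg;
      }
   }
   return -1;
}
```

Spreading consecutive allocations across different registers reduces false dependencies between unrelated values, which is what gives a post-RA scheduler more room to reorder writes.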
*	v3d: Allow reading from physical regs written in the previous instruction.	Eric Anholt	2018-07-23	1	-24/+0
	This restriction existed in V3D 2.x, but lifting it was a major change in 3.x. shader-db results: total instructions in shared programs: 98117 -> 96831 (-1.31%) instructions in affected programs: 48520 -> 47234 (-2.65%)
*	v3d: Disable shader-db cycle estimates until we sort out TMU estimates.	Eric Anholt	2018-07-16	1	-1/+4
	I keep having to ignore these shader-db changes since I don't trust them, so just disable the reports entirely.
*	v3d: Emit the lowered uniform just before its first use in a block.	Eric Anholt	2018-07-16	1	-20/+18
	total instructions in shared programs: 98578 -> 98119 (-0.47%) instructions in affected programs: 27571 -> 27112 (-1.66%) and it also eliminates most spills/fills on the CTS's randomized uniform usage testcases.
*	v3d: Add an assert that we don't provide an invalid texture return words.	Eric Anholt	2018-07-16	1	-0/+8
	The docs had an update noting this restriction, so reflect it in the code.
*	v3d: Apply GFXH-1625 restriction on TMUWT in the end of the shader.	Eric Anholt	2018-07-16	1	-0/+4
	This doesn't affect us yet since we're not doing TMUWTs, but I think we will for GLES 3.1.
*	v3d: Implement noperspective varyings on V3D 4.x.	Eric Anholt	2018-07-09	3	-3/+8
	Fixes a bunch of piglit interpolation tests, and reduces my concern about some MSAA blit shaders with noperspective varyings.
*	v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE.	Eric Anholt	2018-07-05	1	-0/+3
	Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
*	v3d: Respect swap_color_rb for the f32_color_rb case.	Eric Anholt	2018-07-05	1	-5/+7
	We don't actually set the two flags together, but I want to use the r/g/b/a reordered fields in the next commit.
*	v3d: Implement ALPHA_TO_COVERAGE.	Eric Anholt	2018-06-20	2	-2/+15
	There's a convenient "FTOC" instruction for generating the coverage now, unlike vc4. This fixes dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
*	v3d: Limit shader threading according to our maximum TMU fifo usage.	Eric Anholt	2018-06-15	1	-10/+24
	Fixes simulator assertion failures in dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment and similar complicated cases.
*	v3d: Fix shaders using pixel center W but no varyings.	Eric Anholt	2018-06-15	3	-15/+8
	The docs called this field "uses both center W and centroid W", but actually it's "do you need center W even if varyings don't obviously call for it?" Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
*	v3d: Fix configuration setup of mixed f32 and f16 render targets.	Eric Anholt	2018-06-14	1	-1/+1
	Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
*	v3d: Remove unused QUNIFORM_STENCIL left over from vc4.	Eric Anholt	2018-06-14	1	-2/+0
*	v3d: Fix undefined results for a swap_color_rb RT from a float shader output.	Eric Anholt	2018-06-14	1	-1/+4
	Fixes segfaults and undefined behavior in dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
*	v3d: Enable the new NIR bitfield operation lowering paths.	Eric Anholt	2018-06-06	1	-2/+19
	These together get the GLSL 3.00 unorm/snorm pack functions and MESA_shader_integer operations working. v2: Fix commit message typo. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	broadcom/vc5: Add support for centroid varyings.	Eric Anholt	2018-04-26	3	-0/+44
	It would be nice to share the flags packet emit logic with flat shade flags, but I couldn't come up with a good way while still using our pack macros. We need to refactor this to shader record setup at compile time, anyway. Fixes ext_framebuffer_multisample-interpolation * centroid-*
*	broadcom/vc5: Add validation that we don't violate GFXH-1633 requirements.	Eric Anholt	2018-04-26	1	-0/+13
	We don't use ldunifa yet, but we will eventually for UBOs.
*	broadcom/vc5: Add validation that we don't violate GFXH-1625 requirements.	Eric Anholt	2018-04-26	1	-0/+5
	We don't use TMUWT yet, but we will once we do SSBOs.
*	broadcom/vc5: Add QPU validation for register writes after thrend.	Eric Anholt	2018-04-26	1	-3/+31
	The next shader gets to start writing the register file during these slots, so make sure we don't stomp over them. The only case of hitting this that I could imagine would be dead writes.
*	broadcom/vc5: Remove leftover vc4 MSAA lowering setup in the FS key.	Eric Anholt	2018-04-25	1	-12/+5
*	util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero	Ian Romanick	2018-03-29	1	-2/+2
	The new name makes the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
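The zero-input distinction mentioned above can be shown with the usual bit trick. This is a sketch with hypothetical function names, not the bitscan.h source, and the second function is only an assumption about what a variant with "different zero-input behavior" might look like.

```c
#include <assert.h>
#include <stdbool.h>

/* True for 0 and for any value with exactly one bit set: clearing the
 * lowest set bit via v & (v - 1) leaves nothing behind. For v == 0 the
 * subtraction wraps to all ones, and 0 & anything is still 0. */
static inline bool
is_power_of_two_or_zero(unsigned v)
{
   return (v & (v - 1)) == 0;
}

/* Assumed counterpart: same test, but zero is rejected explicitly. */
static inline bool
is_power_of_two_nonzero(unsigned v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Usage: the two helpers differ only on a zero input. */
static void
power_of_two_example(void)
{
   assert(is_power_of_two_or_zero(0));
   assert(!is_power_of_two_nonzero(0));
   assert(is_power_of_two_or_zero(64) && is_power_of_two_nonzero(64));
}
```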
*	broadcom/vc5: Start using nir_opt_move_load_ubo().	Eric Anholt	2018-03-28	1	-0/+2
	In the absence of a general NIR or VIR-level scheduler, this at least avoids spilling in GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts
*	broadcom/vc5: Fix extraneous register index in QIR dumping of TLBU writes.	Eric Anholt	2018-03-26	1	-0/+1
	Just like TLB without a config uniform, we don't have a register index.
*	broadcom/vc5: Account for InstanceID/VertexID in VPM segment size.	Eric Anholt	2018-03-22	1	-4/+9
	Fixes failure in GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size
*	broadcom/vc5: Set up a vertex position if the shader doesn't.	Eric Anholt	2018-03-22	1	-0/+22
	Our backend needs some sort of vertex position value to emit the scaled viewport values and such. Fixes potential segfaults in KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx