mesa.git

	Commit message	Author	Age	Files	Lines
*	vc4: Try to schedule QIR instructions between writing to and reading math.	Eric Anholt	2016-11-30	1	-0/+22
\| \| \| \| \| \| \| \| \|	This helps us get the delay slots between SFU writes and reads filled. Total instructions in shared programs: 94494 -> 93970 (-0.55%); instructions in affected programs: 59206 -> 58682 (-0.89%); 3DMMES performance: +1.89967% +/- 0.157611% (n=10,9).
*	vc4: Improve interleaving of texture coordinates vs results.	Eric Anholt	2016-11-30	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The latency_between was trying to handle the delay between the coordinate write ("before") and the corresponding sample read ("after"), but we were handing in the two instructions swapped. This meant that we tried to fit things between a tex_s and its preceding tex_result. This made us only interleave normal texture coordinates by accident, and pessimized UBO reads by pushing the tex_result collection earlier until there was nothing but it (and then its preceding coordinate setup) left. In addition to latency reduction, things end up packing better (probably due to reduced live ranges of the texture results). Total instructions in shared programs: 98121 -> 94775 (-3.41%); instructions in affected programs: 91196 -> 87850 (-3.67%); 3DMMES performance: +1.15569% +/- 0.124714% (n=8,10).
*	vc4: Fix stray "." on no-op MUL packs.	Eric Anholt	2016-11-30	1	-6/+6
\| \| \| \| \|	This happened when the PM bit was set for R4 unpacks, where the MUL pack was NOP.
*	vc4: Allow merging instructions with SF set where the other writes NOP.	Eric Anholt	2016-11-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	I'm not sure how I managed to write the SF merge code (7d8b79f398f18ed7bb48a74b1b82950e2f08abad) without allowing merges with NOPs. Everything we try to merge with will have a NOP on one or the other side of the instruction, and that's why that commit showed no benefit. Total instructions in shared programs: 99347 -> 95128 (-4.25%); instructions in affected programs: 91906 -> 87687 (-4.59%); 3DMMES performance: +2.57105% +/- 0.135276% (n=6,8).
*	vc4: In a loop break/continue, jump if everyone has taken the path.	Eric Anholt	2016-11-30	1	-10/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This should be a win for most loops, which tend to have uniform control flow. More importantly, it exposes important information to live variables: that the break/continue here means that our jump target may have access to values that were live on our input. Previously, we were just setting the exec mask and letting control flow fall through, so an intervening def between the break and the end of the loop would appear to live variables as if it screened off the variable, when it actually didn't. Fixes a regression in glsl-vs-loop-redundant-condition.shader_test, where a perturbation of register allocation caused a live variable to get stomped. Cc: 13.0 <[email protected]>
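The condition "everyone has taken the path" can be sketched as a mask test. This is only an illustration under stated assumptions, not the vc4 implementation: the 16-bit channel masks and the names `all_channels_left` and `remaining_channels` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per SIMD channel, 16 channels assumed. */

/* Channels still executing the loop body after some of them break. */
static uint16_t remaining_channels(uint16_t active, uint16_t breaking)
{
    return (uint16_t)(active & ~breaking);
}

/* True when every active channel took the break, so a real jump
 * can replace "update the exec mask and fall through". */
static bool all_channels_left(uint16_t active, uint16_t breaking)
{
    return remaining_channels(active, breaking) == 0;
}
```

With uniform control flow the test succeeds on the first break, and the jump is an explicit control-flow edge that liveness analysis can follow.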
*	anv: expose support for VK_KHR_sampler_mirror_clamp_to_edge	Ilia Mirkin	2016-11-30	1	-0/+4
\| \| \| \| \| \| \|	This is already supported in genX_state.c, expose the extension string. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv/cmd_buffer: Actually use the stencil dimension	Jason Ekstrand	2016-11-30	1	-1/+1
\| \| \| \| \| \| \| \|	In an attempt to fix 3DSTATE_DEPTH_BUFFER for stencil-only cases, I accidentally kept setting the SurfaceType to 2D in the stencil-only case thanks to a copy+paste error. Reviewed-by: Nanley Chery <[email protected]>
*	swr: add streamout buffer offset into pBuffer pointer	Ilia Mirkin	2016-11-30	1	-2/+3
\| \| \| \| \| \| \| \|	The buffer_size does not take the offset into account. Just add the offset into the pointer which lines up the structures much better. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
*	swr: fix assertion for max number of so targets	Ilia Mirkin	2016-11-30	1	-1/+1
\| \| \| \| \| \| \|	The number has to be less than or equal to the max, not just less than. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
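The off-by-one described above can be shown with a minimal sketch. The names `MAX_SO_TARGETS`, `so_target_count_ok_strict` and `so_target_count_ok` are hypothetical and not taken from the swr source, and the limit of 4 is assumed only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit, assumed for illustration only. */
#define MAX_SO_TARGETS 4

/* Too strict: rejects a count equal to the maximum. */
static bool so_target_count_ok_strict(unsigned num_targets)
{
    return num_targets < MAX_SO_TARGETS;
}

/* Binding exactly MAX_SO_TARGETS targets is legal, so compare with <=. */
static bool so_target_count_ok(unsigned num_targets)
{
    return num_targets <= MAX_SO_TARGETS;
}
```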
*	swr: properly report max number of SO components	Ilia Mirkin	2016-11-30	1	-1/+1
\| \| \| \| \| \| \| \|	The components count the number of individual values, not the number of slots. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
*	swr: turn off queries around blits	Ilia Mirkin	2016-11-30	1	-1/+9
\| \| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
*	swr: don't advertise stream pause/resume	Ilia Mirkin	2016-11-30	1	-1/+1
\| \| \| \| \| \| \| \| \|	There is no support for resuming streamout. Furthermore, this also controls glDrawTransformFeedback functionality which requires the same ability to query how many primitives were sent out of TF. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
*	swr: fix range computation for instanced client-side arrays	Ilia Mirkin	2016-11-30	2	-24/+52
\| \| \| \| \| \| \| \| \| \| \|	We need to take the instance divisor and number of instances into account for instanced client-side arrays, rather than the vertex parameters. Loosely based on the comparable nvc0 logic. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
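A rough sketch of why the instance divisor and instance count determine the range for an instanced array. It is not the swr or nvc0 code: the function names are hypothetical, and a start instance of zero is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Number of distinct array elements read by an instanced attribute.
 * With divisor d, the element index advances once every d instances.
 * Assumes the first instance is 0 and divisor is nonzero. */
static size_t instanced_element_count(unsigned num_instances, unsigned divisor)
{
    assert(divisor > 0);
    if (num_instances == 0)
        return 0;
    return (size_t)(num_instances - 1) / divisor + 1;
}

/* Bytes of client memory touched: a full stride for every element
 * except the last, plus the size of the last element itself. */
static size_t instanced_range_bytes(unsigned num_instances, unsigned divisor,
                                    size_t stride, size_t elem_size)
{
    size_t count = instanced_element_count(num_instances, divisor);
    if (count == 0)
        return 0;
    return (count - 1) * stride + elem_size;
}
```

For example, 10 instances with a divisor of 3 read 4 elements, regardless of how many vertices each instance draws.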
*	swr: [rasterizer memory] assert when trying to convert an unknown format	Ilia Mirkin	2016-11-30	1	-0/+1
\| \| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
*	swr: remove warning about multi-layer surfaces	Ilia Mirkin	2016-11-30	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	We now support clearing these, and actually rendering to multiple layers would require GS support, which will fail in much more spectacular ways for now. Once that is hooked up, there won't be anything else to do here. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
*	swr: [rasterizer core] don't attempt to load another RTAI when storing	Ilia Mirkin	2016-11-30	1	-1/+1
\| \| \| \| \| \| \| \| \|	Since we don't pass a renderTargetArrayIndex in, and the current hot tile may be for a different index, we may end up loading the RTAI=0 into the hot tile for no reason. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
*	radeonsi: document a CP DMA bug that doesn't need a workaround yet	Marek Olšák	2016-12-01	1	-1/+5
\| \| \| \| \| \|	This one is easy to miss, because it's not documented in any internal doc. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: apply the double EVENT_WRITE_EOP workaround to VI as well	Marek Olšák	2016-12-01	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Internal docs don't mention it, but they also don't mention that the bug has been fixed (like other CI bugs fixed in VI). Vulkan does this too. v2: also update r600_gfx_write_fence_dwords Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
*	radeonsi: add a tess+GS hang workaround for VI dGPUs	Marek Olšák	2016-12-01	1	-2/+10
\| \| \| \| \| \| \|	ported from Vulkan Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't apply the Z export bug workaround to Hainan	Marek Olšák	2016-12-01	1	-2/+3
\| \| \| \| \| \|	not needed Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: apply a tessellation bug workaround for SI	Marek Olšák	2016-12-01	1	-0/+7
\| \| \| \| \|	Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: apply a TC L1 write corruption workaround for SI	Marek Olšák	2016-12-01	1	-11/+23
\| \| \| \| \|	Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips	Marek Olšák	2016-12-01	4	-4/+29
\| \| \| \| \| \| \|	All codepaths are handled except for clover. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: consolidate max-work-group-size computation	Marek Olšák	2016-12-01	1	-24/+19
\| \| \| \| \| \| \|	The next commit will need this. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	mesa: reset linked_stages bitmask when re-linking	Timothy Arceri	2016-12-01	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	34953f8907fdd added this bitmask but it wasn't being reset when a program was relinked. If a stage was removed from the new program then it could cause a crash, as we expect the linked shader for that stage to not be null. Fixes crashes in: ESEXT-CTS.tessellation_shader.single.xfb_captures_data_from_correct_stage and ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98917
*	freedreno/a5xx: fix negative branches	Rob Clark	2016-11-30	2	-1/+6
\| \| \| \| \| \| \| \|	Looks like immed branch offset size increased again.. making what we think is a small negative number look to hw like a huge positive number. And things go badly when shader tries to jump to hyperspace. Signed-off-by: Rob Clark <[email protected]>
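The failure mode described above, where a small negative offset reads as a huge positive one, can be reproduced with a short sketch. The field widths used in the usage note (16 and 20 bits) are assumptions chosen for illustration; the actual a5xx encoding is not given in the commit message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate a signed offset into an unsigned field of `bits` bits. */
static uint32_t encode_branch_offset(int32_t offset, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    uint32_t mask = (1u << bits) - 1u;
    return (uint32_t)offset & mask;
}

/* Interpret a field of `bits` bits as a two's complement value,
 * sign-extending from the top bit of the field. */
static int32_t decode_branch_offset(uint32_t field, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    uint32_t mask = (1u << bits) - 1u;
    uint32_t sign = 1u << (bits - 1);
    field &= mask;
    return (int32_t)((field ^ sign) - sign);
}
```

Encoding -2 in a 16-bit field and decoding it as a 16-bit field gives -2 back, but decoding the same bits as a 20-bit field gives 65534, which is the "jump to hyperspace" case.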
*	freedreno: fix android build with a5xx	Rob Clark	2016-11-30	1	-0/+1
\| \| \| \| \| \| \| \| \|	Android doesn't build all the files that the normal linux/autotools build does (mainly the standalone ir3_compiler), but possibly we should pull C_SOURCES + aNxx_SOURCES into a single variable picked up by both Android.mk and Makefile.am? (Suggested by Rob H.) Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a5xx: fix discard	Rob Clark	2016-11-30	1	-3/+4
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	anv: Prefer in-tree headers to out-of-tree headers	Ville Syrjälä	2016-11-30	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set the include paths to consider in-tree headers before out-of-tree headers. Avoids the build failing due to stale headers being present in $prefix. Previously 'make -ki install' or something similar was required to update the out-of-tree headers to allow the build to succeed. Also avoids having to rebuild the entire thing after every 'make install'. Cc: Rob Clark <[email protected]> Cc: Jason Ekstrand <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	freedreno/a5xx: initial support	Rob Clark	2016-11-30	34	-18/+4471
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno: update generated headers	Rob Clark	2016-11-30	10	-100/+4125
\| \| \| \| \| \|	Pull in a5xx Signed-off-by: Rob Clark <[email protected]>
*	freedreno: make gmem tile size alignment configurable	Rob Clark	2016-11-30	3	-8/+17
\| \| \| \| \| \| \|	a5xx seems to prefer 64 pixel alignment, in at least some cases. Make this configurable per generation. Signed-off-by: Rob Clark <[email protected]>
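A per-generation alignment could look roughly like the sketch below. The struct `gen_info`, its field `gmem_align`, and the function names are hypothetical; only the 64-pixel figure for a5xx comes from the commit message, and the 32-pixel value in the usage note is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-generation parameters. */
struct gen_info {
    uint32_t gmem_align; /* tile size alignment in pixels, power of two */
};

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Tile width rounded to whatever the generation asks for. */
static uint32_t aligned_tile_width(const struct gen_info *info, uint32_t width)
{
    return align_up(width, info->gmem_align);
}
```

A 70-pixel tile width becomes 96 with an assumed 32-pixel alignment and 128 with the 64-pixel alignment.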
*	freedreno/ir3: don't offset inloc by 8	Rob Clark	2016-11-30	4	-27/+15
\| \| \| \| \| \| \| \| \|	On a3xx/a4xx, the SP_VS_VPC_DST_REG.OUTLOCn is offset by 8, so we used to add this offset into fs->inputs[n].inloc. But a5xx drops this extra offset-by-8. So instead make inloc zero based and add the offset when we emit OUTLOCn values (for the gen's that need the offset). Signed-off-by: Rob Clark <[email protected]>
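The change in where the offset is applied can be sketched as follows. The function name `outloc_for_gen` and the integer generation parameter are hypothetical; only the offset of 8 and the a3xx/a4xx versus a5xx split come from the commit message.

```c
#include <assert.h>

/* inloc is stored zero-based. The extra offset is added only when
 * emitting OUTLOCn for generations whose hardware expects it
 * (a3xx and a4xx per the commit message; a5xx drops it). */
static unsigned outloc_for_gen(unsigned inloc, unsigned gen)
{
    assert(gen >= 3);
    return (gen == 3 || gen == 4) ? inloc + 8 : inloc;
}
```

Keeping the stored value zero-based means generation-specific quirks live only in the emit path.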
*	freedreno/a3xx: use new shader linkage helper	Rob Clark	2016-11-30	1	-27/+16
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a4xx: use new shader linkage helper	Rob Clark	2016-11-30	1	-27/+16
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: add new helper for shader linkage	Rob Clark	2016-11-30	1	-0/+47
\| \| \| \| \| \| \|	Helps simplify things on a5xx, where pos/psize get added to the vs-out map. And anyways, simplifies a3xx and a4xx. Signed-off-by: Rob Clark <[email protected]>
*	st/mesa: skip lower_output_reads when possible	Nicolai Hähnle	2016-11-30	1	-1/+2
\| \| \| \|	Reviewed-by: Marek Olšák <[email protected]>
*	st/glsl_to_tgsi: swizzle PROGRAM_OUTPUTs correctly in src_register translation	Nicolai Hähnle	2016-11-30	1	-1/+11
\| \| \| \| \| \| \|	This is required for reading directly from fragment shader stencil and depth outputs. Reviewed-by: Marek Olšák <[email protected]>
*	gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTS	Nicolai Hähnle	2016-11-30	17	-0/+18
\| \| \| \| \| \| \| \| \| \| \|	Drivers that support this benefit by saving one lowering pass in the GLSL-to-TGSI conversion. radeonsi already supports this because all outputs are stored in temporary variables before the export (except for TCS outputs, which have always been readable in TGSI anyway due to their special semantics). Reviewed-by: Marek Olšák <[email protected]>
*	ac/nir: Fix out of bounds array access.	Bas Nieuwenhuizen	2016-11-30	1	-1/+1
\| \| \| \| \| \| \|	With nir_intrinsic_ssbo_atomic_comp_swap we run out of params. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
*	aubinator: Add support for enum types	Kristian H. Kristensen	2016-11-29	2	-40/+93
\| \| \| \| \|	Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Fix ksp for INTERFACE_DESCRIPTOR_DATA	Kristian H. Kristensen	2016-11-29	2	-4/+2
\| \| \| \| \| \| \| \| \| \|	This one was split across two dwords as "Kernel Start Pointer" and "Kernel Start Pointer High", which looks like it works when the driver only accesses "Kernel Start Pointer". This breaks, of course, with BO offsets > 4G. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
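The truncation described above can be illustrated with a simplified model of a 64-bit pointer split across two dwords. The struct and function names are hypothetical, and the real field's bit positions and alignment shift are ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: a 64-bit offset stored as two 32-bit dwords. */
struct ksp_dwords {
    uint32_t low;
    uint32_t high;
};

/* What happens when the driver only fills in the low field. */
static struct ksp_dwords ksp_pack_low_only(uint64_t offset)
{
    struct ksp_dwords d;
    d.low = (uint32_t)offset;
    d.high = 0;
    return d;
}

/* Treating the pointer as one 64-bit field fills both dwords. */
static struct ksp_dwords ksp_pack_full(uint64_t offset)
{
    struct ksp_dwords d;
    d.low = (uint32_t)offset;
    d.high = (uint32_t)(offset >> 32);
    return d;
}

/* The address the hardware would reconstruct from the two dwords. */
static uint64_t ksp_unpack(struct ksp_dwords d)
{
    return ((uint64_t)d.high << 32) | d.low;
}
```

For offsets below 4G both variants agree, which is why the split definition looked like it worked.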
*	intel/genxml: Use enum 3D_Logic_Op_Function where applicable	Kristian H. Kristensen	2016-11-29	5	-56/+62
\| \| \| \| \|	Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Use blend function and factor enums where applicable	Kristian H. Kristensen	2016-11-29	5	-130/+124
\| \| \| \| \|	Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Use enum 3D_Vertex_Component_Control where applicable	Kristian H. Kristensen	2016-11-29	5	-20/+20
\| \| \| \| \|	Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Use enum 3D_Stencil_Operation where applicable	Kristian H. Kristensen	2016-11-29	5	-84/+63
\| \| \| \| \|	Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Use enum SURFACE_FORMAT where applicable	Kristian H. Kristensen	2016-11-29	5	-10/+10
\| \| \| \| \|	Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Use enum 3D_Prim_Topo_Type where applicable	Kristian H. Kristensen	2016-11-29	5	-15/+15
\| \| \| \| \|	Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Use 3D_Compare_Function for gen8+ test functions	Kristian H. Kristensen	2016-11-29	2	-8/+8
\| \| \| \| \| \| \| \| \|	When the state fields were shuffled around for gen8, the compare function enums were downgraded to just uints. Change them to enum 3D_Compare_Function. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/genxml: Emit genxml enums as C enums	Kristian H. Kristensen	2016-11-29	1	-4/+4
\| \| \| \| \| \| \| \| \|	The previous commits got rid of any clashes between #defines and enum values and we can now emit the genxml enums as debugger friendly C enums. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
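The difference between the two emission styles can be sketched as below. The identifiers and values are hypothetical stand-ins, not the names the genxml generator actually produces.

```c
#include <assert.h>

/* Old style (sketch): plain preprocessor constants. A debugger shows
 * only the integer, and a #define with the same name as an enum value
 * would rewrite that enum declaration during preprocessing.
 *
 *   #define EXAMPLE_COMPARE_ALWAYS 0
 *   #define EXAMPLE_COMPARE_NEVER  1
 *   #define EXAMPLE_COMPARE_LESS   2
 */

/* New style (sketch): a real C enum, so fields typed with it can be
 * printed symbolically by a debugger. */
enum example_compare_function {
    EXAMPLE_COMPARE_ALWAYS = 0,
    EXAMPLE_COMPARE_NEVER  = 1,
    EXAMPLE_COMPARE_LESS   = 2,
};

/* Hypothetical state struct with an enum-typed field. */
struct example_depth_state {
    enum example_compare_function depth_test_function;
};

/* Symbolic name for a value, similar to what a debugger would show. */
static const char *example_compare_name(enum example_compare_function f)
{
    switch (f) {
    case EXAMPLE_COMPARE_ALWAYS: return "ALWAYS";
    case EXAMPLE_COMPARE_NEVER:  return "NEVER";
    case EXAMPLE_COMPARE_LESS:   return "LESS";
    }
    assert(!"unknown compare function");
    return "?";
}
```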