mesa.git

	Commit message	Author	Age	Files	Lines
*	freedreno/a5xx: fix negative branches	Rob Clark	2016-11-30	2	-1/+6
|	Looks like the immed branch offset size increased again, making what we think is a small negative number look to hw like a huge positive number. And things go badly when the shader tries to jump to hyperspace.
|	Signed-off-by: Rob Clark <[email protected]>
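The failure mode above is ordinary two's-complement truncation: an offset packed into a narrower field than the hardware decodes loses its sign bits. The sketch below shows this with 16-bit and 20-bit fields; those widths and the helper names are illustrative assumptions, not the actual a5xx encoding.

```c
#include <stdint.h>

/* Pack a signed branch offset into the low `bits` bits of an instruction
 * field (two's complement, truncated). Widths here are hypothetical. */
static uint32_t pack_branch_offset(int32_t offset, unsigned bits)
{
    uint32_t mask = (bits >= 32) ? 0xffffffffu : ((1u << bits) - 1u);
    return (uint32_t)offset & mask;
}

/* Decode the field the way hardware would: sign-extend from `bits`.
 * (field ^ sign) - sign is the usual branch-free sign extension. */
static int32_t unpack_branch_offset(uint32_t field, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)((field ^ sign) - sign);
}
```

Packing -3 into 16 bits yields 0xfffd; if the hardware reads a 20-bit field, bit 19 is clear, so the same bits decode as +65533, a jump far past the end of the shader.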
*	freedreno: fix android build with a5xx	Rob Clark	2016-11-30	1	-0/+1
|	Android doesn't build all the files that the normal linux/autotools build does (mainly the standalone ir3_compiler), but possibly we should pull C_SOURCES + aNxx_SOURCES into a single variable picked up by both Android.mk and Makefile.am? (Suggested by Rob H.)
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a5xx: fix discard	Rob Clark	2016-11-30	1	-3/+4
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a5xx: initial support	Rob Clark	2016-11-30	33	-17/+4470
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno: update generated headers	Rob Clark	2016-11-30	10	-100/+4125
|	Pull in a5xx.
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno: make gmem tile size alignment configurable	Rob Clark	2016-11-30	3	-8/+17
|	a5xx seems to prefer 64 pixel alignment, in at least some cases. Make this configurable per generation.
|	Signed-off-by: Rob Clark <[email protected]>
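Making the alignment a per-generation parameter amounts to rounding the tile width up to a generation-specific power of two. A minimal sketch, assuming a hypothetical `gmem_params` struct and assuming 32 pixels for the older generations (only the a5xx value of 64 comes from the commit):

```c
/* Hypothetical per-generation GMEM parameters. */
struct gmem_params {
    unsigned tile_align_w; /* tile width alignment in pixels; power of two */
};

/* Round v up to a multiple of a (a must be a power of two). */
static unsigned align_pot(unsigned v, unsigned a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Split the framebuffer width evenly across nbins_x bins, then round each
 * bin up to the generation's preferred alignment. */
static unsigned tile_width(unsigned fb_width, unsigned nbins_x,
                           const struct gmem_params *p)
{
    unsigned w = (fb_width + nbins_x - 1) / nbins_x;
    return align_pot(w, p->tile_align_w);
}
```

For a 1000-pixel-wide framebuffer split into 3 bins, the unaligned width is 334; aligned to 32 it becomes 352, and aligned to 64 it becomes 384.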
*	freedreno/ir3: don't offset inloc by 8	Rob Clark	2016-11-30	4	-27/+15
|	On a3xx/a4xx, the SP_VS_VPC_DST_REG.OUTLOCn is offset by 8, so we used to add this offset into fs->inputs[n].inloc. But a5xx drops this extra offset-by-8. So instead make inloc zero based and add the offset when we emit OUTLOCn values (for the gens that need the offset).
|	Signed-off-by: Rob Clark <[email protected]>
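The change moves the generation-specific offset from the stored `inloc` to the point where OUTLOCn is emitted. A sketch of that emit-time adjustment, with a hypothetical helper name and generation numbering (3 = a3xx, 4 = a4xx, 5 = a5xx):

```c
/* inloc is zero-based; only a3xx/a4xx expect OUTLOCn offset by 8.
 * Helper name and gen numbering are illustrative assumptions. */
static unsigned outloc_value(unsigned inloc, unsigned gen)
{
    const unsigned legacy_offset = (gen < 5) ? 8 : 0;
    return inloc + legacy_offset;
}
```

Keeping `inloc` zero-based means shared ir3 code no longer needs to know which generation it is compiling for.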
*	freedreno/a3xx: use new shader linkage helper	Rob Clark	2016-11-30	1	-27/+16
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a4xx: use new shader linkage helper	Rob Clark	2016-11-30	1	-27/+16
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: add new helper for shader linkage	Rob Clark	2016-11-30	1	-0/+47
|	Helps simplify things on a5xx, where pos/psize get added to the vs-out map. And it simplifies a3xx and a4xx anyway.
|	Signed-off-by: Rob Clark <[email protected]>
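A shader linkage helper of this kind essentially records, for each FS input, which VS output register feeds it and at which varying location. The sketch below is a hypothetical data structure for that mapping; the struct and function names are invented for illustration and do not claim to match ir3's helper.

```c
#include <stdint.h>

#define MAX_LINKED_VARYINGS 32

/* Hypothetical record: VS output register -> zero-based FS varying location,
 * plus the mask of components the FS actually reads. */
struct link_var {
    uint8_t regid;
    uint8_t compmask;
    uint8_t loc;
};

struct shader_linkage {
    unsigned cnt;
    struct link_var var[MAX_LINKED_VARYINGS];
};

/* Append one VS-output -> FS-input mapping; returns 0 on success. */
static int linkage_add(struct shader_linkage *l, uint8_t regid,
                       uint8_t compmask, uint8_t loc)
{
    if (l->cnt >= MAX_LINKED_VARYINGS)
        return -1;
    l->var[l->cnt].regid = regid;
    l->var[l->cnt].compmask = compmask;
    l->var[l->cnt].loc = loc;
    l->cnt++;
    return 0;
}

/* Return the varying location assigned to a VS output register, or -1. */
static int linkage_find_loc(const struct shader_linkage *l, uint8_t regid)
{
    for (unsigned i = 0; i < l->cnt; i++)
        if (l->var[i].regid == regid)
            return l->var[i].loc;
    return -1;
}
```

With a table like this, a5xx code could append pos/psize entries after the FS inputs, while a3xx/a4xx emit code walks the same table.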
*	gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTS	Nicolai Hähnle	2016-11-30	15	-0/+15
|	Drivers that support this benefit by saving one lowering pass in the GLSL-to-TGSI conversion. radeonsi already supports this because all outputs are stored in temporary variables before the export (except for TCS outputs, which have always been readable in TGSI anyway due to their special semantics).
|	Reviewed-by: Marek Olšák <[email protected]>
*	swr: [rasterizer jit] use signed integer representation for logic op	Ilia Mirkin	2016-11-29	1	-5/+12
|	Instead of (incorrectly) biasing the snorm value to make it look like a unorm, just use signed integer math. This fixes arb_color_buffer_float-render GL_RGBA8_SNORM.
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
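A logic op works on raw bit patterns, so for snorm formats the natural representation is the signed two's-complement integer itself. This scalar sketch (not the swr JIT code) converts to snorm8, applies AND/OR/XOR to the bits, and converts back; the enum and helper names are hypothetical.

```c
#include <stdint.h>

enum logic_op { LOGIC_AND, LOGIC_OR, LOGIC_XOR };

/* float -> snorm8: clamp to [-1, 1], scale by 127, round half away from zero. */
static int8_t float_to_snorm8(float f)
{
    if (f > 1.0f) f = 1.0f;
    if (f < -1.0f) f = -1.0f;
    return (int8_t)(f * 127.0f + (f >= 0.0f ? 0.5f : -0.5f));
}

/* snorm8 -> float; -128 clamps to -1.0 as for any snorm format. */
static float snorm8_to_float(int8_t v)
{
    float f = v / 127.0f;
    return f < -1.0f ? -1.0f : f;
}

/* Apply the logic op directly to the two's-complement bit patterns. */
static int8_t snorm8_logic_op(enum logic_op op, int8_t src, int8_t dst)
{
    uint8_t s = (uint8_t)src, d = (uint8_t)dst, r;
    switch (op) {
    case LOGIC_AND: r = s & d; break;
    case LOGIC_OR:  r = s | d; break;
    default:        r = s ^ d; break;
    }
    return (int8_t)r;
}
```

Biasing first would change the bits being combined: -1.0 (0x81) biased by 128 becomes 0x01, and 0.0 becomes 0x80, so AND would produce 0x00, which unbiases to -128 (that is, -1.0) rather than the 0.0 obtained from the signed bit patterns.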
*	swr: add missing rgbx8_srgb variant	Ilia Mirkin	2016-11-29	1	-0/+1
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: reorder renderable formats, add grouping comments	Ilia Mirkin	2016-11-29	1	-65/+87
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: use util_copy_framebuffer_state helper	Ilia Mirkin	2016-11-29	1	-12/+1
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: enable cubemap arrays	Ilia Mirkin	2016-11-29	1	-1/+1
|	Everything is in place for these.
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: rearrange caps into limits/supported/unsupported groups	Ilia Mirkin	2016-11-29	1	-129/+84
|	I find this a lot more readable and compact: much easier to scan through the list and see what's on and what's off. No functional change intended.
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: only store up to the LOD size	Ilia Mirkin	2016-11-29	1	-1/+3
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: [rasterizer common] add SwrTrace() and macros	Tim Rowley	2016-11-29	2	-15/+95
|	Reviewed-by: Bruce Cherniak <[email protected]>
*	radeonsi: don't fetch 8 dwords for samplerBuffer and imageBuffer	Marek Olšák	2016-11-29	1	-51/+43
|	The compiler doesn't shrink s_load_dwordx8, so we always wasted 4 SGPRs. Also, the extraction of the descriptor created some really ugly asm code with lots of VALU bitwise ops and v_readfirstlane.
|	Totals from affected shaders:
|	  SGPRS: 13880 -> 13253 (-4.52 %)
|	  VGPRS: 15200 -> 15088 (-0.74 %)
|	  Code Size: 499864 -> 459816 (-8.01 %) bytes
|	  Max Waves: 1554 -> 1564 (0.64 %)
|	Reviewed-by: Nicolai Hähnle <[email protected]>
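The percentages in these shader-db style totals are plain relative changes, (after - before) / before × 100. A small helper (hypothetical, for checking the arithmetic only) reproduces the reported figures:

```c
/* Relative change from `before` to `after`, in percent. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For example, SGPRS go from 13880 to 13253, a drop of 627, which is about -4.517 %, rounded to the reported -4.52 %; Code Size drops by 40048 bytes, about -8.01 %.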
*	radeonsi: disable XNACK to free 2 SGPRs on APUs	Marek Olšák	2016-11-29	1	-1/+1
|	My LLVM commit disables it for dGPUs, but not APUs.
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: count and report temp arrays in scratch separately	Marek Olšák	2016-11-29	2	-4/+40
|	v2: only do this if debug output of shader dumping is enabled
|	Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
*	radeonsi: don't try to eliminate trivial VS outputs for PS and CS	Marek Olšák	2016-11-29	1	-1/+4
|	PS and CS don't have any param exports, so it's a no-op.
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: disable RB+ blend optimizations for dual source blending	Marek Olšák	2016-11-29	1	-0/+11
|	This fixes dual source blending on Stoney. The fix was copied from Vulkan. The problem was discovered during internal testing.
|	Cc: 13.0 <[email protected]>
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending	Marek Olšák	2016-11-29	1	-0/+4
|	Copied from Vulkan.
|	Cc: 13.0 <[email protected]>
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: always set all blend registers	Marek Olšák	2016-11-29	1	-5/+5
|	Better safe than sorry.
|	Cc: 13.0 <[email protected]>
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: set the smallest possible CB_TARGET_MASK	Marek Olšák	2016-11-29	1	-5/+5
|	Better safe than sorry; set_framebuffer_state always makes this dirty.
|	Reviewed-by: Nicolai Hähnle <[email protected]>
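CB_TARGET_MASK holds 4 bits (RGBA) per color target, so the smallest safe value enables only the targets that actually have a bound surface. The sketch below assumes a hypothetical `bound[]` array in place of the real framebuffer state, and assumes the result is further intersected with the blend state's write mask.

```c
#include <stdbool.h>

#define MAX_COLOR_BUFFERS 8

/* Build a CB_TARGET_MASK-style value: 0xf per bound color target i,
 * placed at bits 4*i..4*i+3, then limited by the blend write mask. */
static unsigned compute_cb_target_mask(const bool bound[MAX_COLOR_BUFFERS],
                                       unsigned nr_cbufs,
                                       unsigned blend_write_mask)
{
    unsigned mask = 0;
    for (unsigned i = 0; i < nr_cbufs && i < MAX_COLOR_BUFFERS; i++) {
        if (bound[i])
            mask |= 0xfu << (4 * i);
    }
    return mask & blend_write_mask;
}
```

With targets 0 and 2 bound, the mask is 0xf0f; target 1 stays disabled even though it lies within `nr_cbufs`.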
*	radeonsi: don't print bodies of header-only packets	Marek Olšák	2016-11-29	1	-0/+4
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: print unknown registers with correct formatting	Marek Olšák	2016-11-29	1	-1/+2
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	ddebug: fix hang detection with deferred flushes	Marek Olšák	2016-11-29	1	-1/+1
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	vc4: Add a note for the future about texture latency calculation.	Eric Anholt	2016-11-29	1	-0/+20
|	Debugging a shader-db reported cycle count regression from the tex coalescing, I eventually figured out that the texture latencies were totally bogus. Really fixing it will probably involve mirroring vc4_qir_schedule.c's texture fifo management here.
*	vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.	Eric Anholt	2016-11-29	4	-29/+37
|	This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy.
|	Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16).
|	  total instructions in shared programs: 99810 -> 99059 (-0.75%)
|	  instructions in affected programs: 10705 -> 9954 (-7.02%)
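The coalescing idea can be sketched on a toy IR: first count the uses of every temp, then, wherever a MOV into tex_s reads a temp written by the immediately preceding ALU op and used nowhere else, retarget that ALU op at tex_s and drop the MOV. The mini-IR below is hypothetical, not vc4's QIR, and it ignores control flow and the r5 interpolation restriction mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

enum dst_file { DST_TEMP, DST_TEX_S };
enum op { OP_MOV, OP_FADD, OP_FMUL };

struct inst {
    enum op op;
    enum dst_file dst_file;
    int dst;     /* temp index when dst_file == DST_TEMP, else -1 */
    int src[2];  /* temp indices, -1 if unused */
    bool dead;
};

#define MAX_TEMPS 64

/* Pass 1: count how many live instructions read each temp. */
static void count_uses(const struct inst *insts, size_t n, int use_count[MAX_TEMPS])
{
    for (int t = 0; t < MAX_TEMPS; t++)
        use_count[t] = 0;
    for (size_t i = 0; i < n; i++) {
        if (insts[i].dead)
            continue;
        for (int s = 0; s < 2; s++)
            if (insts[i].src[s] >= 0)
                use_count[insts[i].src[s]]++;
    }
}

/* Pass 2: fold "t = ALU ...; tex_s = MOV t" into "tex_s = ALU ...".
 * Returns the number of MOVs removed. */
static int coalesce_tex_movs(struct inst *insts, size_t n)
{
    int use_count[MAX_TEMPS];
    int folded = 0;

    count_uses(insts, n, use_count);
    for (size_t i = 1; i < n; i++) {
        struct inst *mov = &insts[i];
        struct inst *def = &insts[i - 1]; /* only the adjacent producer */
        int t = mov->src[0];

        if (mov->dead || mov->op != OP_MOV || mov->dst_file != DST_TEX_S)
            continue;
        if (t < 0 || use_count[t] != 1 || def->dead ||
            def->dst_file != DST_TEMP || def->dst != t || def->op == OP_MOV)
            continue;

        def->dst_file = DST_TEX_S; /* ALU result goes straight to tex_s */
        def->dst = -1;
        mov->dead = true;
        folded++;
    }
    return folded;
}
```

Computing use counts up front is what makes the single-use check cheap; restricting the search to the adjacent instruction keeps the sketch from reordering any texture writes.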
*	vc4: Restructure VPM write optimization into two passes.	Eric Anholt	2016-11-29	1	-18/+10
|	For texturing, there won't be a fixed limit on how many writes there are, so we need to compute uses up front.
*	vc4: Make qir_for_each_inst_inorder() safe against removal.	Eric Anholt	2016-11-29	1	-1/+1
|	The dead code elimination wants it to be safe, and I actually got segfaults due to it being unsafe with the new coalescing pass.
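An iterator is "safe against removal" when it caches the successor before running the loop body, so the body may unlink and free the current node. The unsafe form advances with `n = n->next` after the body and reads freed memory, which matches the segfaults described. A self-contained sketch on a hypothetical sentinel-headed doubly linked list:

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static struct node *list_append(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (!n)
        return NULL;
    n->value = value;
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
    return n;
}

static void node_remove(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    free(n);
}

/* Safe in-order walk: `tmp` holds the successor before the body runs. */
#define for_each_node_safe(n, tmp, head)          \
    for ((n) = (head)->next, (tmp) = (n)->next;   \
         (n) != (head);                           \
         (n) = (tmp), (tmp) = (n)->next)

/* Example body that deletes the current node, as a DCE pass would. */
static void remove_even(struct node *head)
{
    struct node *n, *tmp;
    for_each_node_safe(n, tmp, head) {
        if (n->value % 2 == 0)
            node_remove(n);
    }
}
```

The cost is one extra pointer per iteration, and it only protects against removing the current node; removing the cached successor inside the body would still be unsafe.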
*	vc4: Split optimizing VPM writes from VPM reads.	Eric Anholt	2016-11-29	5	-51/+110
|	The VPM write logic will be basically the same as the texture coordinate write logic we need, and it's not really related to the VPM read logic other than the reuse of the use_count array.
*	vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.	Eric Anholt	2016-11-29	9	-89/+194
|	For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
*	vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).	Eric Anholt	2016-11-29	17	-36/+34
|	Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.
*	vc4: Replace the qinst src[] with a fixed-size array.	Eric Anholt	2016-11-29	3	-4/+2
|	This may have made a tiny bit of sense when we had one 4-arg inst per shader, but if we only ever put 2 things in, having a pointer to 2 things in almost every instruction is pointless indirection.
*	vc4: Remove qir_inst4().	Eric Anholt	2016-11-29	2	-25/+0
|	This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.
*	swr: [rasterizer memory] only clear up to the LOD size	Ilia Mirkin	2016-11-28	1	-2/+8
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: [rasterizer memory] hook up stencil clears for ClearTile	Ilia Mirkin	2016-11-28	1	-5/+8
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16	Ilia Mirkin	2016-11-28	1	-0/+2
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	swr: don't clear all dirty bits when changing so targets	Ilia Mirkin	2016-11-28	1	-1/+1
|	Among other things, blits would clear existing SO targets, which would cause a bunch of updates from u_blitter to be missed. Fixes fbo-scissor-blit fbo, probably among many others.
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Bruce Cherniak <[email protected]>
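A one-line fix titled "don't clear all dirty bits" is consistent with the classic bug of assigning a flag instead of OR-ing it in; that reading is an assumption, since the diff itself is not shown here. The sketch contrasts the two patterns with hypothetical flag names:

```c
#include <stdint.h>

/* Hypothetical dirty flags; names are illustrative, not swr's. */
enum {
    NEW_BLEND       = 1 << 0,
    NEW_FRAMEBUFFER = 1 << 1,
    NEW_SO          = 1 << 2,
};

struct ctx {
    uint32_t dirty;
};

/* Buggy pattern: assignment discards every other pending update. */
static void set_so_targets_buggy(struct ctx *c)
{
    c->dirty = NEW_SO;
}

/* Fixed pattern: OR the new bit in, preserving pending state. */
static void set_so_targets_fixed(struct ctx *c)
{
    c->dirty |= NEW_SO;
}
```

With the buggy version, state changed by u_blitter just before the SO targets are set (blend, framebuffer) would never be re-emitted.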
*	swr: [rasterizer core] fix typo in scissor tile-alignment logic	Ilia Mirkin	2016-11-28	1	-1/+1
|	Signed-off-by: Ilia Mirkin <[email protected]>
|	Reviewed-by: Tim Rowley <[email protected]>
*	freedreno: fix slice size for imported buffers	Rob Clark	2016-11-27	1	-0/+1
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx: make _emit_const() static	Rob Clark	2016-11-27	2	-5/+1
|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a4xx: make _emit_const() static	Rob Clark	2016-11-27	3	-6/+2
|	Signed-off-by: Rob Clark <[email protected]>
*	gm107/ir: optimize 32-bit CONST load to mov	Samuel Pitoiset	2016-11-26	2	-0/+17
|	This is not allowed for indirect accesses because the source GPR might be erased by a subsequent instruction (WaR hazard) if we don't emit a read dep bar.
|	Signed-off-by: Samuel Pitoiset <[email protected]>
|	Reviewed-by: Ilia Mirkin <[email protected]>
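The rule in this commit reduces to a predicate on the load: only a direct 32-bit constant access may become a MOV. The struct and function below are hypothetical stand-ins for the nv50_ir instruction data, sketched under that assumption.

```c
#include <stdbool.h>

/* Hypothetical description of a load from constant memory. */
struct const_load {
    unsigned size_bytes; /* access width */
    bool indirect;       /* address depends on a GPR */
};

/* Lower to MOV only for a plain 32-bit direct access. An indirect access
 * reads an address GPR, and without LD's read dependency barrier a later
 * instruction could overwrite that GPR first (WaR hazard). */
static bool can_lower_const_load_to_mov(const struct const_load *ld)
{
    return ld->size_bytes == 4 && !ld->indirect;
}
```

This also shows why the follow-up commit stops combining CONST loads: a combined 64- or 128-bit load no longer passes the 32-bit check.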
*	gm107/ir: do not combine CONST loads	Samuel Pitoiset	2016-11-26	1	-2/+7
|	This allows using MOV instead of LD. The main advantage is that MOV doesn't require a read dependency barrier while LD does, so this will reduce both barrier pressure and the number of stall counts needed to read data from constant memory. This is currently only for user uniform accesses. I should do something similar when loading from the driver constant buffer, but it seems a bit tricky to handle for now.
|	Signed-off-by: Samuel Pitoiset <[email protected]>
|	Reviewed-by: Ilia Mirkin <[email protected]>
*	scons: Recognize LLVM_CONFIG environment variable.	Vinson Lee	2016-11-24	1	-1/+2
|	Signed-off-by: Vinson Lee <[email protected]>
|	Reviewed-by: Emil Velikov <[email protected]>
|	Reviewed-by: Jose Fonseca <[email protected]>