mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	radeonsi: use 32_AR for alpha-to-coverage without a color buffer	Marek Olšák	2016-01-22	1	-1/+1
\| \| \| \| \| \|	This avoids the fp16 packing instructions. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: add shader conversion code for all SPI color formats	Marek Olšák	2016-01-22	2	-14/+140
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: set CB_SHADER_MASK according to SPI color formats	Marek Olšák	2016-01-22	1	-16/+35
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: use SPI_SHADER_COL_FORMAT fields instead of export_16bpc	Marek Olšák	2016-01-22	7	-38/+91
\| \| \| \| \| \| \| \| \| \| \| \| \|	This does change the behavior slightly: If a shader writes COLOR[i] and that color buffer isn't bound, the shader will export MRT_NULL instead and discard the IR tree that calculates the output. The only exception is alpha-to-coverage, which requires an alpha export. v2: - update a comment about 16BPC - account for MRTZ when fixing alpha-test/kill Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't enable blending if colormask == 0	Marek Olšák	2016-01-22	1	-0/+3
\| \| \| \| \| \|	most likely useless, but doesn't hurt Reviewed-by: Nicolai Hähnle <[email protected]>
*	freedreno/a4xx: Add support for adreno 430	cstout	2016-01-21	1	-0/+1
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno: make opc array static const	Christian Gmeiner	2016-01-21	1	-1/+1
\| \| \| \| \|	Signed-off-by: Christian Gmeiner <[email protected]> Signed-off-by: Rob Clark <[email protected]>
*	freedreno: implement emit_string_marker	Rob Clark	2016-01-21	2	-1/+28
\| \| \| \| \| \|	Writes string to cmdstream in payload of a no-op packet. Signed-off-by: Rob Clark <[email protected]>
*	gallium: add GREMEDY_string_marker	Rob Clark	2016-01-21	13	-0/+13
\| \| \| \| \| \| \| \| \| \|	Since the GREMEDY extensions are normally only exposed by the gremedy debugger (and could possibly trigger debug paths in the app), we don't expose the extension by default, but instead only with ST_DEBUG=gremedy. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
*	r600g: don't leak driver const buffers	Grazvydas Ignotas	2016-01-21	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	The buffers are referenced from r600_update_driver_const_buffers() -> r600_set_constant_buffer() -> u_upload_data(), but nothing ever releases the reference. Similar case with driver_consts. Found using valgrind. Signed-off-by: Grazvydas Ignotas <[email protected]> Cc: <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	nv50/ir: 64-bit splitting fixes	Ilia Mirkin	2016-01-20	1	-1/+2
\| \| \| \| \| \| \|	Take reading shader outputs into account, and use setFlagsDef for the carry since we rely on having i->flagsDef being set. Signed-off-by: Ilia Mirkin <[email protected]>
*	gk110/ir: allow carry to be set/read by imad	Ilia Mirkin	2016-01-20	1	-0/+4
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	gm107/ir: add carry emission to LOP and IADD	Ilia Mirkin	2016-01-20	1	-0/+4
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	gm107/ir: add ATOM and CCTL support	Ilia Mirkin	2016-01-20	1	-0/+52
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	gm107/ir: set LD/ST address width bit	Ilia Mirkin	2016-01-20	1	-0/+2
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	gk110/ir: fix double-wide vm address	Ilia Mirkin	2016-01-20	1	-0/+4
\|
*	gk110/ir: add OP_CCTL handling	Ilia Mirkin	2016-01-20	1	-0/+35
\|
*	gk110/ir: add atomic op emission, fix gmem loads	Ilia Mirkin	2016-01-20	1	-5/+65
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	llvmpipe: warn about illegal use of objects in different contexts	Roland Scheidegger	2016-01-21	3	-1/+32
\| \| \| \| \| \| \| \| \| \| \|	Doing that is clearly a bug. We can't quite assert as st/mesa may hit this, but increase at least visibility of it a bit. (For the non-refcounted objects it would be illegal too, but we can't detect that unless we'd store the context ourselves. Plus, those don't tend to cause random crashes at context or object destruction time... So just sampler views, surfaces and so targets for now.) Reviewed-by: Jose Fonseca <[email protected]>
*	llvmpipe,i915: add back NEW_RASTERIZER dependency when computing vertex info	Roland Scheidegger	2016-01-21	2	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I removed this mistakenly in 2dbc20e45689e09766552517a74e2270e49817b5. I actually thought it should not be necessary and a piglit run didn't show any differences, but this shouldn't have been in there. draw_prepare_shader_outputs() is in fact dependent on NEW_RASTERIZER. The new polygon-mode-facing test indeed shows why this is necessary, there's lots of invalid reads and writes with valgrind (also crashes without valgrind), because the pre-pipeline vertex size doesn't match the post-pipeline vertex size (note this won't help much with stages which don't have the prepare hook which can grow the vertex size, in particular the wide point stage, but this isn't used by llvmpipe). The test still won't pass, of course, but it is only usage of uninitialized values now, which is much less dangerous... (Albeit I'm pretty sure for i915 it really is not needed anymore as it doesn't care about the extra outputs and doesn't call draw_prepare_shader_outputs().) Reviewed-by: Jose Fonseca <[email protected]>
*	nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers	Ilia Mirkin	2016-01-20	1	-0/+2
\| \| \| \| \|	Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm)) Signed-off-by: Ilia Mirkin <[email protected]>
*	gk110/ir: fix load from shared memory	Ilia Mirkin	2016-01-20	1	-1/+1
\| \| \| \| \| \|	It was accidentally using the store opcode. Signed-off-by: Ilia Mirkin <[email protected]>
*	gk110/ir: add partial BAR support	Ilia Mirkin	2016-01-20	1	-2/+18
\| \| \| \| \| \|	This is enough for the plain TGSI BARRIER implementation. Signed-off-by: Ilia Mirkin <[email protected]>
*	llvmpipe: turn depth clears into full depth/stencil clears for d24x8 formats	Roland Scheidegger	2016-01-20	1	-11/+14
\| \| \| \| \| \| \| \| \| \| \| \|	If we have a d24x8 format, there is no stencil. Therefore, we can always clear these bits too, which means this will be some kind of memset rather than read-modify-write. This is good for some 7% increase or so in gears with huge window size - seems to have a bigger effect if things aren't in caches. Of course, any real app won't spend nearly as much time comparatively in clearing depth buffer in the first place, so the speedup will be much lower. Reviewed-by: Jose Fonseca <[email protected]>
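The difference between the two clear paths described above can be sketched in plain C. This is only an illustration, not llvmpipe's code: the function names are hypothetical, and the layout (depth in the low 24 bits of each 32-bit word, the top 8 bits unused) is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for this sketch: depth in the low 24 bits, 8 unused bits on top. */
#define DEPTH_MASK 0x00ffffffu

/* Depth-only clear: every word is read, masked, and written back
 * so that the upper 8 bits are preserved (read-modify-write). */
static void clear_depth_masked(uint32_t *buf, size_t n, uint32_t depth)
{
   for (size_t i = 0; i < n; i++)
      buf[i] = (buf[i] & ~DEPTH_MASK) | (depth & DEPTH_MASK);
}

/* Full clear: if the upper 8 bits carry no stencil data, they can be
 * overwritten too, so the loop becomes a pure store (memset-like). */
static void clear_depth_full(uint32_t *buf, size_t n, uint32_t depth)
{
   for (size_t i = 0; i < n; i++)
      buf[i] = depth & DEPTH_MASK;
}
```

Both functions leave the same depth bits behind; the second one never has to load the old contents, which is where the saving comes from when the buffer is not in cache.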
*	nv50/ir: swap the least-ref'd source into src1 when both const/imm	Ilia Mirkin	2016-01-18	1	-10/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The whole point of inlining sources is to reduce loads. We can end up in a situation where one value is used a lot of times, and one value is used only once per instruction. The once-per-instruction one is the one that should get inlined, but with the previous algorithm, it was given no preference. This flips things around to preferring putting less-referenced values into src1 which increases the likelihood of them being inlined. While we're at it, adjust the heuristic to not treat 0 as an immediate, as well as (effectively) check for situations where LIMMs can't be loaded.
	All this yields improvements on nvc0:
	total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
	total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
	total local used in shared programs   : 30372 -> 30288 (-0.28%)
	total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)
	            local     gpr    inst   bytes
	helped         21     822    3332    3332
	hurt            0     278     565     565
	And more importantly avoids generating really bad code with SSBOs, where we end up checking a lot of different values (usually immediates) against the length.
	On nv50 we get comparable results, and even improve packing (bytes went down more than instructions):
	total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
	total gprs used in shared programs    : 728719 -> 725131 (-0.49%)
	total local used in shared programs   : 3552 -> 3552 (0.00%)
	total bytes used in shared programs   : 43995688 -> 43932928 (-0.14%)
	            local     gpr    inst   bytes
	helped          0    1380    3252    3774
	hurt            0     287    1710    1365
	Signed-off-by: Ilia Mirkin <[email protected]>
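The preference described in this commit message can be sketched as follows. This is a simplified illustration, not the nv50/ir implementation: the `value` and `instr` structs and the `inlinable` flag are hypothetical, the operation is assumed to be commutative so that swapping sources is legal, and the extra checks the message mentions (zero not counting as an immediate, LIMMs that cannot be loaded) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IR values and instructions. */
struct value {
   int refs;        /* how many instructions reference this value */
   bool inlinable;  /* true for a const-buffer or immediate source */
};

struct instr {
   struct value *src0;
   struct value *src1;  /* the slot assumed to accept an inlined source */
};

/* When both sources could be inlined, move the less-referenced one into
 * src1. Loading the widely used value once into a register serves all of
 * its users, while the rarely used value costs nothing if inlined. */
static void prefer_less_referenced_in_src1(struct instr *i)
{
   assert(i != NULL && i->src0 != NULL && i->src1 != NULL);

   if (!i->src0->inlinable || !i->src1->inlinable)
      return;

   if (i->src0->refs < i->src1->refs) {
      struct value *tmp = i->src0;
      i->src0 = i->src1;
      i->src1 = tmp;
   }
}
```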
*	freedreno/a4xx: use smaller threadsize for more registers	Rob Clark	2016-01-18	1	-2/+5
\| \| \| \| \| \| \| \|	Once we go past half of the "GPR" register file, it seems like we need to run frag shader with smaller threadsize. (The vertex shader already runs at TWO_QUADS, which is the minimum.) Signed-off-by: Rob Clark <[email protected]>
*	freedreno: per-generation OUT_IB packet	Rob Clark	2016-01-18	9	-6/+43
\| \| \| \| \| \| \| \| \| \|	Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled) version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per generation. Switch a3xx and a4xx over to using prefetch-enabled version (which is also what blob does.. it seems only on a2xx we cannot use PFE). Signed-off-by: Rob Clark <[email protected]>
*	gallium/radeon: Rename do_invalidate_resource to invalidate_buffer	Michel Dänzer	2016-01-18	1	-4/+6
\| \| \| \| \| \| \|	And only call it from r600_invalidate_resource for buffer resources. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR	Michel Dänzer	2016-01-18	1	-0/+3
\| \| \| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
*	radeonsi: Print "LLVM emitted unknown config register" warning only once	Michel Dänzer	2016-01-18	1	-2/+9
\| \| \| \| \| \| \|	Say "LLVM" instead of "Compiler" for clarity. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	gm107/ir: don't do indirect frag shader inputs on GM107	Ilia Mirkin	2016-01-17	1	-0/+1
\| \| \| \| \| \| \| \|	Apparently the IPA op decided to stop working with offsets. Need to figure out if we need to do an AL2P situation or something similar. For now just turn it back off. Signed-off-by: Ilia Mirkin <[email protected]>
*	nvc0: bsp_bo can't be null	Ilia Mirkin	2016-01-17	1	-1/+1
\| \| \| \| \| \| \|	We already deref it earlier. And these are all allocated on load. Spotted by Coverity. Signed-off-by: Ilia Mirkin <[email protected]>
*	llvmpipe: fix arguments order given to vec_andc	Oded Gabbay	2016-01-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a classic "confuse the enemy" bug. _mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the arguments are opposite. _mm_andnot_si128 performs "r = (~a) & b" while vec_andc performs "r = a & (~b)". To make sure this error won't return in another place, I added a wrapper function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
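The argument-order mismatch is easy to see with a scalar model. The functions below are illustrative stand-ins on 32-bit integers, not the real intrinsics and not the contents of u_pwr8.h; only the two formulas are taken from the commit message.

```c
#include <stdint.h>

/* Scalar model of the SSE semantics: r = (~a) & b. */
static inline uint32_t sse_andnot(uint32_t a, uint32_t b)
{
   return ~a & b;
}

/* Scalar model of the VMX semantics: r = a & (~b). */
static inline uint32_t vmx_andc(uint32_t a, uint32_t b)
{
   return a & ~b;
}

/* Hypothetical wrapper in the spirit of vec_andnot_si128: it exposes the
 * SSE argument order and performs the swap internally. */
static inline uint32_t andnot_via_andc(uint32_t a, uint32_t b)
{
   return vmx_andc(b, a);
}
```

Calling `vmx_andc(a, b)` where `sse_andnot(a, b)` was meant silently computes a different mask, which is the kind of bug the wrapper is there to prevent.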
*	freedreno/ir3: fix mad 3rd src delay calc	Rob Clark	2016-01-17	1	-1/+1
\| \| \| \| \| \| \|	In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by one, but missed updating delay-slot calc. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: better array register allocation	Rob Clark	2016-01-16	2	-9/+51
\| \| \| \| \| \| \|	Detect arrays which don't conflict with each other and allow overlapping register allocation. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: array offset can be negative	Rob Clark	2016-01-16	5	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It at least happens with some piglit tests, like $piglit/bin/vp-address-01:
	VERT
	DCL IN[0]
	DCL IN[1]
	DCL OUT[0], POSITION
	DCL OUT[1], COLOR
	DCL CONST[0..7]
	DCL ADDR[0]
	  0: ARL ADDR[0].x, IN[1].xxxx
	  1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
	  2: DP4 OUT[0].x, CONST[4], IN[0]
	  3: DP4 OUT[0].y, CONST[5], IN[0]
	  4: DP4 OUT[0].z, CONST[6], IN[0]
	  5: DP4 OUT[0].w, CONST[7], IN[0]
	  6: END
	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: workaround bug/feature	Rob Clark	2016-01-16	1	-0/+9
\| \| \| \| \| \| \| \| \| \|	Seems like in certain cases, we cannot use c<a0.x+0> as the third src to cat3 instructions. This may be slightly conservative, we may only have this restriction when the first src is also const. This fixes, for example, +24/-0 of the variable-indexing piglit tests. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: array rework	Rob Clark	2016-01-16	9	-363/+365
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: refactor/simplify cp	Rob Clark	2016-01-16	1	-87/+82
\| \| \| \| \| \| \| \| \|	If we handle separately the special case of eliminating output mov (which includes keeps and various other cases where we don't have a consuming instruction's src register to collapse things into), we can simplify the logic. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: fix incorrect decoding of mov instructions	Rob Clark	2016-01-16	1	-1/+1
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: remove unused tgsi tokens ptr	Rob Clark	2016-01-16	1	-1/+0
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: bit of ra refactor	Rob Clark	2016-01-16	1	-25/+20
\| \| \| \| \| \| \|	Shuffle things slightly, passing instr-data to ra_name() to reduce the number of places where we need to add support for array names. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: cosmetic de-indent	Rob Clark	2016-01-16	1	-36/+34
\| \| \| \| \| \|	Collapse two nested if's into one to reduce indent level. Signed-off-by: Rob Clark <[email protected]>
*	nv50/ir: add saturate support on ex2	Ilia Mirkin	2016-01-16	2	-0/+6
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	llvmpipe: ditch additional ref counting for vertex/geometry sampler views	Roland Scheidegger	2016-01-15	4	-46/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cleaning up was quite a performance hog (making pipe_resource_reference the number two in profilers on the vertex path, and 3rd overall, with its cousin pipe_reference_described not far behind) if there were lots of tiny draw calls (ipers). Now the reason was really that it was blindly calling this for all potential shader views (so 32 each for vs and gs) even though the app never touched a single one which could have been fixed, however I can't come up with a good reason why we refcount these. We've got references, of course, in the sampler views, which should be quite sufficient as we do all vertex and geometry shader execution fully synchronous. (Calling prepare_shader_sampling for all draw calls even if there were no changes looks quite suboptimal too, but generally we don't really expect vs/gs shader sampling to be used much with llvmpipe, and there's even an early exit if there aren't any views to avoid the "null loop" albeit it's now no longer always trying to loop through all 32 slots. Maybe improve another time...). Of course, if we manage to make vertex loads run asynchronously some day, we need references again, but adding that back would be the least of the problems... Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the vertex side depends on it (I suppose we'd really wanted a separate flag in any case). (Good for a 3% improvement or so in ipers under the right conditions.) Reviewed-by: Jose Fonseca <[email protected]>
*	llvmpipe: fix "leaking" textures	Roland Scheidegger	2016-01-15	2	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was not really a leak per se, but we were referencing the textures for longer than intended. If textures were set via llvmpipe_set_sampler_views() (for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they were referenced in the setup state. However, the only way to unreference them was by replacing them with another texture, and not when the texture slot was replaced with a NULL sampler view. (They were then further also referenced by the scene too which might have additional minor side effects as we limit the memory size which is allowed to be referenced by a scene in a rather crude way.) Only setup destruction (at context destruction time) then finally would get rid of the references. Fix this by noting the number of textures the last time, and unreference things if the new view is NULL (avoiding having to unreference things always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked). Found by code inspection, no test... v2: rename var Reviewed-by: Jose Fonseca <[email protected]>
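The fix described above, remembering how many slots were bound last time and releasing any slot that is now NULL or out of range, can be sketched like this. The types and names are hypothetical simplifications; llvmpipe's setup code and reference helpers differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 32

/* Hypothetical refcounted texture and setup state. */
struct tex { int refcount; };

struct setup {
   struct tex *bound[MAX_VIEWS];
   unsigned num_bound;  /* number of slots passed in the previous call */
};

/* Point *dst at src, adjusting both reference counts. */
static void tex_reference(struct tex **dst, struct tex *src)
{
   if (*dst == src)
      return;
   if (src)
      src->refcount++;
   if (*dst)
      (*dst)->refcount--;
   *dst = src;
}

/* Bind `num` views. Slots that are NULL now, or that lie beyond `num` but
 * were bound last time, are released instead of keeping a stale reference.
 * Only max(num, previous num) slots are visited, not all MAX_VIEWS. */
static void set_views(struct setup *s, struct tex **views, unsigned num)
{
   assert(num <= MAX_VIEWS);
   unsigned max = num > s->num_bound ? num : s->num_bound;

   for (unsigned i = 0; i < max; i++) {
      struct tex *t = i < num ? views[i] : NULL;
      tex_reference(&s->bound[i], t);
   }
   s->num_bound = num;
}
```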
*	nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space	Ilia Mirkin	2016-01-14	1	-14/+44
\| \| \| \| \| \| \| \| \| \| \| \| \|	Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP shaders: total local used in shared programs : 54116 -> 30372 (-43.88%) Probably modest advantage to execution, but it's an important prerequisite to dropping some of the TGSI optimizations done by the state tracker. Signed-off-by: Ilia Mirkin <[email protected]>
*	nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection	Ilia Mirkin	2016-01-14	1	-15/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we were treating any indirect temp array usage to mean that everything should end up in lmem. The MemoryOpt pass would clean a lot of that up later, but in the meanwhile we would lose a lot of opportunity for optimization. This helps a lot of Metro 2033 Redux and a handful of KSP shaders:
	total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
	total gprs used in shared programs    : 944051 -> 945131 (0.11%)
	total local used in shared programs   : 54116 -> 54116 (0.00%)
	A typical case is for register usage to double and for instructions to halve. A future commit can also optimize local memory usage size to be reduced with better packing. Signed-off-by: Ilia Mirkin <[email protected]>
*	nvc0/ir: be careful about propagating very large offsets into const load	Ilia Mirkin	2016-01-14	4	-1/+19
\| \| \| \| \| \| \| \| \| \| \| \| \|	Indirect constbuf indexing works by using very large offsets. However if an indirect constbuf index load is const-propagated, it becomes a very large const offset. Take that into account when legalizing the SSA by moving the high parts of that offset into the file index. Also disallow very large (or small) indices on most other instructions. This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests. Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions) Signed-off-by: Ilia Mirkin <[email protected]>
*	nvc0: allow fragment shader inputs to use indirect indexing	Ilia Mirkin	2016-01-14	1	-1/+1
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>