mesa.git

	Commit message	Author	Date	Files changed	Lines (-removed/+added)
*	nvc0: use correct bufctx when invalidating CP textures	Samuel Pitoiset	2016-10-25	1	-1/+1
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Cc: "12.0 13.0" <[email protected]>
*	nv50/ir: do not perform global membar for shared memory	Samuel Pitoiset	2016-10-24	1	-1/+4
	Shared memory is local to the CTA, so we should only wait for prior memory writes that are visible to other threads in the same CTA, not at the global level. This should speed up compute shaders that use shared memory.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: display OP_BAR subops in debug mode	Samuel Pitoiset	2016-10-24	1	-0/+9
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: it appears that OP_DISCARD can't take a join modifier	Ilia Mirkin	2016-10-22	1	-0/+1
	nvdisasm does not print a .S even though the bit is set.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: use levelZero for non-frag tex/txp ops	Ilia Mirkin	2016-10-22	1	-0/+5
	radeonsi also does the same thing. I suspect that this is likely to be a no-op in reality, but it brings nouveau code closer to what the blob produces. Plus it makes sense to not try to do auto-derivatives on this.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	gallium: add PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS	Ilia Mirkin	2016-10-22	3	-0/+3
	This allows the driver to signal that it can't handle random interleaving of attributes across buffers. This is required for ARB_transform_feedback3, and it's initialized to whatever the previous value of PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME was except for nv50 where it is disabled.
	Note that the proprietary drivers never expose ARB_transform_feedback3 on any GT21x (where nouveau previously did), and after some effort I was unable to get it to work.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	nvc0/ir: remove outdated comment about SHLADD	Samuel Pitoiset	2016-10-22	2	-2/+0
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50,nvc0: don't keep track of whether fb rt0 is integer-only	Ilia Mirkin	2016-10-21	6	-44/+22
	This reverts commits 1af0641db345209c076e9b1ba4dca7524541671a and a6ad49cbbd599aec054d0a3163fff5ad724f2b18. st/mesa adjusts the rasterizer state for us now.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	nvc0: do not break 3D state by pushing MS coordinates on Fermi	Samuel Pitoiset	2016-10-20	1	-43/+44
	Long story short, 3D and CP are aliased on Fermi and initializing compute after pushing the MS sample coordinate offsets seems to corrupt 3D state for weird reasons. I still don't have the faintest clue what is going on, but this seems to only affect the Fermi generation. A possible fix could be to use two different channels, one for 3D and one for CP.
	This fixes a bunch of regressions pinpointed by piglit.
	Fixes: "nvc0: fix up image support for allowing multiple samples"
	Cc: "13.0" <[email protected]>
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: translate compute shaders at program creation	Samuel Pitoiset	2016-10-20	1	-0/+4
	This makes shader-db report results for compute shaders.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: process texture offset sources as regular sources	Ilia Mirkin	2016-10-19	1	-53/+94
	With ARB_gpu_shader5, texture offsets can be any source, including TEMPs and IN's. Make sure to process them as regular sources so that we pick up masks, etc. This should fix some CTS tests that feed offsets directly to textureGatherOffset, and we were not picking up the input use, thus not advertising it in the shader header.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Tested-by: Dave Airlie <[email protected]>
	Cc: 12.0 13.0 <[email protected]>
*	nv50,nvc0: avoid reading out of bounds when getting bogus so info	Ilia Mirkin	2016-10-19	2	-2/+8
	The state tracker tries to attach the info to the wrong shader. This is easy enough to protect against.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: 12.0 13.0 <[email protected]>
*	nvc0/ir: simplify predicate logic for GK104 atomic operations	Samuel Pitoiset	2016-10-19	1	-14/+7
	The predicate is always CC_NOT_P as defined in processSurfaceCoordsNVE4(), so we only want to emit OR.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0/ir: remove useless NVC0LoweringPass::gMemBase	Samuel Pitoiset	2016-10-19	1	-4/+1
	Signed-off-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: print CCTL subops in debug mode	Samuel Pitoiset	2016-10-19	1	-0/+9
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: silent TGSI_PROPERTY_FS_DEPTH_LAYOUT	Samuel Pitoiset	2016-10-19	1	-0/+1
	Found that informational message while replaying a trace from Metro 2033 Redux. Mark that property as useless for now.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	gm107/ir: fix bit offset of tex lod setting for indirect texturing	Ilia Mirkin	2016-10-18	1	-1/+1
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: [email protected]
*	gm107/ir: fix texturing with indirect samplers	Ilia Mirkin	2016-10-18	1	-0/+10
	The indirect handle has to come right after the coordinates, so if there was a sample/bias/depth compare/offset, everything would end up being shifted by one argument position.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: [email protected]
*	nv50/ir: constant fold OP_SPLIT	Tobias Klausmann	2016-10-14	1	-0/+18
	Split the source immediate value into new values and move them into the original defs set by the split. Since we can only have up to 64-bit immediates, this is largely beneficial for F64 (and, in the future, U64) operations.
	Signed-off-by: Tobias Klausmann <[email protected]>
	[imirkin: always use U32, set newi for foldCount tracking]
	Signed-off-by: Ilia Mirkin <[email protected]>
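As a rough illustration of what folding OP_SPLIT on an immediate means, the Python sketch below reinterprets an F64 constant as its raw 64 bits and splits it into two U32 words, low word first. The helper name `split_imm64` and the low/high ordering are assumptions made for this example; the sketch does not reproduce the nv50/ir code.

```python
import struct

def split_imm64(value):
    """Split an F64 immediate into (low, high) unsigned 32-bit words.

    Hypothetical helper: each half would become its own U32 immediate,
    replacing the defs produced by the split.
    """
    # Reinterpret the double's IEEE-754 bits as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    low = bits & 0xFFFFFFFF            # bits 0..31
    high = (bits >> 32) & 0xFFFFFFFF   # bits 32..63
    return low, high

# 1.0 as a double is 0x3FF0000000000000.
print([hex(w) for w in split_imm64(1.0)])  # ['0x0', '0x3ff00000']
```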
*	nv50: enable ARB_enhanced_layouts	Ilia Mirkin	2016-10-13	1	-1/+1
	Signed-off-by: Ilia Mirkin <[email protected]>
*	nvc0/ir: be more careful about preserving modifiers in SHLADD creation	Ilia Mirkin	2016-10-13	1	-7/+5
	src2 was being given the wrong modifier, and we were not properly managing the modifier on the SHL source either.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	nvc0: enable ARB_enhanced_layouts	Samuel Pitoiset	2016-10-13	1	-1/+1
	All ARB_enhanced_layouts piglit tests pass without any changes in our compiler.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0/ir: fix textureGather with a single offset	Ilia Mirkin	2016-10-12	1	-2/+2
	A recent fix for non-const offsets broke the case of a single offset (vs 4 offsets). The later code relies on the offs array to contain null values to tell whether they should be added onto the srcs list.
	Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: [email protected]
*	nv50/ir: copy over value's register id when resolving merge of a phi	Ilia Mirkin	2016-10-12	1	-1/+3
	The offset needs to be properly copied over to the phi value, otherwise it will get assigned to the base of the merge instead of the proper location.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: [email protected]
*	gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS	Nicolai Hähnle	2016-10-12	3	-0/+3
	This is a screen cap because drivers are expected to support it either for all shader types or for none of them.
	Reviewed-by: Edward O'Callaghan <[email protected]>
	Reviewed-by: Dave Airlie <[email protected]>
*	nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)	Samuel Pitoiset	2016-10-12	1	-0/+87
	total instructions in shared programs :2286901 -> 2284473 (-0.11%)
	total gprs used in shared programs :335256 -> 335273 (0.01%)
	total local used in shared programs :31968 -> 31968 (0.00%)
	           local     gpr    inst   bytes
	  helped       0      41     852     852
	    hurt       0      44      23      23
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
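A minimal sketch of the ADD(SHL(a, b), c) pattern match, written in Python over a made-up tuple IR where each instruction is `(op, src0, src1, ...)`. The IR and the function name `fold_add_shl` are hypothetical; the only constraint taken from these commits is that the shift amount must be an immediate, since SHLADD's src1 always is.

```python
def fold_add_shl(insn):
    """Rewrite ADD(SHL(a, b), c) or ADD(c, SHL(a, b)) into SHLADD(a, b, c).

    Hypothetical tuple IR: ("OP", src0, src1, ...); leaves are register
    names (str) or immediates (int). b must be an immediate.
    """
    if insn[0] != "ADD":
        return insn
    x, y = insn[1], insn[2]
    # ADD is commutative, so the SHL may feed either source.
    for shl, other in ((x, y), (y, x)):
        if isinstance(shl, tuple) and shl[0] == "SHL" and isinstance(shl[2], int):
            return ("SHLADD", shl[1], shl[2], other)
    return insn
```

The sketch ignores whether the SHL result has other uses; a real pass would have to weigh that, since the rewrite only saves an instruction when the SHL becomes dead.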
*	nvc0: fix valid range for shader buffers	Samuel Pitoiset	2016-10-10	3	-0/+3
	When offset != 0, the valid range was wrong because the second argument of util_range_add() is end, not size.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
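The end-versus-size mix-up is easiest to see with a toy valid-range tracker. The Python sketch below assumes, as the commit message states, that the range helper takes a start and an end offset; `ValidRange`, `bind_buggy` and `bind_fixed` are hypothetical stand-ins, not Mesa's `util_range` API.

```python
class ValidRange:
    """Toy stand-in for a buffer's valid-data range [start, end)."""

    def __init__(self):
        self.start, self.end = None, None

    def add(self, start, end):
        # Like util_range_add(), the value after start is an END offset.
        if self.start is None:
            self.start, self.end = start, end
        else:
            self.start = min(self.start, start)
            self.end = max(self.end, end)

def bind_buggy(r, offset, size):
    r.add(offset, size)            # wrong: passes a size where an end is expected

def bind_fixed(r, offset, size):
    r.add(offset, offset + size)   # correct: end = offset + size
```

With offset 0 both versions record the same range, which is why the bug only showed up when offset != 0; with offset 256 and size 64, the buggy call records the nonsensical range [256, 64).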
*	nvc0/ir: fix overwriting of value backing non-constant gather offset	Ilia Mirkin	2016-10-10	1	-2/+2
	Normally the value is an immediate, which is moved to some temporary, so there's no problem. In the case of a non-constant offset (as allowed by ARB_gpu_shader5), we have to take care to copy it first before using it to build up the bits.
	This fixes a compilation error observed in F1 2015.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: [email protected]
*	nv50/ir: only stick one preret per function	Ilia Mirkin	2016-10-10	1	-4/+7
	A function with multiple returns would have had multiple preret settings at the top of the function. While this is unlikely to have caused issues since we don't use functions in earnest, it could in some cases have overflowed the call stack if a function had a lot of early returns.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: fix wrong check when optimizing MAD to SHLADD	Samuel Pitoiset	2016-10-07	1	-1/+1
	Checking whether MAD is supported is definitely wrong; it's most likely a typo I introduced a few days ago, and it breaks NV50 because SHLADD is not supported there.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: dump program binary only when NV50_PROG_DEBUG is set	Samuel Pitoiset	2016-10-07	1	-1/+1
	When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output.
	Signed-off-by: Samuel Pitoiset <[email protected]>
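The resulting rule for dumping a freshly compiled binary can be modelled as a small predicate. This is a hypothetical Python model of the condition described in the commit message, not the driver's code; reading the variables through a plain mapping and treating any non-empty value as "set" are assumptions of the sketch.

```python
import os

def should_dump_compiled_binary(env=os.environ):
    """Hypothetical model: when the chipset was forced via NV50_PROG_CHIPSET,
    dump the binary only if NV50_PROG_DEBUG is also set.

    Any non-empty value counts as "set" here (an assumption).
    """
    chipset_forced = bool(env.get("NV50_PROG_CHIPSET"))
    debug = bool(env.get("NV50_PROG_DEBUG"))
    return chipset_forced and debug
```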
*	nvc0: expose ARB_compute_variable_group_size	Samuel Pitoiset	2016-10-07	1	-2/+6
	Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread.
	v4:
	- use 512 threads on Fermi, 1024 on Kepler+
	Signed-off-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: set number of threads/block for variable local size	Samuel Pitoiset	2016-10-07	1	-0/+2
	When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows using 64 GPRs/thread.
	v4:
	- use 512 threads on Fermi, 1024 on Kepler+
	Signed-off-by: Samuel Pitoiset <[email protected]>
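The 32 and 64 GPRs/thread figures in these two commits follow from dividing a register file among the threads of one block. The Python sketch below assumes per-SM register files of 32768 32-bit registers on Fermi and 65536 on Kepler, which the commit messages do not state, and it ignores allocation granularity and per-thread hardware limits. A fixed local size of 0 is where the SIGFPE came from: the unguarded division would divide by zero, so the sketch falls back to the variable-group-size thread limit instead.

```python
# Assumed per-SM register file sizes in 32-bit registers (not from the commits).
REGFILE = {"fermi": 32768, "kepler": 65536}
# Variable-group-size thread limits from the v4 notes: 512 on Fermi, 1024 on Kepler+.
MAX_VARIABLE_THREADS = {"fermi": 512, "kepler": 1024}

def max_gprs_per_thread(gen, local_size):
    """Registers per thread if a whole block must fit in one SM's register file.

    local_size is the product of the fixed block dimensions; it is 0 when
    the block size is only known at dispatch time, so use the maximum
    variable block size rather than dividing by zero.
    """
    threads = local_size if local_size else MAX_VARIABLE_THREADS[gen]
    return REGFILE[gen] // threads
```

Under these assumptions, a 1024-thread block on Fermi leaves 32 registers per thread, while capping variable-size blocks at 512 threads leaves 64.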
*	gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK	Samuel Pitoiset	2016-10-07	2	-0/+4
	v3:
	- use a new case statement in r600_pipe_common.c
	- fix compilation of softpipe...
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	nv50/ir: optimize sub(a, 0) to a	Karol Herbst	2016-10-06	1	-0/+3
	Helped some UE4 demos and Divinity OS shaders.
	total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
	total gprs used in shared programs : 379273 -> 379273 (0.00%)
	total local used in shared programs : 9505 -> 9505 (0.00%)
	total bytes used in shared programs : 25837792 -> 25837192 (-0.00%)
	           local     gpr    inst   bytes
	  helped       0       0      33      33
	    hurt       0       0       0       0
	Signed-off-by: Karol Herbst <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Pierre Moreau <[email protected]>
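The commit does not say how a zero immediate is detected. The Python sketch below assumes a bit-exact check for +0 and shows why that distinction matters if the operands are floats: -0.0 compares equal to 0.0, yet subtracting -0.0 is not an identity. `fold_sub_zero` and the tuple form are hypothetical.

```python
import math
import struct

def is_positive_zero(imm):
    """True only for +0 (every bit clear); -0.0 compares equal but is excluded."""
    return struct.pack("<d", float(imm)) == bytes(8)

def fold_sub_zero(src, imm):
    """Hypothetical peephole: SUB(src, imm) -> src when imm is exactly +0."""
    return src if is_positive_zero(imm) else ("SUB", src, imm)

# Why the bit-exact check matters for floats:
print(math.copysign(1.0, -0.0 - 0.0))     # -1.0: subtracting +0 keeps the sign of -0
print(math.copysign(1.0, -0.0 - (-0.0)))  # 1.0: subtracting -0 turns -0 into +0
```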
*	nvc0: dump program binary when chipset has been forced	Samuel Pitoiset	2016-10-05	1	-0/+5
	Currently, program binaries are only dumped at upload time, but when the chipset has been forced via NV50_PROG_CHIPSET we might want to show the generated code, especially with shaderdb.
	Signed-off-by: Samuel Pitoiset <[email protected]>
*	nv50/ra: let simplify return an error and handle that	Karol Herbst	2016-10-05	1	-5/+7
	Fixes a crash in the case where simplify reports an error.
	Signed-off-by: Karol Herbst <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: teach insnCanLoad() about SHLADD	Samuel Pitoiset	2016-09-29	1	-0/+2
	Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate.
	This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior.
	GF100/GK104:
	total instructions in shared programs :2838045 -> 2834712 (-0.12%)
	total gprs used in shared programs :396684 -> 396386 (-0.08%)
	total local used in shared programs :34416 -> 34416 (0.00%)
	           local     gpr    inst   bytes
	  helped       0     326    1105    1105
	    hurt       0      55       3       3
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)	Samuel Pitoiset	2016-09-29	1	-0/+3
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)	Samuel Pitoiset	2016-09-29	1	-0/+8
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
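The two SHLADD folding rules in the commits above (all-immediate SHLADD to MOV, and SHLADD with a zero addend to SHL) can be sketched together in Python. The tuple form and `fold_shladd` are hypothetical; the sketch assumes 32-bit wraparound and shift amounts below 32, with b always an immediate as SHLADD's src1 is in these commits.

```python
MASK32 = 0xFFFFFFFF

def fold_shladd(a, b, c):
    """Hypothetical folding of SHLADD(a, b, c) = (a << b) + c on 32-bit values.

    a and c are immediates (int) or register names (str); b is an
    immediate shift amount, assumed to be below 32.
    """
    if isinstance(a, int) and isinstance(c, int):
        # Every source is known: replace with a MOV of the wrapped result.
        return ("MOV", ((a << b) + c) & MASK32)
    if c == 0:
        # Adding zero does nothing: SHLADD(a, b, 0) -> SHL(a, b).
        return ("SHL", a, b)
    return ("SHLADD", a, b, c)
```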
*	nv50/ir: optimize IMAD to SHLADD in presence of power of 2	Samuel Pitoiset	2016-09-29	1	-0/+7
	Only if src1 is a power of 2 can we replace IMAD by SHLADD.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
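The power-of-2 condition works because, modulo 2^32, multiplying by 2^k is the same as shifting left by k, so IMAD(a, 2^k, c) equals SHLADD(a, k, c). A minimal Python sketch of the check and of that equivalence, using a hypothetical tuple form and helper names:

```python
MASK32 = 0xFFFFFFFF

def imad_to_shladd(a, imm, c):
    """Hypothetical rewrite of IMAD(a, imm, c) = a * imm + c.

    Only when imm is a positive power of two does a * imm equal a << k,
    with k = log2(imm).
    """
    if imm > 0 and imm & (imm - 1) == 0:
        return ("SHLADD", a, imm.bit_length() - 1, c)
    return ("IMAD", a, imm, c)

def eval_insn(insn):
    """Evaluate an IMAD or SHLADD tuple on concrete ints, wrapping to 32 bits."""
    op, a, b, c = insn
    if op == "IMAD":
        return (a * b + c) & MASK32
    return ((a << b) + c) & MASK32
```

For example, `imad_to_shladd("a", 8, "c")` gives `("SHLADD", "a", 3, "c")`, while a multiplier of 6 leaves the IMAD untouched.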
*	nvc0/ir: add emission for SHLADD	Samuel Pitoiset	2016-09-29	3	-0/+127
	Unfortunately, we can't use the emit helpers for GF100/GK110 because src1 and src2 are swapped.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: add preliminary support for SHLADD	Samuel Pitoiset	2016-09-29	5	-7/+17
	This instruction has been available since SM20 (Fermi) and computes (a << b) + c in one shot. In some situations, IMAD should be replaced by SHLADD when b is a power of 2, and ADD+SHL should be replaced by SHLADD as well.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: update GM107 sched control codes format	Samuel Pitoiset	2016-09-29	2	-23/+23
	envyas now uses a much better representation for those control codes and it displays the different flags instead of an unreadable hex number.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: fix comments about instructions info	Samuel Pitoiset	2016-09-26	1	-2/+3
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Acked-by: Ilia Mirkin <[email protected]>
*	nvc0: allow to force compiling programs in debug build	Samuel Pitoiset	2016-09-26	1	-9/+10
	This adds a new envvar called NV50_PROG_CHIPSET which allows compiling shaders for a different target, which is especially useful for shader-db.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: drop unused NVISA_XXX_CHIPSET constants	Samuel Pitoiset	2016-09-26	1	-2/+0
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: get rid of nvc0_stage_sampler_states_bind_range()	Samuel Pitoiset	2016-09-19	1	-74/+9
	Same thing as nvc0_stage_set_sampler_views_range().
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: get rid of nvc0_stage_set_sampler_views_range()	Samuel Pitoiset	2016-09-19	1	-89/+15
	This function was quite similar to nvc0_stage_set_sampler_views() and I don't see any reason not to remove it.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: optimize SUB(a, b) to MOV(a - b)	Samuel Pitoiset	2016-09-18	1	-0/+10
	This helps shaders in UE4 demos, especially with Elemental (+1% perf). This optimization reduces spilling in one shader, which explains the small gain.
	GF100/GK104:
	total instructions in shared programs :2838551 -> 2838045 (-0.02%)
	total gprs used in shared programs :396706 -> 396684 (-0.01%)
	total local used in shared programs :34432 -> 34416 (-0.05%)
	           local     gpr    inst   bytes
	  helped       1      19     112     112
	    hurt       0       0       0       0
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
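When both sources of a SUB are immediates, the subtraction can be evaluated at compile time and the instruction replaced by a MOV of the result. A minimal Python sketch for the 32-bit integer case, assuming two's-complement wraparound; `fold_sub_imm` and the tuple form are hypothetical, and folding a float SUB would instead have to match the hardware's rounding behaviour.

```python
MASK32 = 0xFFFFFFFF

def fold_sub_imm(a, b):
    """Hypothetical fold of integer SUB(a, b) -> MOV(a - b) for immediates.

    Operands are immediates (int) or register names (str); the folded
    value wraps to 32 bits like integer subtraction on the GPU.
    """
    if isinstance(a, int) and isinstance(b, int):
        return ("MOV", (a - b) & MASK32)
    return ("SUB", a, b)
```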