mesa.git

	Commit message	Author	Age	Files	Lines
*	nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)	Samuel Pitoiset	2016-10-12	1	-0/+87
	total instructions in shared programs : 2286901 -> 2284473 (-0.11%)
	total gprs used in shared programs    : 335256 -> 335273 (0.01%)
	total local used in shared programs   : 31968 -> 31968 (0.00%)
	             local    gpr   inst  bytes
	    helped       0     41    852    852
	      hurt       0     44     23     23
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
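The fusion described in this entry can be sketched as a peephole over a toy IR. This is a minimal sketch, not the nv50_ir code: `struct insn`, `fuse_shladd()` and the opcode names are hypothetical. The shift amount is required to be an immediate (an entry further down notes that SHLADD's src1 is always an immediate), and requiring the SHL result to have a single reader is an assumption of this sketch so that the SHL can be dropped afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum opcode { OP_ADD, OP_SHL, OP_SHLADD, OP_NOP };

/* Hypothetical, simplified IR instruction (not the nv50_ir classes). */
struct insn {
   enum opcode op;
   int dst;          /* destination register id */
   int32_t src[3];   /* register ids, or immediate values when imm[i] is set */
   bool imm[3];      /* src[i] is an immediate */
   int uses;         /* number of instructions reading dst */
};

/* Rewrite add = ADD(x, y), where one operand is shl->dst and
 * shl = SHL(a, imm b), into SHLADD(a, b, other). Returns true on success. */
static bool fuse_shladd(struct insn *add, struct insn *shl)
{
   if (add->op != OP_ADD || shl->op != OP_SHL)
      return false;
   /* SHLADD takes its shift amount as an immediate. */
   if (!shl->imm[1])
      return false;
   /* Sketch assumption: the ADD must be the SHL's only reader. */
   if (shl->uses != 1)
      return false;

   int s; /* which ADD operand reads the SHL result (ADD is commutative) */
   if (!add->imm[0] && add->src[0] == shl->dst)
      s = 0;
   else if (!add->imm[1] && add->src[1] == shl->dst)
      s = 1;
   else
      return false;

   /* Save the addend before overwriting the ADD's sources. */
   int32_t c = add->src[1 - s];
   bool c_imm = add->imm[1 - s];

   add->op = OP_SHLADD;
   add->src[0] = shl->src[0]; add->imm[0] = shl->imm[0];
   add->src[1] = shl->src[1]; add->imm[1] = true;
   add->src[2] = c;           add->imm[2] = c_imm;
   shl->op = OP_NOP;          /* the SHL is now dead */
   return true;
}
```

Because ADD is commutative, the SHL result may sit in either ADD operand; the other operand becomes the SHLADD addend.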
*	nvc0: fix valid range for shader buffers	Samuel Pitoiset	2016-10-10	3	-0/+3
	When offset != 0, the valid range was wrong because the second argument of util_range_add() is end, not size.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
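A minimal sketch of the bug, using a simplified stand-in for Mesa's range tracking: `struct range`, `range_add()` and the `mark_valid_*` helpers are hypothetical, not the driver code. The tracker takes a start and an end offset, so passing the size as the end only happens to work when the offset is 0.

```c
#include <assert.h>

/* Simplified stand-in for a valid-data range: [start, end). */
struct range { unsigned start, end; };

/* Start out empty: start above any offset, end below any offset. */
static void range_init(struct range *r) { r->start = ~0u; r->end = 0; }

/* Grow the range to cover [start, end). The last argument is an END offset. */
static void range_add(struct range *r, unsigned start, unsigned end)
{
   if (start < r->start) r->start = start;
   if (end > r->end) r->end = end;
}

/* Mark bytes [offset, offset + size) of a shader buffer as valid. */
static void mark_valid_buggy(struct range *r, unsigned offset, unsigned size)
{
   range_add(r, offset, size);          /* wrong whenever offset != 0 */
}

static void mark_valid_fixed(struct range *r, unsigned offset, unsigned size)
{
   range_add(r, offset, offset + size); /* end = offset + size */
}
```

With offset 256 and size 64, the buggy call records start 256 and end 64 (an inverted range), while the fixed call records [256, 320).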
*	nvc0/ir: fix overwriting of value backing non-constant gather offset	Ilia Mirkin	2016-10-10	1	-2/+2
	Normally the value is an immediate, which is moved to some temporary, so there's no problem. In the case of a non-constant offset (as allowed by ARB_gpu_shader5), we have to take care to copy it first before using it to build up the bits.
	This fixes a compilation error observed in F1 2015.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: [email protected]
*	nv50/ir: only stick one preret per function	Ilia Mirkin	2016-10-10	1	-4/+7
	A function with multiple returns would have had multiple preret settings at the top of the function. While this is unlikely to have caused issues since we don't use functions in earnest, it could in some cases have overflowed the call stack if a function had a lot of early returns.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: fix wrong check when optimizing MAD to SHLADD	Samuel Pitoiset	2016-10-07	1	-1/+1
	Checking if MAD is supported is definitely wrong; it's most likely a typo I introduced a few days ago, and it breaks NV50 because SHLADD is not supported there.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: dump program binary only when NV50_PROG_DEBUG is set	Samuel Pitoiset	2016-10-07	1	-1/+1
	When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output.
	Signed-off-by: Samuel Pitoiset <[email protected]>
*	nvc0: expose ARB_compute_variable_group_size	Samuel Pitoiset	2016-10-07	1	-2/+6
	Only expose 512 threads/block on Fermi so as not to be limited to 32 GPRs/thread.
	v4:
	 - use 512 threads on Fermi, 1024 on Kepler+
	Signed-off-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: set number of threads/block for variable local size	Samuel Pitoiset	2016-10-07	1	-0/+2
	When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows using 64 GPRs/thread.
	v4:
	 - use 512 threads on Fermi, 1024 on Kepler+
	Signed-off-by: Samuel Pitoiset <[email protected]>
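The SIGFPE comes from dividing by a block size of 0. A hypothetical sketch of the guarded computation: `max_gprs_per_thread()` is not a Mesa function, the register-file sizes (32768 32-bit registers per SM on Fermi, 65536 on Kepler) are assumptions for illustration, and the sketch ignores the hardware's per-thread register cap and allocation granularity.

```c
#include <assert.h>

/* Hypothetical: registers available per thread when a block of threads
 * must fit into one multiprocessor's register file of `regfile` registers. */
static unsigned max_gprs_per_thread(unsigned regfile, unsigned fixed_threads,
                                    unsigned max_variable_threads)
{
   /* With a variable local size the fixed size is 0, and an integer division
    * by it would trap (SIGFPE). Fall back to the largest variable block size
    * the driver exposes instead. */
   unsigned threads = fixed_threads ? fixed_threads : max_variable_threads;
   assert(threads != 0);
   return regfile / threads;
}
```

With 32768 registers, 1024 threads leave 32 GPRs/thread and 512 threads leave 64, which matches the reasoning for exposing only 512 threads/block on Fermi in the entry above.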
*	gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK	Samuel Pitoiset	2016-10-07	2	-0/+4
	v3:
	 - use a new case statement in r600_pipe_common.c
	 - fix compilation of softpipe...
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	nv50/ir: optimize sub(a, 0) to a	Karol Herbst	2016-10-06	1	-0/+3
	Helped some UE4 demos and Divinity OS shaders.
	total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
	total gprs used in shared programs    : 379273 -> 379273 (0.00%)
	total local used in shared programs   : 9505 -> 9505 (0.00%)
	total bytes used in shared programs   : 25837792 -> 25837192 (-0.00%)
	             local    gpr   inst  bytes
	    helped       0      0     33     33
	      hurt       0      0      0      0
	Signed-off-by: Karol Herbst <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Pierre Moreau <[email protected]>
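A sketch of the `sub(a, 0)` rule on a hypothetical two-source instruction; `struct insn` and `fold_sub_zero()` are illustrative names only. It rewrites the SUB into a MOV of `a` and, for simplicity, only handles integer immediates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum opcode { OP_SUB, OP_MOV };

/* Hypothetical instruction: dst = src0 OP src1, where src1 may be an immediate. */
struct insn {
   enum opcode op;
   int dst;
   int src0;
   int32_t src1;
   bool src1_imm;
};

/* a - 0 == a, so the SUB becomes a plain copy of src0. */
static bool fold_sub_zero(struct insn *i)
{
   if (i->op != OP_SUB || !i->src1_imm || i->src1 != 0)
      return false;
   i->op = OP_MOV; /* dst = src0; src1 is no longer read */
   return true;
}
```

Turning the SUB into a MOV leaves later copy propagation free to remove it entirely.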
*	nvc0: dump program binary when chipset has been forced	Samuel Pitoiset	2016-10-05	1	-0/+5
	Currently, program binaries are only dumped at upload time, but when the chipset has been forced via NV50_PROG_CHIPSET we might want to show the generated code, especially with shaderdb.
	Signed-off-by: Samuel Pitoiset <[email protected]>
*	nv50/ra: let simplify return an error and handle that	Karol Herbst	2016-10-05	1	-5/+7
	Fixes a crash in the case where simplify reports an error.
	Signed-off-by: Karol Herbst <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	nv50/ir: teach insnCanLoad() about SHLADD	Samuel Pitoiset	2016-09-29	1	-0/+2
	Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate.
	This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior.
	GF100/GK104:
	total instructions in shared programs : 2838045 -> 2834712 (-0.12%)
	total gprs used in shared programs    : 396684 -> 396386 (-0.08%)
	total local used in shared programs   : 34416 -> 34416 (0.00%)
	             local    gpr   inst  bytes
	    helped       0    326   1105   1105
	      hurt       0     55      3      3
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)	Samuel Pitoiset	2016-09-29	1	-0/+3
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)	Samuel Pitoiset	2016-09-29	1	-0/+8
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
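The two SHLADD folding rules in the entries just above, sketched on a hypothetical instruction struct; `struct insn` and `fold_shladd()` are illustrative, not nv50_ir code. The shift amount is masked to 5 bits only to keep the C well-defined; the hardware's behavior for large shifts is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum opcode { OP_SHLADD, OP_SHL, OP_MOV };

/* Hypothetical SHLADD(a, b, c) = (a << b) + c with per-source immediate flags. */
struct insn {
   enum opcode op;
   uint32_t src[3];
   bool imm[3];
};

static void fold_shladd(struct insn *i)
{
   if (i->op != OP_SHLADD)
      return;
   if (i->imm[0] && i->imm[1] && i->imm[2]) {
      /* All sources constant: evaluate now; MOV only reads src[0]. */
      i->op = OP_MOV;
      i->src[0] = (i->src[0] << (i->src[1] & 31)) + i->src[2];
      i->imm[0] = true;
   } else if (i->imm[2] && i->src[2] == 0) {
      /* Adding zero does nothing: SHLADD(a, b, 0) -> SHL(a, b). */
      i->op = OP_SHL;
   }
}
```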
*	nv50/ir: optimize IMAD to SHLADD in presence of power of 2	Samuel Pitoiset	2016-09-29	1	-0/+7
	Only if src1 is a power of 2 can we replace IMAD with SHLADD.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
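Why the power-of-2 restriction matters: in 32-bit wrap-around arithmetic, `a * 2^k + c == (a << k) + c`, so an IMAD whose src1 is a power of two can become a SHLADD with shift `k = log2(src1)`. A minimal sketch; the helper names are hypothetical, and it ignores IMAD modifiers such as saturation or high-word variants that a real pass would have to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two. */
static bool is_pow2(uint32_t x)
{
   return x != 0 && (x & (x - 1)) == 0;
}

/* log2 of a power of two, by counting trailing zero bits. */
static unsigned log2_pow2(uint32_t x)
{
   unsigned k = 0;
   assert(is_pow2(x));
   while ((x & 1u) == 0) {
      x >>= 1;
      k++;
   }
   return k;
}

/* IMAD(a, m, c) can be rewritten as SHLADD(a, *shift, c) only when the
 * multiplier m is a power of two; returns false otherwise. */
static bool imad_to_shladd_shift(uint32_t m, unsigned *shift)
{
   if (!is_pow2(m))
      return false;
   *shift = log2_pow2(m);
   return true;
}
```

A multiplier of 8 yields a shift of 3, while 12 (not a power of two) leaves the IMAD alone.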
*	nvc0/ir: add emission for SHLADD	Samuel Pitoiset	2016-09-29	3	-0/+127
	Unfortunately, we can't use the emit helpers for GF100/GK110 because src1 and src2 are swapped.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: add preliminary support for SHLADD	Samuel Pitoiset	2016-09-29	5	-7/+17
	This instruction has been available since SM20 (Fermi) and computes (a << b) + c in one shot. In some situations, IMAD should be replaced by SHLADD when b is a power of 2, and ADD+SHL should be replaced by SHLADD as well.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: update GM107 sched control codes format	Samuel Pitoiset	2016-09-29	2	-23/+23
	envyas now uses a much better representation for those control codes and it displays the different flags instead of an unreadable hex number.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: fix comments about instructions info	Samuel Pitoiset	2016-09-26	1	-2/+3
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Acked-by: Ilia Mirkin <[email protected]>
*	nvc0: allow to force compiling programs in debug build	Samuel Pitoiset	2016-09-26	1	-9/+10
	This adds a new envvar called NV50_PROG_CHIPSET which allows compiling shaders for a different target, especially useful for shader-db.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: drop unused NVISA_XXX_CHIPSET constants	Samuel Pitoiset	2016-09-26	1	-2/+0
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: get rid of nvc0_stage_sampler_states_bind_range()	Samuel Pitoiset	2016-09-19	1	-74/+9
	Same thing as nvc0_stage_set_sampler_views_range().
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: get rid of nvc0_stage_set_sampler_views_range()	Samuel Pitoiset	2016-09-19	1	-89/+15
	This function was quite similar to nvc0_stage_set_sampler_views() and I don't see any reason not to remove it.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: optimize SUB(a, b) to MOV(a - b)	Samuel Pitoiset	2016-09-18	1	-0/+10
	This helps shaders in UE4 demos, especially with Elemental (+1% perf). This optimization reduces spilling in one shader, which explains the small gain.
	GF100/GK104:
	total instructions in shared programs : 2838551 -> 2838045 (-0.02%)
	total gprs used in shared programs    : 396706 -> 396684 (-0.01%)
	total local used in shared programs   : 34432 -> 34416 (-0.05%)
	             local    gpr   inst  bytes
	    helped       1     19    112    112
	      hurt       0      0      0      0
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	gk110/ir: fix wrong emission of OP_NOT	Samuel Pitoiset	2016-09-18	1	-1/+1
	This should emit src0 instead of src1. Found by inspection.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Cc: [email protected]
*	nvc0/ir: fix subops for IMAD	Samuel Pitoiset	2016-09-17	1	-4/+6
	The offset was wrong: it's at bit 8, not 4. Also, use subr instead of sub when src2 has neg. This is similar to GK110 now.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Cc: [email protected]
*	nvc0/ir: fix comments about instructions info	Samuel Pitoiset	2016-09-17	1	-2/+3
	The comment for the commutative flags was wrong because OP_MUL is before OP_MAD. While we are at it, add missing opcodes and fix the comment about the short forms.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Acked-by: Ilia Mirkin <[email protected]>
*	gm107/ir: allow indirect inputs to be loaded by frag shader	Ilia Mirkin	2016-09-10	2	-5/+21
	Looks like the GM107 IPA op does not allow a separate offset when using an indirect register. Instead we must use AL2P like we do for indirect vertex operations on Kepler+.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	gm107/ir: AL2P writes to a predicate register	Ilia Mirkin	2016-09-10	1	-0/+1
	We have to force it to write to predicate 7 (aka PT) in order for it not to mess up another predicate. It is unclear what would be returned in the predicate, perhaps an error code for out-of-bounds requests. The blob doesn't seem to check it.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: [email protected]
*	gallium: remove PIPE_BIND_TRANSFER_READ/WRITE	Marek Olšák	2016-09-08	3	-12/+6
	Not used in any useful way.
	Reviewed-by: Eric Anholt <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
	Reviewed-by: Roland Scheidegger <[email protected]>
*	gk110/ir: fix quadop dall emission	Ilia Mirkin	2016-09-04	1	-2/+2
	We recently started always emitting the NDV (== dall) bit for quadops. However, it was folded into the wrong code word.
	Fixes: e0a067ed48 (nv50/ir: always emit the NDV bit for OP_QUADOP)
	Signed-off-by: Ilia Mirkin <[email protected]>
	Cc: <[email protected]>
*	nvc0/ir: allow min/max instructions to be dual-issued in pairs	Karol Herbst	2016-09-03	1	-2/+12
	Changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0 /benchmark_duration_ms=60000 /width=1024 /height=640:
	  inst_executed: 1.03G
	  inst_issued1:  614M -> 580M
	  inst_issued2:  213M -> 230M
	  score:         1021 -> 1030
	Signed-off-by: Karol Herbst <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50,nvc0: respect render condition enable flag when clearing rt/zs	Ilia Mirkin	2016-09-03	2	-12/+24
	This is a newly added flag. We always pass false into it from nv50_clear_texture, but other callers may want to respect the render condition. (And the functions were originally spec'd to respect it.)
	Signed-off-by: Ilia Mirkin <[email protected]>
*	nvc0/ir: don't dual-issue ops that depend or interfere with each other	Karol Herbst	2016-09-03	3	-14/+23
	Signed-off-by: Karol Herbst <[email protected]>
	Reviewed-by: Tobias Klausmann <[email protected]>
	[imirkin: rewrite to split up the helpers and move more logic to target]
	Signed-off-by: Ilia Mirkin <[email protected]>
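A conservative sketch of the kind of check this entry describes: two adjacent instructions are only paired if the second neither reads the first's result nor writes a register the first reads or writes. `struct insn` and `independent()` are hypothetical; the real hardware pairing rules (functional units, predicates, flags) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SRCS 3

/* Hypothetical instruction summary: one destination register and up to
 * three source registers; -1 means "none". */
struct insn {
   int def;
   int src[MAX_SRCS];
};

/* True if b neither depends on nor interferes with a. */
static bool independent(const struct insn *a, const struct insn *b)
{
   for (int s = 0; s < MAX_SRCS; s++) {
      if (b->src[s] >= 0 && b->src[s] == a->def)
         return false; /* b reads a's result (read-after-write) */
      if (a->src[s] >= 0 && a->src[s] == b->def)
         return false; /* b overwrites a's input (write-after-read) */
   }
   /* Both writing the same register would interfere (write-after-write). */
   return a->def < 0 || a->def != b->def;
}
```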
*	nvc0: reduce the initial code segment size to 512KB	Samuel Pitoiset	2016-09-01	1	-1/+1
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: allow to resize the code segment dynamically	Samuel Pitoiset	2016-09-01	1	-1/+24
	When an application uses a ton of shaders, we need to evict them when the code segment is full, but this is not really a good solution if monster shaders are used because code eviction will happen a lot. To avoid this, it seems better to dynamically resize the code segment area after each eviction. The maximum size is arbitrarily fixed at 8MB, which should be enough.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
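A hypothetical sketch of the resize policy. The 8 MB cap comes from this entry and the 512 KB starting size from the later "reduce the initial code segment size" entry, but doubling after each eviction is an assumption (the commit only says the area is resized), and `grow_text_area()` is not the driver's function.

```c
#include <assert.h>
#include <stdbool.h>

#define KIB 1024u
#define MIB (1024u * KIB)

/* Sizes from the commit log; the doubling below is an assumption. */
#define TEXT_INITIAL_SIZE (512u * KIB)
#define TEXT_MAX_SIZE     (8u * MIB)

/* Called after an eviction: grow the code segment, capped at the maximum.
 * Returns false when the area is already at its maximum size. */
static bool grow_text_area(unsigned *size)
{
   if (*size >= TEXT_MAX_SIZE)
      return false;
   unsigned next = *size * 2;
   *size = next > TEXT_MAX_SIZE ? TEXT_MAX_SIZE : next;
   return true;
}
```

Starting from 512 KB, the area reaches the 8 MB cap after four evictions (1, 2, 4, then 8 MB); after that, evictions simply reuse the same area.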
*	nvc0: add a new bin for the code segment	Samuel Pitoiset	2016-09-01	2	-4/+6
	To prevent the bins list from growing indefinitely when the code segment size is bumped, we need to separate that bin from the SCREEN one because the latter contains other resources like the uniform bo.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: add nvc0_screen_resize_text_area() helper	Samuel Pitoiset	2016-09-01	3	-10/+40
	This function will be helpful for resizing the code segment area when we need to evict all shaders.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: re-upload currently bound shaders after code eviction	Samuel Pitoiset	2016-09-01	1	-0/+27
	This fixes a very old issue which happens when the code segment is full. A bunch of real applications like Tomb Raider, F1 2015 and Elemental hit that issue because they use a ton of shaders.
	In this case, all shaders are evicted (to free space), but all currently bound shaders also need to be re-uploaded and SP_START_ID has to be updated accordingly.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: refactor the program upload process	Samuel Pitoiset	2016-09-01	3	-32/+59
	This refactoring will help fix the "out of code space" eviction issue, because we will need to re-upload the code for all currently bound shaders, which is slightly different from uploading fresh code.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: remove an attempt at uploading all IMMD into a CB	Samuel Pitoiset	2016-08-31	3	-40/+0
	This has never been used because info->immd.bufSize is always 0, and anyway this is experimental code which was never completed. This gets rid of some unused code in the program validation process.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50: remove unused nv50_program::immd_size field	Samuel Pitoiset	2016-08-31	1	-1/+0
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv30: set usage to staging so that the buffer is allocated in GART	Ilia Mirkin	2016-08-31	1	-1/+2
	The code a few lines below expects to migrate the bo in question to VRAM. Since we're filling the initial data via CPU, it's more efficient to create the temporary buffer in GART. There is no "push" method implemented, otherwise we'd use that instead.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Cc: [email protected]
*	nv30: only bail on color/depth bpp mismatch when surfaces are swizzled	Ilia Mirkin	2016-08-31	1	-2/+3
	The actual restriction is a little weaker than I originally thought. See https://bugs.freedesktop.org/show_bug.cgi?id=92306#c17 for the suggestion.
	This also explains why things weren't always failing before, only sometimes. We will allocate a non-swizzled depth buffer for NPOT winsys buffer sizes, which they almost always are.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Cc: [email protected]
*	nvc0: fix indentation in nvc0_screen_init()	Samuel Pitoiset	2016-08-30	1	-1/+1
	Trivial.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: check return value of nvc0_screen_resize_tls_area()	Samuel Pitoiset	2016-08-30	2	-11/+8
	While we are at it, make it static and change the return value policy to be consistent.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nvc0: make use of FAIL_SCREEN_INIT in nvc0_screen_create()	Samuel Pitoiset	2016-08-30	1	-9/+7
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
*	nv50/ir: always emit the NDV bit for OP_QUADOP	Samuel Pitoiset	2016-08-30	2	-8/+2
	This silences a divergent error found with F1 2015. Basically, the NDV bit has to be set when a FSWZ instruction is inside divergent code, but it's not needed otherwise. The correct fix should be to set it only in divergent code situations. The GM107 emitter already sets that bit.
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Ilia Mirkin <[email protected]>
	Cc: <[email protected]>
*	nvc0: undo overzealous enum usage	Ilia Mirkin	2016-08-30	1	-2/+2
	Commit 7413625ad3 flipped a few functions too many to use pipe_shader_type. These functions actually take an integer that does not correspond 1:1 with the enum.
	Signed-off-by: Ilia Mirkin <[email protected]>