summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
* nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)Samuel Pitoiset2016-10-121-0/+87
| | | | | | | | | | | | | total instructions in shared programs :2286901 -> 2284473 (-0.11%) total gprs used in shared programs :335256 -> 335273 (0.01%) total local used in shared programs :31968 -> 31968 (0.00%) local gpr inst bytes helped 0 41 852 852 hurt 0 44 23 23 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix valid range for shader buffersSamuel Pitoiset2016-10-103-0/+3
| | | | | | | | When offset != 0, the valid range was wrong because the second argument of util_range_add() is end, not size. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix overwriting of value backing non-constant gather offsetIlia Mirkin2016-10-101-2/+2
| | | | | | | | | | | | | Normally the value is an immediate, which is moved to some temporary, so there's no problem. In the case of a non-constant offset (as allowed by ARB_gpu_shader5), we have to take care to copy it first before using it to build up the bits. This fixes a compilation error observed in F1 2015. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nv50/ir: only stick one preret per functionIlia Mirkin2016-10-101-4/+7
| | | | | | | | | | | A function with multiple returns would have had multiple preret settings at the top of the function. While this is unlikely to have caused issues since we don't use functions in earnest, it could have in some cases overflowed the call stack, in case a function had a lot of early returns. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: fix wrong check when optimizing MAD to SHLADDSamuel Pitoiset2016-10-071-1/+1
| | | | | | | | | Checking if MAD is supported is definitely wrong, and it's more likely a typo I introduced few days ago which breaks NV50 because SHLADD is not supported there. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: dump program binary only when NV50_PROG_DEBUG is setSamuel Pitoiset2016-10-071-1/+1
| | | | | | | | When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: expose ARB_compute_variable_group_sizeSamuel Pitoiset2016-10-071-2/+6
| | | | | | | | | Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ir: set number of threads/block for variable local sizeSamuel Pitoiset2016-10-071-0/+2
| | | | | | | | | | | | When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows to use 64 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCKSamuel Pitoiset2016-10-072-0/+4
| | | | | | | | | v3: - use a new case statement in r600_pipe_common.c - fix compilation of softpipe... Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50/ir: optimize sub(a, 0) to aKarol Herbst2016-10-061-0/+3
| | | | | | | | | | | | | | | | | helped some ue4 demos and divinity OS shaders total instructions in shared programs : 2818674 -> 2818606 (-0.00%) total gprs used in shared programs : 379273 -> 379273 (0.00%) total local used in shared programs : 9505 -> 9505 (0.00%) total bytes used in shared programs : 25837792 -> 25837192 (-0.00%) local gpr inst bytes helped 0 0 33 33 hurt 0 0 0 0 Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nvc0: dump program binary when chipset has been forcedSamuel Pitoiset2016-10-051-0/+5
| | | | | | | | Currently, program binaries are only dumped at upload time, but when the chipset has been forced via NV50_PROG_CHIPSET we might want to show the generated code, especially with shaderdb. Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ra: let simplify return an error and handle thatKarol Herbst2016-10-051-5/+7
| | | | | | | | fixes a crash in the case simplify reports an error Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: teach insnCanLoad() about SHLADDSamuel Pitoiset2016-09-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate. This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior. GF100/GK104: total instructions in shared programs :2838045 -> 2834712 (-0.12%) total gprs used in shared programs :396684 -> 396386 (-0.08%) total local used in shared programs :34416 -> 34416 (0.00%) local gpr inst bytes helped 0 326 1105 1105 hurt 0 55 3 3 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)Samuel Pitoiset2016-09-291-0/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)Samuel Pitoiset2016-09-291-0/+8
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize IMAD to SHLADD in presence of power of 2Samuel Pitoiset2016-09-291-0/+7
| | | | | | | Only and only if src1 is a power of 2 we can replace IMAD by SHLADD. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add emission for SHLADDSamuel Pitoiset2016-09-293-0/+127
| | | | | | | | Unfortunately, we can't use the emit helpers for GF100/GK110 because src1 and src2 are swapped. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: add preliminary support for SHLADDSamuel Pitoiset2016-09-295-7/+17
| | | | | | | | | | This instruction is available since SM20 (Fermi) and allow to do (a << b) + c in one shot. In some situations, IMAD should be replaced by SHLADD when b is a power of 2, and ADD+SHL should be replaced by SHLADD as well. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: update GM107 sched control codes formatSamuel Pitoiset2016-09-292-23/+23
| | | | | | | | | envyas now uses a much better representation for those control codes and it displays the different flags instead of an unreadable hex number. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix comments about instructions infoSamuel Pitoiset2016-09-261-2/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nvc0: allow to force compiling programs in debug buildSamuel Pitoiset2016-09-261-9/+10
| | | | | | | | | This adds a new envvar called NV50_PROG_CHIPSET which allows to compile shaders with a different target, especially useful for shader-db. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: drop unused NVISA_XXX_CHIPSET constantsSamuel Pitoiset2016-09-261-2/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: get rid of nvc0_stage_sampler_states_bind_range()Samuel Pitoiset2016-09-191-74/+9
| | | | | | | Same thing as nvc0_stage_set_sampler_views_range(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: get rid of nvc0_stage_set_sampler_views_range()Samuel Pitoiset2016-09-191-89/+15
| | | | | | | | This function was quite similar to nvc0_stage_set_sampler_views() and I don't see any reasons to not remove it. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SUB(a, b) to MOV(a - b)Samuel Pitoiset2016-09-181-0/+10
| | | | | | | | | | | | | | | | | | | This helps shaders in UE4 demos, especially with Elemental (+1% perf). This optimization reduces spilling usage in one shader which explains the little gain. GF100/GK104: total instructions in shared programs :2838551 -> 2838045 (-0.02%) total gprs used in shared programs :396706 -> 396684 (-0.01%) total local used in shared programs :34432 -> 34416 (-0.05%) local gpr inst bytes helped 1 19 112 112 hurt 0 0 0 0 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix wrong emission of OP_NOTSamuel Pitoiset2016-09-181-1/+1
| | | | | | | | | This should emit src0 instead of src1. Found by inspection. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nvc0/ir: fix subops for IMADSamuel Pitoiset2016-09-171-4/+6
| | | | | | | | | Offset was wrong, it's at bit 8, not 4. Also, uses subr instead of sub when src2 has neg. Similar to GK110 now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nvc0/ir: fix comments about instructions infoSamuel Pitoiset2016-09-171-2/+3
| | | | | | | | | The comment for the commutative flags was wrong because OP_MUL is before OP_MAD. While we are at it add missing opcodes, and fix the comment about the short forms. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* gm107/ir: allow indirect inputs to be loaded by frag shaderIlia Mirkin2016-09-102-5/+21
| | | | | | | | | Looks like the GM107 IPA op does not allow a separate offset when using an indirect register. Instead we must use AL2P like we do for indirect vertex operations on Kepler+. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gm107/ir: AL2P writes to a predicate registerIlia Mirkin2016-09-101-0/+1
| | | | | | | | | | | We have to force it to write to predicate 7 (aka PT) in order for it not to mess up another predicate. Unclear what would be returned in the predicate, perhaps an error code for out-of-bounds requests. Blob doesn't seem to check it. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* gallium: remove PIPE_BIND_TRANSFER_READ/WRITEMarek Olšák2016-09-083-12/+6
| | | | | | | | not used in any useful way Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gk110/ir: fix quadop dall emissionIlia Mirkin2016-09-041-2/+2
| | | | | | | | | We recently starting to always emit the NDV (== dall) bit for quadops. However it was folded into the wrong code word. Fixes: e0a067ed48 (nv50/ir: always emit the NDV bit for OP_QUADOP) Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nvc0/ir: allow min/max instructions to be dual-issued in pairsKarol Herbst2016-09-031-2/+12
| | | | | | | | | | | | | | changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0 /benchmark_duration_ms=60000 /width=1024 /height=640: inst_executed: 1.03G inst_issued1: 614M -> 580M inst_issued2: 213M -> 230M score: 1021 -> 1030 Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: respect render condition enable flag when clearing rt/zsIlia Mirkin2016-09-032-12/+24
| | | | | | | | This is a newly added flag. We always pass false into it from nv50_clear_texture, but other callers may want to respect the render condition. (And the functions were originally spec'd to respect it.) Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: don't dual-issue ops that depend or interfere with each otherKarol Herbst2016-09-033-14/+23
| | | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Tobias Klausmann <[email protected]> [imirkin: rewrite to split up the helpers and move more logic to target] Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: reduce the initial code segment size to 512KBSamuel Pitoiset2016-09-011-1/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: allow to resize the code segment dynamicallySamuel Pitoiset2016-09-011-1/+24
| | | | | | | | | | | | | When an application uses a ton of shaders, we need to evict them when the code segment is full but this is not really a good solution if monster shaders are used because code eviction will happen a lot. To avoid this, it seems better to dynamically resize the code segment area after each eviction. The maximum size is arbitrary fixed to 8MB which should be enough. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: add a new bin for the code segmentSamuel Pitoiset2016-09-012-4/+6
| | | | | | | | | To avoid the bins list to grow up indefinitely when the code segment size will be bumped, we need to separate that bin from the SCREEN one because it contains other resources like the uniform bo. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: add nvc0_screen_resize_text_area() helperSamuel Pitoiset2016-09-013-10/+40
| | | | | | | | This function will be helpful for resizing the code segment area when we need to evict all shaders. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: re-upload currently bound shaders after code evictionSamuel Pitoiset2016-09-011-0/+27
| | | | | | | | | | | | | This fixes a very old issue which happens when the code segment size is full. A bunch of real applications like Tomb Raider, F1 2015, Elemental, hit that issue because they use a ton of shaders. In this case, all shaders are evicted (for freeing space) but all currently bound shaders also need to be re-uploaded and SP_START_ID have to be updated accordingly. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: refactor the program upload processSamuel Pitoiset2016-09-013-32/+59
| | | | | | | | | | This refactoring will help for fixing the "out of code space" eviction issue because we will need to reupload the code for all currently bound shaders but it's slightly different than uploading a new fresh code. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: remove an attempt at uploading all IMMD into a CBSamuel Pitoiset2016-08-313-40/+0
| | | | | | | | | | | This has never been used because info->immd.bufSize is always 0 and anyways this is an experimental code which has never been completed. This gets rid of some unused code in the program validation process. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50: remove unused nv50_program::immd_size fieldSamuel Pitoiset2016-08-311-1/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv30: set usage to staging so that the buffer is allocated in GARTIlia Mirkin2016-08-311-1/+2
| | | | | | | | | | The code a few lines below expects to migrate the bo in question to VRAM. Since we're filling the initial data via CPU, it's more efficient to create the temporary buffer in GART. There is no "push" method implemented, otherwise we'd use that instead. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv30: only bail on color/depth bpp mismatch when surfaces are swizzledIlia Mirkin2016-08-311-2/+3
| | | | | | | | | | | The actual restriction is a little weaker than I originally thought. See https://bugs.freedesktop.org/show_bug.cgi?id=92306#c17 for the suggestion. This also explain why things weren't *always* failing before, only sometimes. We will allocate a non-swizzled depth buffer for NPOT winsys buffer sizes, which they almost always are. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nvc0: fix indentation in nvc0_screen_init()Samuel Pitoiset2016-08-301-1/+1
| | | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: check return value of nvc0_screen_resize_tls_area()Samuel Pitoiset2016-08-302-11/+8
| | | | | | | | While we are at it, make it static and change the return values policy to be consistent. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: make use of FAIL_SCREEN_INIT in nvc0_screen_create()Samuel Pitoiset2016-08-301-9/+7
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: always emit the NDV bit for OP_QUADOPSamuel Pitoiset2016-08-302-8/+2
| | | | | | | | | | | | | | This silences a divergent error found with F1 2015. Basically, the NDV bit has to be set when a FSWZ instruction is inside divergent code, but it's not needed otherwise. The correct fix should be to set it only in divergent code situations. GM107 emitter already sets that bit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nvc0: undo overzealous enum usageIlia Mirkin2016-08-301-2/+2
| | | | | | | | Commit 7413625ad3 flipped a few functions too many to use pipe_shader_type. These functions actually take an integer that does not correspond 1:1 with the enum. Signed-off-by: Ilia Mirkin <[email protected]>