aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/r600/sb
Commit message (Collapse)AuthorAgeFilesLines
* r600: add time lo/hi debugging output.Dave Airlie2018-02-261-0/+6
| | | | This just adds the these to the debug prints.
* r600/sb: Check whether optimizations would result in reladdr conflictGert Wollny2018-02-093-4/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: * Check whether the node src and dst registers are NULL before using them. * fix a type in the commit message. Two cases are handled with this patch: 1. If copy propagation tries to eliminated a move from a relative array access then it could optimize MOV R1, ARRAY[RELADDR_1] MOV R2, ARRAY[RELADDR_2] OP2 R3, R1 R2 into OP2 R3, ARRAY[RELADDR_1], ARRAY[RELADDR_2] which is forbidden, because there is only one address register available. 2. When MULADD(x,a,MUL(x,c)) is handled MUL TMP, R1, ARRAY[RELADDR_1] MULLADD R3, R1, ARRAY[RELADDR_2], TMP by folding this into ADD TMP, ARRAY[RELADDR_2], ARRAY[RELADDR_1] MUL R3, R1, TMP which is also forbidden. Test for these cases and reject the optimization if a forbidden combination of relative access would be created. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142 Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600/sb: handle scratch mem reads on r600Dave Airlie2018-02-092-5/+23
| | | | | | | | | | On r600 we use the scratch mem with read/read_ind, in that case sb should track the rw_gpr as a dst instead of a src. This stops the whole shader being optimised out. Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Add dependency tracking for scratch opsGlenn Kennard2018-02-097-4/+21
| | | | | Signed-off-by: Glenn Kennard <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Support scratch opsGlenn Kennard2018-02-095-1/+153
| | | | | Signed-off-by: Glenn Kennard <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600/sb/cayman: fix indirect ubo access on caymanDave Airlie2018-02-071-1/+1
| | | | | | | | | | | | | | With sb enabled on cayman, this was overwriting the proper cf index value with random ones if the dst gpr was 2 or 3, only save the value for a MOVA instruction. Fixes: KHR-GL45.gpu_shader5.uniform_blocks_array_indexing (on cayman with sb) Cc: <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: don't do stack workarounds for hemlockRoland Scheidegger2018-02-021-0/+1
| | | | | | | | | By the looks of it it seems hemlock is treated separately to cypress, but certainly it won't need the stack workarounds cedar/redwood (and seemingly every other eg chip except cypress/juniper) need. (Discovered by accident.) Acked-by: Alex Deucher <[email protected]>
* r600/sb: just add some missing debug bitsDave Airlie2018-02-011-0/+15
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: insert the else clause when we might depart from a loopDave Airlie2018-01-311-0/+17
| | | | | | | | | | | | | | If there is a break inside the else clause and this means we are breaking from a loop, the loop finalise will want to insert the LOOP_BREAK/CONTINUE instruction, however if we don't emit the else there is no where for these to end up, so they will end up in the wrong place. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101442 Tested-By: Gert Wollny <[email protected]> Cc: <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add lds related peepholes.Dave Airlie2018-01-181-1/+8
| | | | | | | | | if no destination: a) convert _RET instructions to non _RET variants if no dst b) set src0 to undefined if it's a READ, this should get DCE then. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: use different stacks for tracking lds and queue usage.Dave Airlie2018-01-182-3/+24
| | | | | | | | | | | | | | The normal ssa renumbering isn't sufficient for LDS queue access, this uses two stacks, one for the lds queue, and one for the lds r/w ordering. The LDS oq values are incremented in their use in a linear fashion. The LDS rw values are incremented in their definitions and used in the next lds operation to ensure reordering doesn't occur. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: schedule LDS ops in appropriate places.Dave Airlie2018-01-182-0/+7
| | | | | | | | | | So LDS ops have to be SLOT_X, and LDS OQ reads have read port restrictions so we try and force those into only having one per slot and avoiding bank swizzles. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: hit the scheduler with a big hammer to avoid lds splits.Dave Airlie2018-01-181-0/+3
| | | | | | | | | | | | | This tries to avoid an lds queue read getting scheduled separately from an lds ret read, the non-sb code uses the same style of hammer, this isn't foolproof. We can do better, but it's a bit tricky, as you have to scan ahead and either schedule more lds oq moves and more lds reads and that could lead to you running out of space anyways. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: adding lds oq tracking to the schedulerDave Airlie2018-01-182-3/+15
| | | | | | | | | | | This adds support for tracking the lds oq read/writes so can avoid scheduling other things in between. This patch just adds the tracking and assert to show problems. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add gcm support to avoid clause between lds read/queue readDave Airlie2018-01-182-2/+17
| | | | | | | | | | You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP in the same basic block/clause. This makes sure once we've issues and MOV we don't add another block until we balance it with an LDS read. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle lds special dest registers.Dave Airlie2018-01-182-2/+2
| | | | | | | This adds lds to the geom emit handling Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle LDS operations in folding.Dave Airlie2018-01-181-0/+11
| | | | | | | Don't try and fold LDS using expressions. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add finalising for lds output queue special values.Dave Airlie2018-01-181-0/+12
| | | | | | | We need to convert these to the hw special registers. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add initial support for parsing lds operations.Dave Airlie2018-01-181-2/+50
| | | | | | | This handles parsing the LDS ops and queue accessess. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: disable if conversion for hsDave Airlie2018-01-181-1/+1
| | | | | | | This fixes bad interactions with the LDS special values. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: lds ops have no dst register.Dave Airlie2018-01-181-1/+1
| | | | | | | Although these are op3s they don't have a dst reg. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: introduce special register values for lds support.Dave Airlie2018-01-183-1/+33
| | | | | | | | | | | | | For LDS read/write ordering we use the LDS_RW value, reads will wait on previous writes. For LDS read/read from LDS queue ordering we use the LDS_OQ values, we define two for now, though initially we'll just support OQA. Also add the check for the lds oq values Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: update last_cf if alu is the last clauseDave Airlie2018-01-181-0/+1
| | | | | | | | | | | It's rare to have a final alu clause on normal shaders (exports) but tess shaders write to LDS as their output, so we see some alu clauses, and the CF_END get put in the wrong place. This makes sure to update last_cf correctly. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: start adding GDS supportDave Airlie2018-01-1812-12/+122
| | | | | | | | | This adds support for GDS ops to sb backend. This seems to work for atomics and tess factor writes. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add tess/compute initial state registers.Dave Airlie2018-01-181-1/+4
| | | | | | | This stops them being optimised out. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: fix a bug emitting ar load from a constant.Dave Airlie2018-01-181-0/+3
| | | | | | | | | | | | Some tess shaders were doing MOVA_INT _, c0.x on cayman, and then hitting an assert in sb_bc_finalize.cpp:translate_kcache. This makes sure the toplevel kcache tracker gets updated, and the clause gets fixed up. Reviewed-by: Roland Scheidegger <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: do not convert if-blocks that contain indirect array accessGert Wollny2017-12-073-2/+5
| | | | | | | | | | | | | | | | | | | | If an array is accessed within an if block, then currently it is not known whether the value in the address register is involved in the evaluation of the if condition, and converting the if condition may actually result in out-of-bounds array access. Consequently, if blocks that contain indirect array access should not be converted. Fixes piglits on r600/BARTS: spec/glsl-1.10/execution/variable-indexing/ vs-output-array-float-index-wr vs-output-array-vec3-index-wr vs-output-array-vec4-index-wr Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143 Signed-off-by: Gert Wollny <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle jump after target to end of program. (v2)Dave Airlie2017-11-291-0/+5
| | | | | | | | | | | | | | | | This fixes hangs on cayman with tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test This has a single if/else in it, and when this peephole activated, it would set the jump target to NULL if there was no instruction after the final POP. This adds a NOP if we get a jump in this case, and seems to fix the hangs, so we have a valid target for the ELSE instruction to go to, instead of 0 (which causes infinite loops). v2: update last_cf correctly. (I had some other patches hide this) Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: use min_dx10/max_dx10 instead of min/maxRoland Scheidegger2017-11-151-0/+2
| | | | | | | | | | | | | | I believe this is the safe thing to do, especially ever since the driver actually generates NaNs for muls too. The ISA docs are not very helpful here, however the dx10 versions will pick a non-nan result over a NaN one (this is also the ieee754 behavior), whereas the non-dx10 ones will pick the NaN (verified by newly changed piglit isinf-and-isnan test). Other "modern" drivers will most likely do the same. This was shown to make some difference for bug 103544, albeit it is not required to fix it. Reviewed-by: Dave Airlie <[email protected]>
* util: move os_time.[ch] to src/utilNicolai Hähnle2017-11-091-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* r600/sb: bail out if prepare_alu_group() doesn't find a proper schedulingGert Wollny2017-11-012-20/+31
| | | | | | | | | | | | | | | It is possible that the optimizer ends up in an infinite loop in post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group() does not find a proper scheduling. This can be deducted from pending.count() being larger than zero and not getting smaller. This patch works around this problem by signalling this failure so that the optimizers bails out and the un-optimized shader is used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142 Cc: <[email protected]> Signed-off-by: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* tree-wide: remove trailing backslashEric Engestrom2017-06-071-1/+1
| | | | | | | | | Simple search for a backslash followed by two newlines. If one of the newlines were to be removed, this would cause issues, so let's just remove these trailing backslashes. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* r600/sb: fix typo in field definitionsDave Airlie2017-06-061-1/+1
| | | | Pointed out by glennk.
* r600g/sb: Fix memory leak by reworking uses list (rebased)Constantine Kharlamov2017-03-204-61/+28
| | | | | | | | | | | | | | | | | | | | | | | | | The author is Heiko Przybyl(CC'ing), the patch is rebased on top of Bartosz Tomczyk's one per Dieter Nützel's comment. Tested-by: Constantine Charlamov <[email protected]> v2: Resend the patch again through git-email. The prev. rebase was sent through Thunderbird, which screwed up tab characters, making the patch not apply. -------------- When fixing the stalls on evergreen I introduced leaking of the useinfo structure(s). Sorry. Instead of allocating a new object to hold 3 values where only one is actually used, rework the list to just store the node pointer. Thus no allocating and deallocation is needed. Since use_info and use_kind aren't used anywhere, drop them and reduce code complexity. This might also save some small amount of cycles. Thanks to Bartosz Tomczyk for finding the bug. Reported-by: Bartosz Tomczyk <bartosz.tomczyk86 at gmail.com <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>> Signed-off-by: Heiko Przybyl <lil_tux at web.de <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>> Supersedes: https://patchwork.freedesktop.org/patch/135852 Signed-off-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* r600g: update sb documentationConstantine Kharlamov2017-03-201-3/+6
| | | | | | | v2: s/r600/r600g in the title Signed-off-by: Constantine Kharlamov <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* r600/sb: Fix memory leakBartosz Tomczyk2017-02-081-1/+7
| | | | Signed-off-by: Marek Olšák <[email protected]>
* r600g: use ieee variants of multiplication instructionsIlia Mirkin2017-01-291-0/+1
| | | | | | | | This matches the behavior of most other drivers, including nouveau, radeonsi, and i965. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* r600/sb: Fix loop optimization related hangs on egHeiko Przybyl2017-01-036-30/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure unused ops and their references are removed, prior to entering the GCM (global code motion) pass, to stop GCM from breaking the loop logic and thus hanging the GPU. Turns out, that sb has problems with loops and node optimizations regarding associative folding: - the global code motion (gcm) pass moves ops up a loop level/basic block until they've fulfilled their total usage count - if there are ops folded into others, the usage count won't be fulfilled and thus the op moved way up to the top - within GCM the op would be visited and their deps would be moved alongside it, to fulfill the src constaints - in a loop, an unused op is moved out of the loop and GCM would move the src value ops up as well - now here arises the problem: if the loop counter is one of the src values it would get moved up as well, the loop break condition would never get hit and the shader turn into an endless loop, resulting in the GPU hanging and being reset A reduced (albeit nonsense) piglit example would be: [require] GLSL >= 1.20 [fragment shader] uniform int SIZE; uniform vec4 lights[512]; void main() { float x = 0; for(int i = 0; i < SIZE; i++) x += lights[2*i+1].x; } [test] uniform int SIZE 1 draw rect -1 -1 2 2 Which gets optimized to: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 42 dw ===== 1 gprs ===== 2 stack ========================================= ALU 3 @24 1 y: MOV R0.y, 0 t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z LOOP_START_DX10 @22 PUSH @6 ALU 1 @30 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 2 @32 3 x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x TEX 1 @36 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 1 @40 4 y: ADD R0.y, R0.y, R0.x LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Notice R0.z being the loop counter/break condition relevant register and being never incremented at all. Also some of the loop content has been moved out of it, to fulfill the requirements for the one unused op. With a debug build of mesa this would produce an error like error at : PRED_SETGE_INT __, __, EM.2, R1.x.2||[email protected], C0.x : operand value R1.x.2||[email protected] was not previously written to its gpr and the compilation would fail due to this. On a release build it gets passed to the GPU. When using this patch, the loop remains intact: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 48 dw ===== 1 gprs ===== 2 stack ========================================= ALU 2 @24 1 y: MOV R0.y, 0 z: MOV R0.z, 0 LOOP_START_DX10 @22 PUSH @6 ALU 1 @28 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 4 @30 3 t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z 4 x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x TEX 1 @40 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 2 @44 5 y: ADD R0.y, R0.y, R0.x z: ADD_INT R0.z, R0.z, 1 LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Piglit: ./piglit summary console -d results/*_gpu_noglx name: unpatched_gpu_noglx patched_gpu_noglx ---- ------------------- ----------------- pass: 18016 18021 fail: 748 743 crash: 7 7 skip: 1124 1124 timeout: 0 0 warn: 13 13 incomplete: 0 0 dmesg-warn: 0 0 dmesg-fail: 0 0 changes: 0 5 fixes: 0 5 regressions: 0 0 total: 19908 19908 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900 Tested-by: Heiko Przybyl <[email protected]> Tested-on: Barts PRO HD6850 Signed-off-by: Heiko Przybyl <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* r600g/sb: fix struct/class declaration conflictsMartina Kollarova2016-09-181-5/+1
| | | | | | | | | A couple of forward-declarations were causing warnings in clang: 'value' defined as a class here but previously declared as a struct [-Wmismatched-tags] Signed-off-by: Martina Kollarova <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Treewide: Remove Elements() macroJan Vesely2016-05-171-1/+1
| | | | | Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* r600g,sb: Don't use standard macro nameJan Vesely2016-05-175-11/+11
| | | | Signed-off-by: Jan Vesely <[email protected]>
* r600: move alu_op_table to .c fileNicolai Hähnle2016-05-131-3/+3
| | | | | | | | So that it gets compiled and emitted only once, saving space is the final binary. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium/r600: removing double semi-colonsJakob Sinclair2016-04-261-1/+1
| | | | | | | | | | Trivial change. Removing unnecessary semi-colons from the code. I don't have push access so someone reviewing this can push it. Signed-off-by: Jakob Sinclair <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_*Marek Olšák2016-04-221-7/+7
| | | | Acked-by: Jose Fonseca <[email protected]>
* r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding ↵xavier2016-03-221-2/+6
| | | | | | | | | | | | | | | multiplications. Previously it was doing this transformation for a Trine 3 shader: MUL R6.x.12, R13.x.23, 0.5|3f000000 - MULADD R4.x.12, -R6.x.12, 2|40000000, 1|3f800000 + MULADD R4.x.12, -R13.x.23, -1|bf800000, 1|3f800000 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94412 Signed-off-by: Xavier Bouchoux <[email protected]> Cc: "11.0 11.1 11.2" <[email protected]> Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add LS/HS hw shader types.Dave Airlie2015-12-073-3/+9
| | | | | | | This just adds printing for the hw shader types, and hooks it up. Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g/sb: Support LDS ops in SB bytecode I/OGlenn Kennard2015-12-074-9/+105
| | | | | | | This just adds the LDS ops to the SB bytecode reader/writers. Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add support for GDS to the sb decoder/dump. (v1.1)Dave Airlie2015-12-074-10/+93
| | | | | | | | | This just adds support to the decoder, not actual SB support. v1.1: fixup GDS relative mode. (Glenn). Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g/sb: SB support for UBO indexingGlenn Kennard2015-10-139-19/+140
| | | | | Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g/sb: Support gs5 sampler indexing (v2)Glenn Kennard2015-10-137-16/+188
| | | | | | | [airlied: v2 cayman fixups] Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>