path: root/src/gallium/drivers/r600
Commit message | Author | Age | Files | Lines
* r600/sb: fix crash in fold_alu_op3 | Roland Scheidegger | 2018-07-09 | 1 | -0/+2
| fold_assoc() called from fold_alu_op3() can lower the number of src to 2, which then leads to an invalid access to n.src[2]->gvalue().
| This didn't seem to have caused much harm in the past, but on Fedora 28 it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used, although with libstdc++ 4.8.5 this didn't do anything, -D_GLIBCXX_DEBUG was needed to show the issue).
| An alternative fix would be to instead call fold_alu_op2() from within fold_assoc() when the number of src is reduced and return always TRUE from fold_assoc() in this case, with the only actual difference being the return value from fold_alu_op3() then. I'm not sure what the return value actually should be in this case (or whether it even can make a difference).
| https://bugs.freedesktop.org/show_bug.cgi?id=106928
| Cc: [email protected]
| Reviewed-by: Dave Airlie <[email protected]>
| (cherry picked from commit 817efd89685efc6b5866e09cbdad01c4ff21c737)
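The failure mode described above can be sketched in isolation. This is only an illustration under assumptions: `Node`, `fold_assoc_sketch` and `fold_op3_sketch` are hypothetical stand-ins, not the r600/sb classes, and returning `false` from the guard is an arbitrary choice (the commit message itself leaves the correct return value open).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an ALU node: just a list of source values.
struct Node {
    std::vector<int> src;
};

// Hypothetical stand-in for fold_assoc(): it may shrink a three-source
// node to two sources, and reports whether the node was fully folded.
bool fold_assoc_sketch(Node &n) {
    if (n.src.size() == 3 && n.src[2] == 0)
        n.src.pop_back();  // e.g. a*b+0 turns into a two-source op
    return false;          // not fully folded, the caller keeps going
}

// Three-source folding that never reads src[2] unless it still exists.
bool fold_op3_sketch(Node &n) {
    assert(n.src.size() == 3);  // op3 nodes start out with three sources
    if (fold_assoc_sketch(n))
        return true;
    if (n.src.size() < 3)       // the guard: src[2] may be gone by now
        return false;
    return n.src[2] == 1;       // placeholder for the real op3 folding
}
```

Without the size check, the last line would index past the end of `src` whenever the helper dropped an operand, which is the kind of access that `-D_GLIBCXX_ASSERTIONS` turns into a hard failure.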
* r600/sb: cleanup if_conversion iterator to be legal C++ | Dave Airlie | 2018-07-05 | 1 | -7/+4
| The current code causes:
| /usr/include/c++/8/debug/safe_iterator.h:207: Error: attempt to copy from a singular iterator.
| This is due to the iterators getting invalidated. Fix the reverse iterator to use the return value from erase, and cast it properly.
| (used Mathias' suggestion)
| Cc: <[email protected]>
| Reviewed-by: Mathias Fröhlich <[email protected]>
| (cherry picked from commit 8c51caab2404c5c9f5211936d27e9fe1c0af2e7d)
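The general pattern referred to above (erase through a reverse iterator and continue from the value `erase` returns) can be sketched as follows. The container, element type and function name are assumptions chosen for illustration; this is not the if_conversion code.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Erase every even value while walking the vector backwards.
void erase_even_reverse(std::vector<int> &v) {
    for (auto it = v.rbegin(); it != v.rend();) {
        if (*it % 2 == 0) {
            // std::next(it).base() is the forward iterator to the element
            // that *it refers to. erase() invalidates it and everything
            // after it, so continue from the iterator erase() returns.
            auto fwd = v.erase(std::next(it).base());
            it = std::vector<int>::reverse_iterator(fwd);
        } else {
            ++it;
        }
    }
}
```

Rebuilding the reverse iterator from the returned forward iterator makes it refer to the element just before the erased one, which is the next element in reverse order.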
* eg/compute: Use reference counting to handle compute memory pool. | Jan Vesely | 2018-05-18 | 2 | -12/+7
| Use pipe_reference to release old RAT surfaces.
| RAT surface adds a reference to pool bo, so use reference counting for pool->bo as well.
| v2: Use the same pattern for both defrag paths
|     Drop confusing comment
| CC: <[email protected]>
| Signed-off-by: Jan Vesely <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
| (cherry picked from commit f3521ce2c440bd50020a3ff81e6d9fa17c01009c)
* r600: fix constant buffer bounds. | Dave Airlie | 2018-05-10 | 2 | -2/+2
| If you have an indirect access to a constant buffer, r600/eg uses a vertex fetch in the shader. However apps have expected behaviour on out of bounds accesses (even if illegal). If the constants were being uploaded as part of a larger upload buffer, we'd set the range of allowed access to a lot larger than required, so apps would get values back from other parts of the upload buffer instead of the expected out of bounds access.
| This fixes rendering bugs in Trine and Witcher 1, thanks to iive for nagging me effectively until I figured it out :-)
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91808
| Cc: <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
| (cherry picked from commit ce027ac5c798b39582288e5d7d9973b3cdda591e)
* eg/compute: Drop reference to kernel_param bo in destructor | Jan Vesely | 2018-05-08 | 1 | -0/+1
| CC: <[email protected]>
| Signed-off-by: Jan Vesely <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| (cherry picked from commit a9e4be9212868f619ac492aaf86b0aa68d4395c4)
* eg/compute: Drop reference on code_bo in destructor. | Jan Vesely | 2018-05-08 | 1 | -3/+1
| Signed-off-by: Jan Vesely <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
| (cherry picked from commit ea1fff4416036066cff51826f95b4703d7211008)
* r600: Cleanup constant buffers on context destruction | Jan Vesely | 2018-05-08 | 1 | -1/+5
| CC: <[email protected]>
| Signed-off-by: Jan Vesely <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| (cherry picked from commit a1e8fcce3eafa59228bb9bb50179c04f150ca9ca)
* ac/gpu_info: rename has_virtual_memory -> r600_has_virtual_memory | Marek Olšák | 2018-04-02 | 3 | -6/+6
* util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero | Ian Romanick | 2018-03-29 | 1 | -1/+1
| The new name makes the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior.
| Signed-off-by: Ian Romanick <[email protected]>
| Suggested-by: Matt Turner <[email protected]>
| Reviewed-by: Alejandro Piñeiro <[email protected]>
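The zero-input distinction mentioned above comes from the usual bit trick, sketched below. The function names are illustrative stand-ins, and the stricter variant is only an assumption about what a function "with different zero-input behavior" could look like. Neither is claimed to be Mesa's exact implementation.

```cpp
#include <cassert>

// v & (v - 1) clears the lowest set bit, so the result is zero for any
// power of two, and also for v == 0 (hence "or zero").
constexpr bool is_power_of_two_or_zero(unsigned v) {
    return (v & (v - 1)) == 0;
}

// Hypothetical stricter variant that rejects zero explicitly.
constexpr bool is_power_of_two_nonzero(unsigned v) {
    return v != 0 && (v & (v - 1)) == 0;
}

static_assert(is_power_of_two_or_zero(0), "zero is accepted");
static_assert(!is_power_of_two_nonzero(0), "zero is rejected");
```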
* gallium: add packed uniform CAP | Timothy Arceri | 2018-03-20 | 1 | -0/+1
| Reviewed-by: Marek Olšák <[email protected]>
* r600: consolidate PIPE_BIND_SHARED/SCANOUT handling | Marek Olšák | 2018-03-16 | 2 | -14/+4
| (Ported from radeonsi commit f70f6baaa3bb0f8b280ac2eaea69bbffaf7de840)
| Allows cached BOs to be reused in more cases.
| Bugzilla: https://bugs.freedesktop.org/105171
| Reviewed-by: Marek Olšák <[email protected]>
| Signed-off-by: Michel Dänzer <[email protected]>
* r600: fix abs for op3 sources | Roland Scheidegger | 2018-03-14 | 1 | -54/+56
| If a src was referencing the same temp as the dst, the per-component copy code didn't work. e.g.
|   cndge r0.xy, r0.xx, |r2|, r3
| got expanded into
|   mov r12.x, |r2|
|   cndge r0.x, r0.x, r12, r3
|   mov r12.y, |r2|
|   cndge r0.y, r0.x, r12, r3
| hence for the second cndge r0.x was mistakenly the previous cndge result. Fix this by doing all the movs first, so there's no bogus alu.last in between.
| Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=102905
| Tested-by: <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600: implement callstack workaround for evergreen. | Dave Airlie | 2018-03-12 | 1 | -8/+31
| This is ported from the sb backend. There are some issues with evergreen stacks on the boundary between entries and ALU_PUSH_BEFORE instructions.
| Whenever we are going to use a push before, we check the stack usage, and if we have to use the workaround we switch to a separate push.
| I noticed this problem dealing with some of the soft fp64 shaders; in nosb mode, they are quite stack happy.
| This fixes all the glitches and inconsistencies I've seen with them.
| Reviewed-by: Roland Scheidegger <[email protected]>
| Tested-by: Elie Tournier <[email protected]>
| Cc: <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: fix color export mask | Roland Scheidegger | 2018-03-05 | 1 | -0/+1
| The r600 code (not the eg one) forgot to copy the ps_color_export_mask in commit 5b14e06d8b42e2b08ebc52b6c314ef8647d87a1f when updating the pixel state, leading to misrenderings (probably with MRT).
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105262
| Tested-by: LoneVVolf <[email protected]>
| Tested-by: Pavel Vinogradov <[email protected]>
* r600/cayman: fix fragcood loading recip generation. | Dave Airlie | 2018-03-02 | 1 | -1/+1
| This fixes some hangs seen where the recip_ieee opcodes would end up split across the wrong slots.
| Cc: <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: when using images always load thread id gpr at start (v2) | Dave Airlie | 2018-02-28 | 1 | -15/+7
| The delayed loading code failed if we had control flow.
| This fixes: tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test
| v2: don't use temp_reg before setting temp_reg up.
| Tested-by: Gert Wollny <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: fix whitespace in recent 1d texture commit. | Dave Airlie | 2018-02-28 | 1 | -1/+1
| trivial fix.
* r600: partly revert disabling tiling for 1d texture. | Dave Airlie | 2018-02-28 | 1 | -0/+5
| Previously we had a check for 1d or narrow 2D textures. Narrow 2d textures caused gpu hangs, but the check was correct for 1d textures.
| This fixes a bunch of 1D image piglits for me.
| Fixes: 7b8e1c089d (r600/texture: drop lowering 1d/2d images to linear.)
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: fix tgsi clock last setting | Dave Airlie | 2018-02-26 | 1 | -0/+1
| On cayman this was hitting an assert later, which probably wasn't seen on non-cayman due to having the t slot.
| Fixes: 9041730d1 (r600: add support for ARB_shader_clock.)
* r600: add time lo/hi debugging output. | Dave Airlie | 2018-02-26 | 2 | -0/+12
| This just adds these to the debug prints.
* r600: Take ALU_EXTENDED into account when evaluating jump offsets | Gert Wollny | 2018-02-26 | 1 | -2/+7
| ALU_EXTENDED needs 4 DWORDS instead of the usual 2, hence if the last ALU clause within an IF-JUMP or ELSE branch is ALU_EXTENDED the target jump offset needs to be adjusted accordingly.
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104654
| Cc: <[email protected]>
| Signed-off-by: Gert Wollny <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
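The size arithmetic behind this fix can be sketched as below. Only the 2-versus-4 dword sizes come from the commit message; the `CfInstr` record and `offset_after` helper are hypothetical, and the real driver tracks addresses differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CF instruction record: only the flag this sketch needs.
struct CfInstr {
    bool alu_extended;
};

// Size of one CF instruction in dwords, as stated in the commit message.
unsigned cf_size_dwords(const CfInstr &cf) {
    return cf.alu_extended ? 4u : 2u;
}

// Dword offset of the first slot after instruction `last`, i.e. where a
// jump over the branch body would have to land.
unsigned offset_after(const std::vector<CfInstr> &prog, std::size_t last) {
    assert(last < prog.size());
    unsigned off = 0;
    for (std::size_t i = 0; i <= last; ++i)
        off += cf_size_dwords(prog[i]);
    return off;
}
```

For a branch body of two ordinary instructions followed by an extended one, assuming 2 dwords everywhere would give 6, while the helper returns 8.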
* gallium: allow drivers to impose BO flags restrictions on constant buffer 0 | Marek Olšák | 2018-02-17 | 1 | -0/+1
| Required by radeonsi for optimal behavior.
* r600: fix regression in gl_FragColor drawing | Dave Airlie | 2018-02-14 | 1 | -0/+2
| This fixes a regression in the broadcast color to all color bufs case.
| Fixes: 6c691081a (r600: fixup sparse color exports.)
| Signed-off-by: Dave Airlie <[email protected]>
* r600: fix array spill if temp[0] is before all arrays | Dave Airlie | 2018-02-14 | 1 | -0/+2
| I found a shader with
|   DCL TEMP[0], LOCAL
|   DCL TEMP[1..256], ARRAY(1), LOCAL
|   DCL TEMP[257..512], ARRAY(2), LOCAL
|   DCL TEMP[513..768], ARRAY(3), LOCAL
|   DCL TEMP[769], LOCAL
| This would remap badly, as it would add up all the spilled sizes and subtract it from the temp for 0.
| If the current temp is less than the array start break out.
| Fixes: 1d871aa6 (r600g: Implement spilling of temp arrays (v2))
| Signed-off-by: Dave Airlie <[email protected]>
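The remapping rule described above can be sketched as follows. It assumes every listed array has been spilled, that the arrays are sorted by start index, and that the temp being remapped is not itself inside an array. `SpilledArray` and `remap_temp` are hypothetical names, not the driver's.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one spilled temp array.
struct SpilledArray {
    unsigned start;  // first TEMP index covered by the array
    unsigned size;   // number of TEMP slots it occupied
};

// Remap a non-array temp index after the arrays have been moved to
// scratch: only arrays that start before the temp shift it down.
unsigned remap_temp(unsigned temp, const std::vector<SpilledArray> &arrays) {
    unsigned offset = 0;
    for (const SpilledArray &a : arrays) {
        if (temp < a.start)
            break;  // this array and all later ones lie after the temp
        offset += a.size;
    }
    return temp - offset;
}
```

With the declarations from the commit message, `TEMP[0]` stays at 0 and `TEMP[769]` maps to 1. Without the early `break`, the full 768 would be subtracted from index 0.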
* gallium: drop all the guard band float caps. | Dave Airlie | 2018-02-14 | 1 | -5/+0
| Nobody queries these and nobody sets them to anything useful, the docs say TODO. Drop them until a use appears.
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: always return PIPE_SHADER_IR_TGSI for PIPE_SHADER_CAP_PREFERRED_IR | Timothy Arceri | 2018-02-10 | 1 | -5/+1
| We now use PIPE_SHADER_CAP_SUPPORTED_IRS to check for native support in clover.
| Reviewed-by: Marek Olšák <[email protected]>
* r600: add PIPE_SHADER_IR_NATIVE to supported shaders for cs | Timothy Arceri | 2018-02-10 | 1 | -3/+7
| Acked-by: Pierre Moreau <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* r600/sb: Check whether optimizations would result in reladdr conflict | Gert Wollny | 2018-02-09 | 3 | -4/+55
| v2: * Check whether the node src and dst registers are NULL before using them.
|     * Fix a typo in the commit message.
| Two cases are handled with this patch:
| 1. If copy propagation tries to eliminate a move from a relative array access then it could optimize
|      MOV R1, ARRAY[RELADDR_1]
|      MOV R2, ARRAY[RELADDR_2]
|      OP2 R3, R1, R2
|    into
|      OP2 R3, ARRAY[RELADDR_1], ARRAY[RELADDR_2]
|    which is forbidden, because there is only one address register available.
| 2. When MULADD(x,a,MUL(x,c)) is handled
|      MUL TMP, R1, ARRAY[RELADDR_1]
|      MULADD R3, R1, ARRAY[RELADDR_2], TMP
|    by folding this into
|      ADD TMP, ARRAY[RELADDR_2], ARRAY[RELADDR_1]
|      MUL R3, R1, TMP
|    which is also forbidden.
| Test for these cases and reject the optimization if a forbidden combination of relative access would be created.
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
| Signed-off-by: Gert Wollny <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
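The constraint both cases run into (one instruction cannot use two different relative addresses, since there is only one address register) can be sketched as a check over the operands an optimization would produce. `Src` and `reladdr_conflict` are hypothetical, and the actual sb pass works on its own value and node types.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical operand: whether it is a relative array access and, if
// so, which value would have to be loaded into the address register.
struct Src {
    bool relative;
    int addr_id;
};

// True if the operands would need more than one distinct address value
// at once, in which case the optimization should be rejected.
bool reladdr_conflict(const std::vector<Src> &srcs) {
    std::set<int> ids;
    for (const Src &s : srcs)
        if (s.relative)
            ids.insert(s.addr_id);
    return ids.size() > 1;
}
```

Under this model, `OP2 R3, ARRAY[RELADDR_1], ARRAY[RELADDR_2]` is a conflict, while two accesses that share the same address value are not.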
* r600g: Implement spilling of temp arrays (v2) | Glenn Kennard | 2018-02-09 | 3 | -8/+292
| Pessimistically spills arrays if GPR limit is exceeded.
| v2: fix r600 support [airlied]
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600/sb: handle scratch mem reads on r600 | Dave Airlie | 2018-02-09 | 2 | -5/+23
| On r600 we use the scratch mem with read/read_ind, in that case sb should track the rw_gpr as a dst instead of a src.
| This stops the whole shader being optimised out.
| Signed-off-by: Dave Airlie <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Add dependency tracking for scratch ops | Glenn Kennard | 2018-02-09 | 8 | -4/+22
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Support scratch ops | Glenn Kennard | 2018-02-09 | 5 | -1/+153
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600g: Implement scratch buffer state management (v2) | Glenn Kennard | 2018-02-09 | 6 | -1/+152
| v2: add Glenn's fixes
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600g: Add pending output function | Glenn Kennard | 2018-02-09 | 2 | -0/+22
| Spills have to happen after the VLIW bundle currently processed, so defer emitting the spill op.
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600g: Support emitting scratch ops | Glenn Kennard | 2018-02-09 | 4 | -1/+77
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* r600: fix texture gather swizzling. | Dave Airlie | 2018-02-09 | 1 | -7/+7
| This fixes: KHR-GL45.texture_gather.swizzle on cayman and redwood.
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: implement tg4 integer workaround. (v2) | Dave Airlie | 2018-02-08 | 1 | -0/+162
| This ports the texture gather integer workaround from radeonsi.
| This fixes: KHR-GL45.texture_gather.plain-gather-uint/int*
| v2: add rect support, fix 2d array shadow
| Reviewed-by: Roland Scheidegger <[email protected]> (on irc)
| Signed-off-by: Dave Airlie <[email protected]>
* r600: clean up initial shader register setup | Glenn Kennard | 2018-02-08 | 1 | -20/+17
| This is taken from Glenn Kennard's scratch series, but separated out as a cleanup by me.
| Reviewed-By: Gert Wollny <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: partly fix sampleMaskIn value | Roland Scheidegger | 2018-02-08 | 1 | -0/+54
| The hw gives us coverage for pixel, not for individual fragment shader invocations, in case execution isn't per pixel (eg, unlike cm, actually cannot do "real" minSampleShading, it's either per-pixel or per-fragment, but it doesn't really make a difference here).
| Also, with msaa disabled, the hw still gives us a mask corresponding to the number of samples, where GL requires this to be 1.
| Fix this up by masking the sampleMaskIn bits with the bit corresponding to the sampleID, if we know this shader is always executed at per-sample granularity. (In case of a per-sample frequency shader and msaa disabled, the sampleID will always be 0, so this works just fine there.)
| Fixing this for the minSampleShading case will need a shader key (radeonsi uses the prolog part for this) (for eg, could get away with a single bit, cm would need more bits depending on sample/invocation ratio, or read the bits from a uniform), unless we'd want to always use a sample mask uniform (which is probably not a good idea, as it would make the ordinary common msaa case slower for no good reason).
| This fixes some parts of piglit arb_sample_shading-samplemask (with fixed test), in particular those which use a sampleID, still failing others as expected.
| Reviewed-by: Dave Airlie <[email protected]>
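The masking step described above is small enough to show directly. This is a CPU-side sketch with hypothetical names (`fixup_sample_mask_in`, `per_sample`). The driver does the equivalent in generated shader code, not in a C++ helper like this.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the coverage bit that belongs to this invocation's sample
// when the shader is known to run once per sample; otherwise pass the
// hardware coverage through unchanged.
std::uint32_t fixup_sample_mask_in(std::uint32_t hw_coverage,
                                   unsigned sample_id,
                                   bool per_sample) {
    assert(sample_id < 32);
    if (!per_sample)
        return hw_coverage;
    return hw_coverage & (std::uint32_t{1} << sample_id);
}
```

For a per-sample shader with msaa disabled the sample ID is always 0, so a hardware mask such as 0xF collapses to 1, the value the message says GL requires.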
* r600: clean up fragment shader input scan code | Roland Scheidegger | 2018-02-08 | 1 | -52/+23
| For some reason, we were iterating through the code twice (first just for instructions needing barycentrics, then for instructions and input dcls). Move things around slightly so this is no longer necessary.
| There also was an unneeded enabling of the fixed_pt_position_gpr - this is only needed if the per-sample interpolation comes from an input, not from an instruction (just move the assert where it belongs) (since the sample id to sample from comes from a tgsi src in this case, and isn't sampleID).
| Otherwise there should be no functional change.
| Reviewed-by: Dave Airlie <[email protected]>
* r600/cm: (trivial) code cleanup for emitting msaa state | Roland Scheidegger | 2018-02-08 | 3 | -16/+14
| No functional change (compile tested only).
| Reviewed-by: Dave Airlie <[email protected]>
* r600: fix rendering regression on r6/7 gpus | Dave Airlie | 2018-02-08 | 1 | -1/+6
| Fixes: 2d5b5d267e (r600: work out target mask at framebuffer bind.)
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104989
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: fixup sparse color exports. | Dave Airlie | 2018-02-07 | 3 | -1/+12
| If we have gaps in the shader mask we have to have 0x1 in them according to a comment in radeonsi, and this is required to fix the test at least on cayman.
| We also need to record the highest one written to write to the ps exports reg.
| This fixes: KHR-GL45.enhanced_layouts.fragment_data_location_api
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
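One way to read the rule above is sketched below: four mask bits per color output, 0x1 filled into any gap below the highest written output, plus the index of that highest output. The 0xf value for written outputs, the 8-output limit and all names here are assumptions made for illustration. Only the "0x1 in the gaps" and "record the highest one written" parts come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

struct ExportMask {
    std::uint32_t mask;  // 4 bits per color output
    int highest;         // highest written output, or -1 if none
};

// `written` has bit i set when the shader writes color output i.
ExportMask build_export_mask(std::uint8_t written) {
    ExportMask r{0, -1};
    for (int i = 0; i < 8; ++i)
        if (written & (1u << i))
            r.highest = i;
    for (int i = 0; i <= r.highest; ++i) {
        // Assumed encoding: all four channels for written outputs,
        // a single bit (0x1) for gaps between them.
        std::uint32_t nibble = (written & (1u << i)) ? 0xfu : 0x1u;
        r.mask |= nibble << (4 * i);
    }
    return r;
}
```

Writing outputs 0 and 2 only would then give a mask of 0xf1f with 2 recorded as the highest output.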
* r600: work out target mask at framebuffer bind. | Dave Airlie | 2018-02-07 | 3 | -4/+9
| If we only get 1,2,3,6 framebuffers we want a sparse target mask.
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: work out shader export mask at shader build time (v1.1) | Dave Airlie | 2018-02-07 | 6 | -3/+13
| Since enhanced layouts allows setting specific MRT outputs, we can get sparse outputs, so we have to calculate the shader mask earlier.
| v1.1: update checks for state update (Roland)
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: fix xfb stream check. | Dave Airlie | 2018-02-07 | 1 | -1/+1
| This fixes: KHR-GL45.enhanced_layouts.xfb_vertex_streams
| Reviewed-by: Roland Scheidegger <[email protected]>
| Cc: <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add render cond support. | Dave Airlie | 2018-02-07 | 1 | -2/+5
| Set render cond and emit atom.
| Fixes: KHR-GL45.compute_shader.conditional-dispatching
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: fix not-very indirect compute | Dave Airlie | 2018-02-07 | 1 | -12/+18
| We need to get the grid sizes earlier to fill in to the const buffer.
| Fixes: KHR-GL45.compute_shader.built-in-variables and KHR-GL45.compute_shader.dispatch-indirect
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: overhaul buffer resource query. | Dave Airlie | 2018-02-07 | 1 | -7/+8
| This cleans up and fixes the previous fix even more.
| Buffers from textures start at max const, buffers from buffers/images come in from the 168 offset.
| This fixes a bunch of: KHR-GL45.shader_storage_buffer_object*
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: fix buffer sizing. | Dave Airlie | 2018-02-07 | 1 | -1/+3
| For buffers we want the size in bytes; for images we want it in elements.
| This fixes: KHR-GL45.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad
| Reviewed-by: Roland Scheidegger <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>