path: root/src/gallium/drivers/r600
Commit message | Author | Age | Files | Lines
* r600: add PIPE_SHADER_IR_NATIVE to supported shaders for cs | Timothy Arceri | 2018-02-10 | 1 | -3/+7
  Acked-by: Pierre Moreau <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* r600/sb: Check whether optimizations would result in reladdr conflict | Gert Wollny | 2018-02-09 | 3 | -4/+55
  v2: * Check whether the node src and dst registers are NULL before
        using them.
      * Fix a typo in the commit message.

  Two cases are handled with this patch:

  1. If copy propagation tries to eliminate a move from a relative
     array access then it could optimize

       MOV R1, ARRAY[RELADDR_1]
       MOV R2, ARRAY[RELADDR_2]
       OP2 R3, R1 R2

     into

       OP2 R3, ARRAY[RELADDR_1], ARRAY[RELADDR_2]

     which is forbidden, because there is only one address register
     available.

  2. When MULADD(x,a,MUL(x,c)) is handled

       MUL TMP, R1, ARRAY[RELADDR_1]
       MULADD R3, R1, ARRAY[RELADDR_2], TMP

     by folding this into

       ADD TMP, ARRAY[RELADDR_2], ARRAY[RELADDR_1]
       MUL R3, R1, TMP

     which is also forbidden.

  Test for these cases and reject the optimization if a forbidden
  combination of relative access would be created.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
  Signed-off-by: Gert Wollny <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
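  The rejection test described in this commit can be sketched roughly as follows. This is only an illustration, not the code in r600/sb: `Operand`, its `rel_addr` field and `would_conflict` are hypothetical names, and the sketch assumes each relatively addressed operand carries an id for the address value it needs.

```cpp
#include <optional>
#include <vector>

// Hypothetical operand model (not the sb IR): rel_addr is set when the
// operand is a relatively addressed array access and holds an id for the
// address value it depends on.
struct Operand {
    std::optional<int> rel_addr;
};

// Returns true if the operands of one instruction would need more than one
// distinct address value, which a single address register cannot provide.
bool would_conflict(const std::vector<Operand>& operands) {
    std::optional<int> seen;
    for (const Operand& op : operands) {
        if (!op.rel_addr)
            continue;                 // direct access, no address register
        if (seen && *seen != *op.rel_addr)
            return true;              // second, different relative address
        seen = op.rel_addr;
    }
    return false;
}
```

  An optimization pass would build the operand list the folded instruction would have and skip the fold when `would_conflict` returns true.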
* r600g: Implement spilling of temp arrays (v2) | Glenn Kennard | 2018-02-09 | 3 | -8/+292
  Pessimistically spills arrays if GPR limit is exceeded.

  v2: fix r600 support [airlied]

  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600/sb: handle scratch mem reads on r600 | Dave Airlie | 2018-02-09 | 2 | -5/+23
  On r600 we use the scratch mem with read/read_ind; in that case sb
  should track the rw_gpr as a dst instead of a src.

  This stops the whole shader being optimised out.

  Signed-off-by: Dave Airlie <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Add dependency tracking for scratch ops | Glenn Kennard | 2018-02-09 | 8 | -4/+22
  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Support scratch ops | Glenn Kennard | 2018-02-09 | 5 | -1/+153
  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g: Implement scratch buffer state management (v2) | Glenn Kennard | 2018-02-09 | 6 | -1/+152
  v2: add Glenn's fixes

  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g: Add pending output function | Glenn Kennard | 2018-02-09 | 2 | -0/+22
  Spills have to happen after the VLIW bundle currently processed,
  so defer emitting the spill op.

  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g: Support emitting scratch ops | Glenn Kennard | 2018-02-09 | 4 | -1/+77
  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600: fix texture gather swizzling. | Dave Airlie | 2018-02-09 | 1 | -7/+7
  This fixes:
  KHR-GL45.texture_gather.swizzle
  on cayman and redwood.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: implement tg4 integer workaround. (v2) | Dave Airlie | 2018-02-08 | 1 | -0/+162
  This ports the texture gather integer workaround from radeonsi.

  This fixes:
  KHR-GL45.texture_gather.plain-gather-uint/int*

  v2: add rect support, fix 2d array shadow

  Reviewed-by: Roland Scheidegger <[email protected]> (on irc)
  Signed-off-by: Dave Airlie <[email protected]>
* r600: clean up initial shader register setup | Glenn Kennard | 2018-02-08 | 1 | -20/+17
  This is taken from Glenn Kennard's scratch series, but separated
  out as a cleanup by me.

  Reviewed-By: Gert Wollny <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: partly fix sampleMaskIn value | Roland Scheidegger | 2018-02-08 | 1 | -0/+54
  The hw gives us coverage for the pixel, not for individual fragment
  shader invocations, in case execution isn't per pixel (eg, unlike cm,
  actually cannot do "real" minSampleShading; it's either per-pixel or
  per-fragment, but it doesn't really make a difference here).

  Also, with msaa disabled, the hw still gives us a mask corresponding
  to the number of samples, where GL requires this to be 1.

  Fix this up by masking the sampleMaskIn bits with the bit
  corresponding to the sampleID, if we know this shader is always
  executed at per-sample granularity. (In case of a per-sample frequency
  shader and msaa disabled, the sampleID will always be 0, so this works
  just fine there.)

  Fixing this for the minSampleShading case will need a shader key
  (radeonsi uses the prolog part for this) (for eg, could get away with
  a single bit, cm would need more bits depending on sample/invocation
  ratio, or read the bits from a uniform), unless we'd want to always
  use a sample mask uniform (which is probably not a good idea, as it
  would make the ordinary common msaa case slower for no good reason).

  This fixes some parts of piglit arb_sample_shading-samplemask (with
  fixed test), in particular those which use a sampleID, still failing
  others as expected.

  Reviewed-by: Dave Airlie <[email protected]>
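  The masking step described in this commit can be illustrated with a small host-side sketch. The function name `fixup_sample_mask_in` and its parameters are hypothetical; the driver emits shader instructions for this rather than calling a helper like this one.

```cpp
#include <cstdint>

// Restrict the hardware coverage mask to the bit of the sample this
// invocation runs for. Only meaningful when the shader is known to execute
// at per-sample granularity (assumption taken from the commit message).
uint32_t fixup_sample_mask_in(uint32_t hw_coverage, unsigned sample_id) {
    return hw_coverage & (1u << sample_id);
}
```

  With msaa disabled the sample id is always 0, so the result can only be 0 or 1, which matches what the message says GL requires.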
* r600: clean up fragment shader input scan code | Roland Scheidegger | 2018-02-08 | 1 | -52/+23
  For some reason, we were iterating through the code twice (first just
  for instructions needing barycentrics, then for instructions and input
  dcls). Move things around slightly so this is no longer necessary.

  There was also an unneeded enabling of the fixed_pt_position_gpr - this
  is only needed if the per-sample interpolation comes from an input, not
  from an instruction (just move the assert where it belongs) (since the
  sample id to sample from comes from a tgsi src in this case, and isn't
  sampleID).

  Otherwise there should be no functional change.

  Reviewed-by: Dave Airlie <[email protected]>
* r600/cm: (trivial) code cleanup for emitting msaa state | Roland Scheidegger | 2018-02-08 | 3 | -16/+14
  No functional change (compile tested only).

  Reviewed-by: Dave Airlie <[email protected]>
* r600: fix rendering regression on r6/7 gpus | Dave Airlie | 2018-02-08 | 1 | -1/+6
  Fixes: 2d5b5d267e (r600: work out target mask at framebuffer bind.)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104989
  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: fixup sparse color exports. | Dave Airlie | 2018-02-07 | 3 | -1/+12
  If we have gaps in the shader mask we have to have 0x1 in them
  according to a comment in radeonsi, and this is required to fix
  the test at least on cayman.

  We also need to record the highest one written to write to the
  ps exports reg.

  This fixes:
  KHR-GL45.enhanced_layouts.fragment_data_location_api

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
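  A rough sketch of the gap filling described in this commit is shown below. It is not the driver code: `ExportMask` and `build_export_mask` are hypothetical names, and the sketch assumes the mask uses 4 bits per colour export with 0xf for an output the shader writes.

```cpp
#include <cstdint>

// Hypothetical result type: the mask plus the highest written export.
struct ExportMask {
    uint32_t shader_mask;   // assumed layout: 4 bits per colour export
    int highest_written;    // -1 if the shader writes no colour output
};

// written: bit i is set if the fragment shader writes colour output i (0..7).
ExportMask build_export_mask(uint8_t written) {
    ExportMask out{0, -1};
    for (int i = 7; i >= 0; --i) {
        if (written & (1u << i)) {
            out.highest_written = i;
            break;
        }
    }
    for (int i = 0; i <= out.highest_written; ++i) {
        // 0xf for a written output (assumption), 0x1 to fill a gap.
        uint32_t nibble = (written & (1u << i)) ? 0xFu : 0x1u;
        out.shader_mask |= nibble << (4 * i);
    }
    return out;
}
```

  For a shader writing outputs 0 and 2 this yields a mask of 0xF1F and a highest written export of 2.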
* r600: work out target mask at framebuffer bind. | Dave Airlie | 2018-02-07 | 3 | -4/+9
  If we only get 1,2,3,6 framebuffers we want a sparse target mask.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
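  As an illustration of what a sparse target mask looks like, here is a minimal sketch. `build_target_mask` is a hypothetical helper, and the layout of 4 mask bits per colour buffer slot is an assumption of this sketch rather than something the commit message states.

```cpp
#include <array>
#include <cstdint>

// bound[i] is true when colour buffer slot i is attached to the framebuffer.
// Assumed layout: 4 write-enable bits per slot.
uint32_t build_target_mask(const std::array<bool, 8>& bound) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < bound.size(); ++i) {
        if (bound[i])
            mask |= 0xFu << (4 * i);   // enable only the slots that exist
    }
    return mask;
}
```

  With slots 0, 1, 2 and 5 bound (the first, second, third and sixth buffers), the result is 0x00F00FFF: the unbound slots in between stay zero instead of being covered by a dense mask derived from the buffer count.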
* r600: work out shader export mask at shader build time (v1.1) | Dave Airlie | 2018-02-07 | 6 | -3/+13
  Since enhanced layouts allows setting specific MRT outputs, we can
  get sparse outputs, so we have to calculate the shader mask earlier.

  v1.1: update checks for state update (Roland)

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: fix xfb stream check. | Dave Airlie | 2018-02-07 | 1 | -1/+1
  This fixes:
  KHR-GL45.enhanced_layouts.xfb_vertex_streams

  Reviewed-by: Roland Scheidegger <[email protected]>
  Cc: <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add render cond support. | Dave Airlie | 2018-02-07 | 1 | -2/+5
  Set render cond and emit atom.

  Fixes:
  KHR-GL45.compute_shader.conditional-dispatching

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: fix not-very indirect compute | Dave Airlie | 2018-02-07 | 1 | -12/+18
  We need to get the grid sizes earlier to fill in to the const buffer.

  Fixes:
  KHR-GL45.compute_shader.built-in-variables
  and
  KHR-GL45.compute_shader.dispatch-indirect

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: overhaul buffer resource query. | Dave Airlie | 2018-02-07 | 1 | -7/+8
  This cleans up and fixes the previous fix even more.

  Buffers from textures start at max const, buffers from
  buffers/images come in from the 168 offset.

  This fixes a bunch of:
  KHR-GL45.shader_storage_buffer_object*

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: fix buffer sizing. | Dave Airlie | 2018-02-07 | 1 | -1/+3
  For buffers we want the size in bytes; for images we want it in
  elements.

  This fixes:
  KHR-GL45.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/images: set offset for compute shaders with number of declared samplers | Dave Airlie | 2018-02-07 | 1 | -1/+1
  For frag shaders we get a value in the key; I expect I need to make
  compute work better.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: only mark buffer/image state dirty for fragment shaders | Dave Airlie | 2018-02-07 | 1 | -2/+4
  The compute emission path always emits this currently, and emitting
  it on the fragment path breaks the blitter.

  This fixes gpu hangs in KHR-GL45.compute_shader.resource-texture

  Reviewed-by: Roland Scheidegger <[email protected]>
  Cc: <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/atomic: fix ATOMCAS instruction. | Dave Airlie | 2018-02-07 | 1 | -1/+31
  This has 4 srcs.

  This fixes:
  KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb/cayman: fix indirect ubo access on cayman | Dave Airlie | 2018-02-07 | 1 | -1/+1
  With sb enabled on cayman, this was overwriting the proper cf index
  value with random ones if the dst gpr was 2 or 3. Only save the value
  for a MOVA instruction.

  Fixes:
  KHR-GL45.gpu_shader5.uniform_blocks_array_indexing
  (on cayman with sb)

  Cc: <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: use texture target to pick array size not view target (v2) | Dave Airlie | 2018-02-07 | 1 | -7/+10
  This fixes a few CTS cases in:
  KHR-GL45.texture_view.view_sampling

  Some multisample cases are still broken, but not sure this is the
  same problem.

  v2: fix more cases

  Cc: <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/fp64: Fix build. | Vinson Lee | 2018-02-05 | 1 | -1/+1
    CC       r600_shader.lo
  r600_shader.c: In function ‘egcm_int_to_double’:
  r600_shader.c:4543:12: error: ‘ctx’ is a pointer; did you mean to use ‘->’?
     if (ctx.bc->chip_class == CAYMAN)
            ^
            ->

  Fixes: 35b430157776 ("r600/fp64: fix integer->double conversion")
  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* r600/fp64: fix integer->double conversion | Dave Airlie | 2018-02-06 | 1 | -28/+93
  Doing a straight uint/int->fp32->fp64 conversion causes some precision
  issues. Roland suggested splitting the integer into two portions and
  doing two separate int->fp32->fp64 conversions, then adding the
  results.

  This passes the tests in CTS and piglit.

  [airlied: fix cypress conversion opcodes]

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
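  The precision problem and the split approach can be reproduced on the host with a short sketch. The function names are hypothetical, and the split point used here (upper 24 bits and lower 8 bits) is an assumption chosen so that each portion fits fp32's 24-bit significand; the commit message does not say where the driver splits.

```cpp
#include <cstdint>

// Naive path: a 32-bit integer can need more than fp32's 24 significant
// bits, so rounding happens before the value is widened to double.
double naive_u32_to_double(uint32_t x) {
    return static_cast<double>(static_cast<float>(x));
}

// Split path: each portion has at most 24 significant bits, so both
// int->fp32 conversions are exact and the final double add is exact too.
double split_u32_to_double(uint32_t x) {
    uint32_t hi = x & 0xFFFFFF00u;  // upper 24 bits (assumed split point)
    uint32_t lo = x & 0x000000FFu;  // lower 8 bits
    double dhi = static_cast<double>(static_cast<float>(hi));
    double dlo = static_cast<double>(static_cast<float>(lo));
    return dhi + dlo;
}
```

  For 0xFFFFFFFF the split path returns exactly 4294967295.0, while the naive path rounds to a neighbouring fp32 value first and so cannot.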
* r600: fix resq for buffer images. | Dave Airlie | 2018-02-05 | 1 | -1/+4
  If this is an image buffer, we need to calculate the correct
  resource id.

  Fixes:
  KHR-GL45.shader_image_size.*

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: fix cube map array buffer images. | Dave Airlie | 2018-02-05 | 1 | -1/+1
  This fixes a crash in:
  KHR-GL45.texture_cube_map_array.texture_size_compute_sh.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: add crap indirect compute support. | Dave Airlie | 2018-02-02 | 1 | -7/+19
  I think the cp packets can be made work, but I think it might
  need a kernel change, so for now just do the worst thing.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: don't do stack workarounds for hemlock | Roland Scheidegger | 2018-02-02 | 1 | -0/+1
  By the looks of it it seems hemlock is treated separately to cypress,
  but certainly it won't need the stack workarounds cedar/redwood (and
  seemingly every other eg chip except cypress/juniper) need.
  (Discovered by accident.)

  Acked-by: Alex Deucher <[email protected]>
* r600: initial attempt at gl_HelperInvocation (v3) | Dave Airlie | 2018-02-02 | 6 | -3/+108
  This passes the CTS and piglit tests.

  This also disables sb for helper invocations until it doesn't mess up
  the VPM flags.

  Thanks to Ilia and Glenn for advice, and Roland for working out the
  working evergreen path.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: make sure we allow vpm bit on other CF ops. | Dave Airlie | 2018-02-01 | 1 | -0/+1
  The vpm bit wasn't being applied to the push/pop instructions.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: just add some missing debug bits | Dave Airlie | 2018-02-01 | 1 | -0/+15
  Signed-off-by: Dave Airlie <[email protected]>
* r600: fix buffer resinfo opcode translation. | Dave Airlie | 2018-02-01 | 2 | -2/+2
  The vtx operations never got translated, so things worked by 0 being
  equal to 0. Translate them so we can use the proper buffer resinfo
  code.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* gallium: introduce PIPE_CAP_FENCE_SIGNAL v2 | Andres Rodriguez | 2018-01-30 | 1 | -0/+1
  Protects semaphore signaling functionality required by
  GL_EXT_semaphore.

  v2: s/semaphore/fence

  Signed-off-by: Andres Rodriguez <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* r600/sb: insert the else clause when we might depart from a loop | Dave Airlie | 2018-01-31 | 1 | -0/+17
  If there is a break inside the else clause and this means we are
  breaking from a loop, the loop finalise will want to insert the
  LOOP_BREAK/CONTINUE instruction. However, if we don't emit the else
  there is nowhere for these to end up, so they will end up in the
  wrong place.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101442
  Tested-By: Gert Wollny <[email protected]>
  Cc: <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: add ARB_query_buffer_object support | Dave Airlie | 2018-01-29 | 10 | -31/+817
  This uses a different shader than radeonsi, as we can't address
  non-256 aligned ssbos, which the radeonsi code does. This passes some
  extra offsets into the shader.

  It also contains a set of u64 instruction implementations that may or
  may not be complete (at least the u64div is definitely not something
  that works outside this use-case). If r600 grows 64-bit integers, it
  will use the GLSL lowering for divmod.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
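  One plausible way to work around a 256-byte alignment restriction is to bind at an aligned base and hand the remainder to the shader. The sketch below shows only that arithmetic; `SplitOffset` and `split_offset` are hypothetical names, and the commit message does not say which extra offsets the driver actually passes or how it computes them.

```cpp
#include <cstdint>

// Hypothetical result of splitting an arbitrary byte offset.
struct SplitOffset {
    uint64_t aligned_base;  // multiple of 256, usable as the binding offset
    uint32_t remainder;     // 0..255, passed to the shader as an extra offset
};

SplitOffset split_offset(uint64_t byte_offset) {
    const uint64_t alignment = 256;
    return { byte_offset & ~(alignment - 1),
             static_cast<uint32_t>(byte_offset & (alignment - 1)) };
}
```

  For a byte offset of 1000 this gives a base of 768 and a remainder of 232.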
* r600/shader: refactor mul hi/lo instruction emission | Dave Airlie | 2018-01-29 | 1 | -254/+116
  This just makes it a bit simpler for cayman vs eg

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: construct proper rat mask for image/buffers. | Dave Airlie | 2018-01-29 | 3 | -8/+30
  If the images/buffer bindings had a gap, this produced the wrong
  values. This should fix that to generate the correct rat mask for
  mixes of images/buffers/cbs.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Cc: "18.0" <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* autotools: include meson build files in tarball | Dylan Baker | 2018-01-19 | 1 | -1/+2
  This adds the meson.build, meson_options.txt, and a few scripts that
  are used exclusively by the meson build.

  v2: - Remove accidentally included changes needed to test make dist
        with LLVM > 3.9

  Signed-off-by: Dylan Baker <[email protected]>
  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* r600: enable ARB_enhanced_layouts | Dave Airlie | 2018-01-19 | 1 | -1/+1
  Only one piglit test fails: sso-vs-gs-fs-array-interleave.

  There are 3 tests using ssbo without checking sizes failing also,
  but those are test bugs.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add lds related peepholes. | Dave Airlie | 2018-01-18 | 1 | -1/+8
  if no destination:
  a) convert _RET instructions to non _RET variants if no dst
  b) set src0 to undefined if it's a READ, this should get DCE then.

  Acked-By: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: use different stacks for tracking lds and queue usage. | Dave Airlie | 2018-01-18 | 2 | -3/+24
  The normal ssa renumbering isn't sufficient for LDS queue access.
  This uses two stacks, one for the lds queue, and one for the lds r/w
  ordering.

  The LDS oq values are incremented in their use in a linear fashion.
  The LDS rw values are incremented in their definitions and used in
  the next lds operation to ensure reordering doesn't occur.

  Acked-By: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: schedule LDS ops in appropriate places. | Dave Airlie | 2018-01-18 | 2 | -0/+7
  So LDS ops have to be SLOT_X, and LDS OQ reads have read port
  restrictions so we try and force those into only having one per slot
  and avoiding bank swizzles.

  Acked-By: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: hit the scheduler with a big hammer to avoid lds splits. | Dave Airlie | 2018-01-18 | 1 | -0/+3
  This tries to avoid an lds queue read getting scheduled separately
  from an lds ret read. The non-sb code uses the same style of hammer;
  this isn't foolproof.

  We can do better, but it's a bit tricky, as you have to scan ahead and
  either schedule more lds oq moves and more lds reads and that could
  lead to you running out of space anyways.

  Acked-By: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>