summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/r600
Commit message (Collapse)AuthorAgeFilesLines
* gallium: introduce PIPE_CAP_FENCE_SIGNAL v2Andres Rodriguez2018-01-301-0/+1
| | | | | | | | | Protects semaphore signaling functionality required by GL_EXT_semaphore. v2: s/semaphore/fence Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600/sb: insert the else clause when we might depart from a loopDave Airlie2018-01-311-0/+17
| | | | | | | | | | | | | | If there is a break inside the else clause and this means we are breaking from a loop, the loop finalise will want to insert the LOOP_BREAK/CONTINUE instruction, however if we don't emit the else there is no where for these to end up, so they will end up in the wrong place. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101442 Tested-By: Gert Wollny <[email protected]> Cc: <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add ARB_query_buffer_object supportDave Airlie2018-01-2910-31/+817
| | | | | | | | | | | | | | This uses a different shader than radeonsi, as we can't address non-256 aligned ssbos, which the radeonsi code does. This passes some extra offsets into the shader. It also contains a set of u64 instruction implementation that may or may not be complete (at least the u64div is definitely not something that works outside this use-case). If r600 grows 64-bit integers, it will use the GLSL lowering for divmod. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: refactor mul hi/lo instruction emissionDave Airlie2018-01-291-254/+116
| | | | | | | This just makes it a bit simpler for cayman vs eg Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/eg: construct proper rat mask for image/buffers.Dave Airlie2018-01-293-8/+30
| | | | | | | | | | If the images/buffer bindings had a gap, this produced the wrong values, this should fix that to generate the correct rat mask for mixes of images/buffers/cbs. Reviewed-by: Roland Scheidegger <[email protected]> Cc: "18.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* autotools: include meson build files in tarballDylan Baker2018-01-191-1/+2
| | | | | | | | | | | | This adds the meson.build, meson_options.txt, and a few scripts that are used exclusively by the meson build. v2: - Remove accidentally included changes needed to test make dist with LLVM > 3.9 Signed-off-by: Dylan Baker <[email protected]> Acked-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* r600: enable ARB_enhanced_layoutsDave Airlie2018-01-191-1/+1
| | | | | | | | | | | Only one piglit test fails, sso-vs-gs-fs-array-interleave There are 3 tests using ssbo without checking sizes failing also but those are test bugs. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add lds related peepholes.Dave Airlie2018-01-181-1/+8
| | | | | | | | | if no destination: a) convert _RET instructions to non _RET variants if no dst b) set src0 to undefined if it's a READ, this should get DCE then. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: use different stacks for tracking lds and queue usage.Dave Airlie2018-01-182-3/+24
| | | | | | | | | | | | | | The normal ssa renumbering isn't sufficient for LDS queue access, this uses two stacks, one for the lds queue, and one for the lds r/w ordering. The LDS oq values are incremented in their use in a linear fashion. The LDS rw values are incremented in their definitions and used in the next lds operation to ensure reordering doesn't occur. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: schedule LDS ops in appropriate places.Dave Airlie2018-01-182-0/+7
| | | | | | | | | | So LDS ops have to be SLOT_X, and LDS OQ reads have read port restrictions so we try and force those into only having one per slot and avoiding bank swizzles. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: hit the scheduler with a big hammer to avoid lds splits.Dave Airlie2018-01-181-0/+3
| | | | | | | | | | | | | This tries to avoid an lds queue read getting scheduled separately from an lds ret read, the non-sb code uses the same style of hammer, this isn't foolproof. We can do better, but it's a bit tricky, as you have to scan ahead and either schedule more lds oq moves and more lds reads and that could lead to you running out of space anyways. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: adding lds oq tracking to the schedulerDave Airlie2018-01-182-3/+15
| | | | | | | | | | | This adds support for tracking the lds oq read/writes so can avoid scheduling other things in between. This patch just adds the tracking and assert to show problems. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add gcm support to avoid clause between lds read/queue readDave Airlie2018-01-182-2/+17
| | | | | | | | | | You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP in the same basic block/clause. This makes sure once we've issues and MOV we don't add another block until we balance it with an LDS read. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle lds special dest registers.Dave Airlie2018-01-182-2/+2
| | | | | | | This adds lds to the geom emit handling Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle LDS operations in folding.Dave Airlie2018-01-181-0/+11
| | | | | | | Don't try and fold LDS using expressions. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add finalising for lds output queue special values.Dave Airlie2018-01-181-0/+12
| | | | | | | We need to convert these to the hw special registers. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add initial support for parsing lds operations.Dave Airlie2018-01-181-2/+50
| | | | | | | This handles parsing the LDS ops and queue accessess. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: disable if conversion for hsDave Airlie2018-01-181-1/+1
| | | | | | | This fixes bad interactions with the LDS special values. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: lds ops have no dst register.Dave Airlie2018-01-181-1/+1
| | | | | | | Although these are op3s they don't have a dst reg. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: introduce special register values for lds support.Dave Airlie2018-01-183-1/+33
| | | | | | | | | | | | | For LDS read/write ordering we use the LDS_RW value, reads will wait on previous writes. For LDS read/read from LDS queue ordering we use the LDS_OQ values, we define two for now, though initially we'll just support OQA. Also add the check for the lds oq values Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: update last_cf if alu is the last clauseDave Airlie2018-01-181-0/+1
| | | | | | | | | | | It's rare to have a final alu clause on normal shaders (exports) but tess shaders write to LDS as their output, so we see some alu clauses, and the CF_END get put in the wrong place. This makes sure to update last_cf correctly. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: start adding GDS supportDave Airlie2018-01-1813-13/+123
| | | | | | | | | This adds support for GDS ops to sb backend. This seems to work for atomics and tess factor writes. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add tess/compute initial state registers.Dave Airlie2018-01-181-1/+4
| | | | | | | This stops them being optimised out. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: fix a bug emitting ar load from a constant.Dave Airlie2018-01-181-0/+3
| | | | | | | | | | | | Some tess shaders were doing MOVA_INT _, c0.x on cayman, and then hitting an assert in sb_bc_finalize.cpp:translate_kcache. This makes sure the toplevel kcache tracker gets updated, and the clause gets fixed up. Reviewed-by: Roland Scheidegger <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: only emit add instruction if param has a value.Dave Airlie2018-01-181-6/+8
| | | | | | | Just saves a pointless a = a + 0; Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: emit 0 gds_op for tf write.Dave Airlie2018-01-181-2/+3
| | | | | | | This field is ignored for tf writes so should be 0. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for ARB_shader_clock.Dave Airlie2018-01-183-5/+29
| | | | | Reviewed-by: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium: remove PIPE_CAP_USER_CONSTANT_BUFFERSMarek Olšák2018-01-171-1/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium: remove PIPE_CAP_TEXTURE_SHADOW_MAPMarek Olšák2018-01-171-1/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium: remove PIPE_CAP_TWO_SIDED_STENCILMarek Olšák2018-01-171-1/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* r600/shader: Initialize max_driver_temp_used correctly for the first timeGert Wollny2018-01-151-0/+1
| | | | | | | | | | | | Without this initialization the temp registers used in tgsi_declaration may used random indices, and this may result in failing translation from TGSI with an error message "GPR limit exceeded", because the random index is greater then the allowed limit implying that the shader uses more temporary registers then available. Signed-off-by: Gert Wollny <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: don't emit tes samplers/views when tes isn't activeRoland Scheidegger2018-01-102-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | Similar to const buffers. The driver must not emit any tes-related state if tes is disabled, since the hw slots are all shared by VS, therefore it would overwrite them (the mesa state tracker might not do this, but it would be perfectly legal to do so). Nevertheless I think the dirty state tracking logic in the driver is fundamentally flawed when tes is disabled/enabled, since it looks to me like the VS (and TES) state would not get reemitted to the correct slots (if it's not dirty anyway). Unless I'm missing something... Theoretically, the overwrite problem could be solved by using non-overlapping resource slots for TES and VS (since we're not even close to using half the resource slots), but it wouldn't work for constant buffers nor samplers, and for VS would still need to propagate changes to both LS and VS, so probably not a useful idea. Unfortunately there's zero coverage of this with piglit, since all tessellation shader tests are just shader_runner tests, which are unsuitable for testing any kind of state dependency tracking issues (so I can't even quickly hack something up to proove it and fix it...). TCS otoh is just fine - like GS it has its own hw slots. Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600: increase number of UBOs to 15Roland Scheidegger2018-01-103-22/+37
| | | | | | | | | | | | | | | | | With the exception of the default tess levels only ever accessed by the default tcs shader, the LDS_INFO const buffer was only accessed by vtx instructions, and not through kcache. No idea why really, but use this to our advantage by not using a constant buffer slot for it. This just requires us to throw the default tess levels into the "normal" driver const buffer instead. Alternatively, could acesss those constants via vtx instructions too, but then we couldn't use a ordinary ureg prog accessing them as constants and would have to generate that directly when compiling the default tcs shader. (Another alternative would be to put all lds info into the ordinary driver const buffer, albeit we'd maybe need to increase the fixed size as it can't fit alongside the ucp since vs needs access to the lds info too.) Tested-by: Konstantin Kharlamov <[email protected]> Dave Airlie <[email protected]>
* r600: use GET_BUFFER_RESINFO vtx fetch on eg instead of setting up constsRoland Scheidegger2018-01-104-58/+50
| | | | | | | | | | | | | | | | | | | Contrary to what the comment said, this appears to work just fine on my rv770 (tested with piglit textureSize 140 fs/vs samplerBuffer). Dave Airlie confirmed it working on cayman too. I have no clue though if it's actually preferrable to use it (unfortunately we cannot get rid of the tex constants completely, as we still require them for cube map txq). Albeit filling in the format (1 channels or 4?) and the stuff related to mega- or mini-fetch (what the hell is this...) is just a guess based on other usage of vtx fetch instructions... v2: it really needs to be done through texture cache (I botched the testing because sb optimizations turned it automatically into tc, but can't rely on it and isn't happening on tes). Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600: increase number of ubos by one to 14Roland Scheidegger2018-01-104-4/+9
| | | | | | | | | | | | | | | Ideally we'd support 16 (d3d11 requires 15, and mesa subtracts one for non-ubo constants), but that's kind of impossible (it would be only doable if either we'd somehow merge the mesa non-ubo constants with the driver constants, or only use the driver constants with vtx fetch instead of through the kcache mechanism - the latter probably wouldn't be too bad). For now just do as the comment already said, place the gs ring (not really a const buffer in any case) which is only ever referred to through vc fetch clauses at index 16. Throw in a couple asserts for good measure to make sure the hw limit isn't exceeded. Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600: set up constants needed for txq for buffers and cube maps with tesRoland Scheidegger2018-01-101-0/+16
| | | | | | | | | We only did this for the other stages, but obviously tess eval/ctrl need it too. This fixes the (newly modified) piglit texturing/textureSize test when run with tes stage and bufferSampler. Reviewed-by: Dave Airlie <[email protected]>
* r600: don't emit reloc for ring buffer out into the blueRoland Scheidegger2018-01-102-8/+6
| | | | | | | It looks like this reloc belongs to setting the constant reg, which is skipped for gs ring. Reviewed-by: Dave Airlie <[email protected]>
* r600: hack up num_render_backends on Juniper to 8Roland Scheidegger2018-01-101-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Juniper really has a maximum of 4 RBEs (16 pixels). However, predication always locks up on my HD 5750, and through experiments it looks like if we're pretending it has a maximum of 8, with 4 disabled, it works correctly. My conclusion would be that there's a bug (likely firmware, not hw) which causes the predication logic to try to read 8 results out of the query buffer instead of just 4, and since of course noone ever writes the upper 4, the status bit is never set and hence it will wait for it forever. Ideally this would be fixed in firmware, but I'd guess chances of that happening are slim. This will double the size of (occlusion) query result buffers, write the status bit for the disabled rbs in these buffers, and will also add 8 results together instead of just 4 when reading them back. The latter is unnecessary, but it's probably not worth bothering - luckily num_render_backends isn't used outside of occlusion queries, so don't need separate value for the "real" maximum. Also print out the enabled_rb_mask if it changed from the pre-fixed value (which is already printed out), just in case there's some more problems with chips which have some rbs disabled... This fixes all the lockups with piglit nv_conditional_render tests on my HD 5750 (all pass). Reviewed-by: Dave Airlie <[email protected]>
* r600: fix enabled_rb_mask on eg/cmRoland Scheidegger2018-01-101-2/+9
| | | | | | | | | | | | | | | | | | | | | | For eg/cm, the r600_gb_backend_map will always be 0. This is a bug in the drm kernel driver, as it just just never fills the information in (it is now being fixed - the history shows it was being filled in when the query was brand new but got lost shortly thereafter with backend_map fixes). This causes r600_query_hw_prepare_buffer to write the "status bit" (just the highest bit of the occlusion query result) even for active rbes (all but the first). This doesn't make much sense, albeit I suppose it's mostly safe. According to the commit history, it's necessary to set these bits for inactive rbes since otherwise predication will lock up - presumably the hw just is waiting for the status bit to appear, which will never happen with inactive rbes. I'd guess potentially predication could be wrong (due to not waiting for the actual result if the status bit is already there) if this is set for active rbes. Discovered while trying to fix predication lockups on Juniper (needs another patch). Reviewed-by: Dave Airlie <[email protected]>
* r600: fix sampler indexing with texture buffers samplingRoland Scheidegger2018-01-102-2/+4
| | | | | | | | This fixes the new piglit test. While here also fix up the logic for early exit of setting up driver consts. Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Reviewed-by: Dave Airlie <[email protected]>
* r600: don't use vtx offset for load_sample_positionRoland Scheidegger2018-01-101-1/+1
| | | | | | | | | | The offset looks bogus to me. Albeit in the end it doesn't matter, by the looks of it offsets smaller than 4 get ignored there (not sure of the rules, I suppose either non-dword aligned offsets never work there or the offset must be at least aligned to the size of a single element). Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600: drop l2 related queriesDave Airlie2018-01-103-18/+0
| | | | | | radeonsi only. Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: only read back the necessary tess factor components.Dave Airlie2018-01-101-4/+4
| | | | | | This just reduces the lds reads for the the tess factor emission. Signed-off-by: Dave Airlie <[email protected]>
* meson: set opencl flags for r600Dylan Baker2018-01-081-2/+5
| | | | Signed-off-by: Dylan Baker <[email protected]>
* r600: fix textureSize queries with tbosRoland Scheidegger2017-12-302-24/+33
| | | | | | | | | | | | | | piglit doesn't care, but I'm quite confident that the size actually bound as range should be reported and not the base size of the resource (and some quick piglit test hacking confirms this). Also, the array in the constant buffer looks overallocated by a factor of 4. For eg, also decrease the size by another factor of 2 by using the same constant slot for both buffer size (required for txq for TBOs) and the number of layers for cube arrays, as these are mutually exclusive. Could of course use some more logic and only actually do this for the samplers/images/buffers where it's required rather than for all, but ah well... Reviewed-by: Dave Airlie <[email protected]>
* r600: kill off native_integer shader ctx flagRoland Scheidegger2017-12-301-18/+0
| | | | | | Maybe upon a time it wasn't always true. Reviewed-by: Dave Airlie <[email protected]>
* r600: fix atomic counter index mode getting emitted on pre-caymanDave Airlie2017-12-271-1/+1
| | | | | | | This is a regression since I added cayman atomic support, not sure it fixes anything, but the shader dumps look better. Signed-off-by: Dave Airlie <[email protected]>
* gallium/util: add util_num_layers helperMarek Olšák2017-12-251-4/+4
|
* gallium: plumb context priority through to driverRob Clark2017-12-191-0/+1
| | | | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Andres Rodriguez <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* r600: clear compressed flags in image state on unbind.Dave Airlie2017-12-191-0/+2
| | | | | | | | | If we aren't binding an image, clear the compressed flags. This fixes a segfault seen with an apitrace. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104331 Signed-off-by: Dave Airlie <[email protected]>