path: root/src/gallium/drivers/r600
Commit message | Author | Age | Files | Lines
* r600: use GET_BUFFER_RESINFO vtx fetch on eg instead of setting up constsRoland Scheidegger2018-01-104-58/+50
| | | | | | | | | | | | | | | | | | | Contrary to what the comment said, this appears to work just fine on my rv770 (tested with piglit textureSize 140 fs/vs samplerBuffer). Dave Airlie confirmed it working on cayman too. I have no clue though if it's actually preferable to use it (unfortunately we cannot get rid of the tex constants completely, as we still require them for cube map txq). Albeit filling in the format (1 channel or 4?) and the stuff related to mega- or mini-fetch (what the hell is this...) is just a guess based on other usage of vtx fetch instructions... v2: it really needs to be done through texture cache (I botched the testing because sb optimizations turned it automatically into tc, but can't rely on it and isn't happening on tes). Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600: increase number of ubos by one to 14Roland Scheidegger2018-01-104-4/+9
| | | | | | | | | | | | | | | Ideally we'd support 16 (d3d11 requires 15, and mesa subtracts one for non-ubo constants), but that's kind of impossible (it would be only doable if either we'd somehow merge the mesa non-ubo constants with the driver constants, or only use the driver constants with vtx fetch instead of through the kcache mechanism - the latter probably wouldn't be too bad). For now just do as the comment already said, place the gs ring (not really a const buffer in any case) which is only ever referred to through vc fetch clauses at index 16. Throw in a couple asserts for good measure to make sure the hw limit isn't exceeded. Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
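The slot budget described in this message can be sketched in C. This is only an illustration under stated assumptions: every name below (`SKETCH_*`, `sketch_fits_kcache`) is hypothetical and not one of the driver's real defines, the kcache limit of 16 buffers is inferred from the message placing the gs ring at index 16, and a single slot for driver constants is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not the driver's real defines. */
enum {
    SKETCH_KCACHE_BUFFERS     = 16, /* assumed hw limit for kcache-addressed buffers */
    SKETCH_MESA_CONST_SLOTS   = 1,  /* mesa's non-ubo constants take one slot */
    SKETCH_DRIVER_CONST_SLOTS = 1,  /* assumption: one slot for driver constants */
    SKETCH_GS_RING_INDEX      = 16  /* only referred to through vtx fetch clauses */
};

/* Number of kcache-addressed slots needed for a given ubo count. */
static unsigned sketch_kcache_slots_needed(unsigned num_ubos)
{
    return num_ubos + SKETCH_MESA_CONST_SLOTS + SKETCH_DRIVER_CONST_SLOTS;
}

/* True if that many ubos still fit below the assumed kcache limit. */
static bool sketch_fits_kcache(unsigned num_ubos)
{
    unsigned needed = sketch_kcache_slots_needed(num_ubos);

    /* Sanity check in the spirit of the asserts the commit mentions. */
    assert(needed >= num_ubos);
    return needed <= SKETCH_KCACHE_BUFFERS;
}
```

Under these assumptions 14 ubos use up the budget exactly, while 15 would need a seventeenth kcache slot, which matches the message's remark that supporting more would require merging or relocating the driver constants.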
* r600: set up constants needed for txq for buffers and cube maps with tesRoland Scheidegger2018-01-101-0/+16
| | | | | | | | | We only did this for the other stages, but obviously tess eval/ctrl need it too. This fixes the (newly modified) piglit texturing/textureSize test when run with tes stage and bufferSampler. Reviewed-by: Dave Airlie <[email protected]>
* r600: don't emit reloc for ring buffer out into the blueRoland Scheidegger2018-01-102-8/+6
| | | | | | | It looks like this reloc belongs to setting the constant reg, which is skipped for gs ring. Reviewed-by: Dave Airlie <[email protected]>
* r600: hack up num_render_backends on Juniper to 8Roland Scheidegger2018-01-101-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Juniper really has a maximum of 4 RBEs (16 pixels). However, predication always locks up on my HD 5750, and through experiments it looks like if we're pretending it has a maximum of 8, with 4 disabled, it works correctly. My conclusion would be that there's a bug (likely firmware, not hw) which causes the predication logic to try to read 8 results out of the query buffer instead of just 4, and since of course no one ever writes the upper 4, the status bit is never set and hence it will wait for it forever. Ideally this would be fixed in firmware, but I'd guess chances of that happening are slim. This will double the size of (occlusion) query result buffers, write the status bit for the disabled rbs in these buffers, and will also add 8 results together instead of just 4 when reading them back. The latter is unnecessary, but it's probably not worth bothering - luckily num_render_backends isn't used outside of occlusion queries, so we don't need a separate value for the "real" maximum. Also print out the enabled_rb_mask if it changed from the pre-fixed value (which is already printed out), just in case there are more problems with chips which have some rbs disabled... This fixes all the lockups with piglit nv_conditional_render tests on my HD 5750 (all pass). Reviewed-by: Dave Airlie <[email protected]>
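The suspected lockup mechanism can be modelled with a small C sketch. It is a toy model, not driver code: all `sketch_*` names are hypothetical, each render backend is reduced to a single 64-bit result slot (an assumption about the layout), and the status bit is taken to be the highest bit of that slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the status bit is the highest bit of a 64-bit result slot. */
#define SKETCH_STATUS_BIT (UINT64_C(1) << 63)

/* Pre-fill the result slots. Disabled RBs never write a result, so their
 * status bit is set up front; enabled RBs start out cleared. */
static void sketch_prepare_results(uint64_t *results, unsigned num_slots,
                                   uint32_t enabled_rb_mask)
{
    assert(num_slots <= 32);
    for (unsigned rb = 0; rb < num_slots; rb++) {
        bool enabled = (enabled_rb_mask >> rb) & 1u;
        results[rb] = enabled ? 0 : SKETCH_STATUS_BIT;
    }
}

/* Model of an enabled RB writing its sample count plus the status bit. */
static void sketch_rb_write(uint64_t *results, unsigned rb, uint64_t samples)
{
    results[rb] = samples | SKETCH_STATUS_BIT;
}

/* Model of predication: it proceeds only once every slot it reads
 * has its status bit set. */
static bool sketch_predication_ready(const uint64_t *results,
                                     unsigned slots_read)
{
    for (unsigned i = 0; i < slots_read; i++) {
        if (!(results[i] & SKETCH_STATUS_BIT))
            return false;
    }
    return true;
}
```

In this model, preparing only 4 slots while predication reads 8 never becomes ready, whereas preparing 8 slots with an enabled mask of 0x0f does once the 4 active backends have written.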
* r600: fix enabled_rb_mask on eg/cmRoland Scheidegger2018-01-101-2/+9
| | | | | | | | | | | | | | | | | | | | | | For eg/cm, the r600_gb_backend_map will always be 0. This is a bug in the drm kernel driver, as it just never fills the information in (it is now being fixed - the history shows it was being filled in when the query was brand new but got lost shortly thereafter with backend_map fixes). This causes r600_query_hw_prepare_buffer to write the "status bit" (just the highest bit of the occlusion query result) even for active rbes (all but the first). This doesn't make much sense, albeit I suppose it's mostly safe. According to the commit history, it's necessary to set these bits for inactive rbes since otherwise predication will lock up - presumably the hw just is waiting for the status bit to appear, which will never happen with inactive rbes. I'd guess potentially predication could be wrong (due to not waiting for the actual result if the status bit is already there) if this is set for active rbes. Discovered while trying to fix predication lockups on Juniper (needs another patch). Reviewed-by: Dave Airlie <[email protected]>
* r600: fix sampler indexing with texture buffers samplingRoland Scheidegger2018-01-102-2/+4
| | | | | | | | This fixes the new piglit test. While here also fix up the logic for early exit of setting up driver consts. Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600: don't use vtx offset for load_sample_positionRoland Scheidegger2018-01-101-1/+1
| | | | | | | | | | The offset looks bogus to me. Albeit in the end it doesn't matter, by the looks of it offsets smaller than 4 get ignored there (not sure of the rules, I suppose either non-dword aligned offsets never work there or the offset must be at least aligned to the size of a single element). Tested-by: Konstantin Kharlamov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* r600: drop l2 related queriesDave Airlie2018-01-103-18/+0
| | | | | | radeonsi only. Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: only read back the necessary tess factor components.Dave Airlie2018-01-101-4/+4
| | | | | | This just reduces the lds reads for the tess factor emission. Signed-off-by: Dave Airlie <[email protected]>
* meson: set opencl flags for r600Dylan Baker2018-01-081-2/+5
| | | | Signed-off-by: Dylan Baker <[email protected]>
* r600: fix textureSize queries with tbosRoland Scheidegger2017-12-302-24/+33
| | | | | | | | | | | | | | piglit doesn't care, but I'm quite confident that the size actually bound as range should be reported and not the base size of the resource (and some quick piglit test hacking confirms this). Also, the array in the constant buffer looks overallocated by a factor of 4. For eg, also decrease the size by another factor of 2 by using the same constant slot for both buffer size (required for txq for TBOs) and the number of layers for cube arrays, as these are mutually exclusive. Could of course use some more logic and only actually do this for the samplers/images/buffers where it's required rather than for all, but ah well... Reviewed-by: Dave Airlie <[email protected]>
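The size calculation argued for here can be sketched in C. This is a minimal illustration, assuming a buffer view that records both the backing resource size and the bound range; the struct and function names are hypothetical and do not come from the driver. Dividing by the texel size reflects that GLSL `textureSize` on a buffer sampler returns a texel count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a texture buffer view. */
struct sketch_tbo_view {
    unsigned resource_bytes; /* total size of the backing buffer */
    unsigned range_bytes;    /* size of the range actually bound */
    unsigned texel_bytes;    /* bytes per texel of the view format */
};

/* Report the size of the bound range in texels, not the base size of
 * the resource. */
static unsigned sketch_tbo_texture_size(const struct sketch_tbo_view *view)
{
    assert(view->texel_bytes != 0);
    return view->range_bytes / view->texel_bytes;
}

/* One constant slot serves both cases, since a sampler is either a
 * buffer (needs the texel count) or a cube array (needs the layer
 * count), never both. Pass NULL for non-buffer samplers. */
static unsigned sketch_txq_const(const struct sketch_tbo_view *buffer_view,
                                 unsigned cube_array_layers)
{
    return buffer_view ? sketch_tbo_texture_size(buffer_view)
                       : cube_array_layers;
}
```

For a 4096-byte resource with a 1024-byte bound range and 16-byte texels, this reports 64 texels, where a calculation based on the resource size would report 256.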
* r600: kill off native_integer shader ctx flagRoland Scheidegger2017-12-301-18/+0
| | | | | | Maybe once upon a time it wasn't always true. Reviewed-by: Dave Airlie <[email protected]>
* r600: fix atomic counter index mode getting emitted on pre-caymanDave Airlie2017-12-271-1/+1
| | | | | | | This is a regression since I added cayman atomic support, not sure it fixes anything, but the shader dumps look better. Signed-off-by: Dave Airlie <[email protected]>
* gallium/util: add util_num_layers helperMarek Olšák2017-12-251-4/+4
|
* gallium: plumb context priority through to driverRob Clark2017-12-191-0/+1
| | | | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Andres Rodriguez <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* r600: clear compressed flags in image state on unbind.Dave Airlie2017-12-191-0/+2
| | | | | | | | | If we aren't binding an image, clear the compressed flags. This fixes a segfault seen with an apitrace. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104331 Signed-off-by: Dave Airlie <[email protected]>
* r600: only reported tgsi ir compute support on evergreen+Dave Airlie2017-12-181-1/+3
| | | | | | This fixes a crash on r600/r700. Signed-off-by: Dave Airlie <[email protected]>
* r600: export robust buffer accessDave Airlie2017-12-181-1/+1
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600: export GLSL 430Dave Airlie2017-12-181-1/+1
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/cs: add compute support to capsDave Airlie2017-12-181-2/+2
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600: always flush between gfx and computeDave Airlie2017-12-185-0/+21
| | | | | | | | This is in no way optimal, but there seem to be some problems mixing them at the moment (lots of hangs). It is possible, we just need to figure out more magic. Signed-off-by: Dave Airlie <[email protected]>
* r600: fix unused variable warningDave Airlie2017-12-181-1/+0
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: do not convert if-blocks that contain indirect array accessGert Wollny2017-12-073-2/+5
| | | | | | | | | | | | | | | | | | | | If an array is accessed within an if block, then currently it is not known whether the value in the address register is involved in the evaluation of the if condition, and converting the if condition may actually result in out-of-bounds array access. Consequently, if blocks that contain indirect array access should not be converted. Fixes piglits on r600/BARTS: spec/glsl-1.10/execution/variable-indexing/ vs-output-array-float-index-wr vs-output-array-vec3-index-wr vs-output-array-vec4-index-wr Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143 Signed-off-by: Gert Wollny <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
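The guard described in this message amounts to a conservative check before if-conversion. A minimal C sketch follows, assuming a flat list of instructions with a per-instruction flag; the types and names are hypothetical and do not reflect the sb optimizer's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction record. */
struct sketch_instr {
    bool indirect_array_access; /* indexes an array via the address register */
};

/* Conservative rule: refuse to convert an if-block when any instruction
 * in its body performs indirect array access, since executing it
 * unconditionally could read or write out of bounds. */
static bool sketch_can_convert_if(const struct sketch_instr *body,
                                  size_t count)
{
    assert(body != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (body[i].indirect_array_access)
            return false;
    }
    return true;
}
```

The check is deliberately pessimistic: it rejects every block with indirect access instead of trying to prove that the address register is independent of the if condition.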
* r600: add support for compute grid/block sizes. (v2)Dave Airlie2017-12-064-3/+100
| | | | | | | | | | We just pass these in from outside in a constant buffer. The shader side stores them after they are first accessed. v2: fix to not use a temp_reg. Signed-off-by: Dave Airlie <[email protected]>
* r600: handle image/buffer sizes correctly.Dave Airlie2017-12-063-4/+21
| | | | | | This adds support to compute for the resq workarounds (buffer/cube sizes) Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add support for emitting compute image/buffer atomsDave Airlie2017-12-061-1/+9
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: handle atomic counters in compute state.Dave Airlie2017-12-061-0/+9
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add support for TGSI compute shaders. (v1.1)Dave Airlie2017-12-062-28/+103
| | | | | | | | | | | This adds paths to handle TGSI compute shaders and shader selection. It also avoids emitting certain things on tgsi paths, CBs, vertex buffers, config reg init (not required). v1.1: fix rat mask calc Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: add compute support to shader assemblerDave Airlie2017-12-061-0/+14
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/texture: drop lowering 1d/2d images to linear.Dave Airlie2017-12-061-8/+0
| | | | | | | This appears to cause hangs with compute images. Unless we can find more specifics, just don't do this for now. Signed-off-by: Dave Airlie <[email protected]>
* r600: refactor and export some shader selector code for computeDave Airlie2017-12-052-7/+27
| | | | | | This just moves some code around to make it easier to add compute. Signed-off-by: Dave Airlie <[email protected]>
* r600: add compute support to compressed resource handling.Dave Airlie2017-12-052-6/+26
| | | | | | This just adds support for decompressing compute resources. Signed-off-by: Dave Airlie <[email protected]>
* r600: update max threads per block for evergreen computeDave Airlie2017-12-051-0/+4
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: add local memory support to shader assembler.Dave Airlie2017-12-051-0/+165
| | | | | | | | This is needed for compute shaders. v1.1: make work for vectors, fix missing lds ops. Signed-off-by: Dave Airlie <[email protected]>
* r600/cs: add support for compute to image/buffers/atomics stateDave Airlie2017-12-054-19/+79
| | | | | | | This just adds the compute paths to state handling for the main objects Signed-off-by: Dave Airlie <[email protected]>
* r600: handle compute null key shader stateDave Airlie2017-12-051-0/+2
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600: add some missing cayman register definesDave Airlie2017-12-051-0/+4
| | | | | | These are just taken from the kernel, and were seen in some fglrx dumps. Signed-off-by: Dave Airlie <[email protected]>
* r600: don't set EOP on pop or loop endDave Airlie2017-12-051-1/+1
| | | | | | This appears to be bad, compute shaders hang without it. Signed-off-by: Dave Airlie <[email protected]>
* r600/ssbo: refactor out buffer coord calcs and use for atomic path.Dave Airlie2017-12-051-34/+37
| | | | | | | | The atomic rat path has a bug in the ssbo path, refactor out the address calcs from the load/store paths and reuse to fix the bug in the buffer rat atomic path. Signed-off-by: Dave Airlie <[email protected]>
* r600/ssbo: fix multi-dword buffer loads.Dave Airlie2017-12-051-5/+7
| | | | | | This fixes loading from different channels. Signed-off-by: Dave Airlie <[email protected]>
* r600/ssbo: use r32ui format for ssbo resources.Dave Airlie2017-12-051-3/+3
| | | | | | | This works best for returning the correct values and sizes in tests. Signed-off-by: Dave Airlie <[email protected]>
* r600: refactor out the immediate setup code.Dave Airlie2017-12-051-38/+28
| | | | | | This just refactors the same code out of the images/buffers paths. Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: fix ssbo atomic operations formats.Dave Airlie2017-12-051-4/+12
| | | | | | Don't try and use the image format for ssbo, just 32-bit uint. Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: fix thread id loading.Dave Airlie2017-12-051-9/+18
| | | | | | | This just changes how thread id loading is done, it makes smaller shaders if we don't use thread id gprs. Signed-off-by: Dave Airlie <[email protected]>
* gallium/u_upload_mgr: allow drivers to specify pipe_resource::flagsMarek Olšák2017-12-051-2/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move setting VRAM|GTT into winsysesMarek Olšák2017-12-051-14/+0
| | | | | | The combined VRAM|GTT heap will be removed. Reviewed-by: Nicolai Hähnle <[email protected]>
* r600/atomic: add cayman version of atomic save/restore from GDS (v2)Dave Airlie2017-12-052-24/+126
| | | | | | | | | | | | | On Cayman we don't use the append/consume counters (fglrx doesn't) and they don't seem to work well with compute shaders. This just uses GDS instead to do the atomic operations. v1.1: remove unused line. v2: use EOS on cayman, it appears to work. Acked-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/atomic: refactor out evergreen atomic setup/save code.Dave Airlie2017-12-051-30/+50
| | | | | | For cayman we want to use different code paths. Signed-off-by: Dave Airlie <[email protected]>
* meson: define driver dependenciesDylan Baker2017-12-041-0/+5
| | | | | | | | | | | | This allows us to encapsulate the compiler and linkage requirements of each driver in a reusable way. The result will be that each target that needs a specific driver can simply add `driver_<name>` to its dependencies line and the necessary libraries and compiler args will be added. This will allow for a lot of code de-duplication between gallium targets. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>