path: root/src/gallium/drivers/r600
Commit message | Author | Age | Files | Lines
* r600: set the number type correctly for float rts in cb setup | Roland Scheidegger | 2017-11-15 | 2 | -2/+15
  Float rts were always set as unorm instead of float. Not sure of the consequences, but at least it looks like the blend clamp would have been enabled, which is against the rules (only eg really bothered to even attempt to specify this correctly, r600 always used clamp anyway).
  Albeit r600 (not r700) setup still looks bugged to me due to never setting BLEND_FLOAT32 which must be set according to docs...
  Not sure if the hw really cares, no piglit change (on eg/juniper).

  Reviewed-by: Dave Airlie <[email protected]>
* r600: use ieee version of rsq | Roland Scheidegger | 2017-11-15 | 1 | -5/+1
  Both r600 and evergreen used the clamped version, whereas cayman used the ieee one. I don't think there's a valid reason for this discrepancy, so let's switch to the ieee version for r600 and evergreen too, since we generally want to stick to ieee arithmetic.
  With this, behavior for both rcp and rsq should now be the same for all of r600, eg, cm, all using ieee versions (albeit note rsq retains the abs behavior for everybody, which may not be a good idea ultimately).

  Reviewed-by: Dave Airlie <[email protected]>
* r600: use ieee version of rcp | Roland Scheidegger | 2017-11-15 | 1 | -6/+2
  r600 used the clamped version for rcp, whereas both evergreen and cayman used the ieee version. I don't know why that discrepancy exists (it has been there since day 1) but there does not seem to be a valid reason for this, so make it consistent. This now seems safer than before the previous commit (using the dx10 clamp bit).
  Note that rsq still uses the clamped version (as before, even though the table may have suggested otherwise for evergreen) for r600/eg, but not for cayman. Will be changed separately for better regression tracking...

  Reviewed-by: Dave Airlie <[email protected]>
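To make the clamped/ieee distinction concrete, here is a minimal C sketch. It assumes (from my reading of the ISA docs, not from anything stated in this log) that the clamped opcode returns the largest finite float where the ieee one returns infinity. The function names are hypothetical models, not driver code, and the host is assumed to use IEEE 754 float division.

```c
#include <float.h>
#include <math.h>

/* Model of the ieee reciprocal: 1/0 yields +inf, 1/-0 yields -inf. */
float rcp_ieee_model(float x)
{
    return 1.0f / x;
}

/* Model of the clamped reciprocal (assumed behavior): an infinite
 * result is replaced by the largest finite float of the same sign. */
float rcp_clamped_model(float x)
{
    float r = 1.0f / x;
    return isinf(r) ? copysignf(FLT_MAX, r) : r;
}
```

Under these assumptions, multiplying zero by rcp_ieee_model(0.0f) gives NaN with ieee muls, while multiplying zero by rcp_clamped_model(0.0f) gives zero. That is one plausible way the opcode choice can change what an application renders.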
* r600: use DX10_CLAMP bit in shader setup | Roland Scheidegger | 2017-11-15 | 2 | -0/+15
  The docs are not very clear about what this really does, however both Alex Deucher and Nicolai Hähnle suggested this only really affects instructions using the CLAMP output modifier, and I've confirmed that with the newly changed piglit isinf_and_isnan test.
  So, with this bit set, if an instruction has the CLAMP modifier bit (which clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result will be NaN.
  D3D10 would require this, glsl doesn't have modifiers (with mesa clamp(x,0,1) would get converted to such a modifier) coupled with a whatever-floats-your-boat specified NaN behavior, but the clamp behavior should probably always be used (this also matches what a decomposition into min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of picking the non-nan result).
  Some apps may in fact rely on this, as this prevents misrenderings in This War of Mine since using ieee muls (ce7a045feeef8cad155f1c9aa07f166e146e3d00), without having to use clamped rcp opcode, which would also fix this bug there.
  radeonsi also seems to set this bit nowadays if I see that right (albeit the llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0" instead of "Do not clamp NAN to 0" since it was changed, which also looks a bit misleading).

  v2: set it in all shader stages.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
  Reviewed-by: Dave Airlie <[email protected]>
* r600: use min_dx10/max_dx10 instead of min/max | Roland Scheidegger | 2017-11-15 | 2 | -6/+9
  I believe this is the safe thing to do, especially ever since the driver actually generates NaNs for muls too.
  The ISA docs are not very helpful here, however the dx10 versions will pick a non-nan result over a NaN one (this is also the ieee754 behavior), whereas the non-dx10 ones will pick the NaN (verified by newly changed piglit isinf-and-isnan test).
  Other "modern" drivers will most likely do the same.
  This was shown to make some difference for bug 103544, albeit it is not required to fix it.

  Reviewed-by: Dave Airlie <[email protected]>
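The two NaN policies described above can be modelled in a few lines of C. The function names are hypothetical and only mirror the behavior the message reports for each opcode family; they are not the hardware definitions.

```c
#include <math.h>

/* Model of the non-dx10 min as described: a NaN operand wins. */
float min_nan_wins_model(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    return a < b ? a : b;
}

/* Model of the dx10 min as described: the non-NaN operand wins.
 * The result is NaN only when both operands are NaN. */
float min_dx10_model(float a, float b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a < b ? a : b;
}
```

For example, min_dx10_model(NAN, 1.0f) is 1.0f, while min_nan_wins_model(NAN, 1.0f) stays NaN.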
* r600: fix cubemap arrays | Dave Airlie | 2017-11-15 | 1 | -9/+17
  A lot of cubemap array piglits fail; port the texture type picking code from radeonsi, which seems to fix most of them.
  For images I will port the rest of the code.

  Fixes:
  getteximage-depth gl_texture_cube_map_array-*
  fbo-generatemipmap-cubemap array
  getteximage-targets cube_array
  amongst others.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: handle bitfield extract semantics properly. | Dave Airlie | 2017-11-14 | 1 | -4/+53
  Fixes: tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldExtract.shader_test

  Signed-off-by: Dave Airlie <[email protected]>
* r600: handle bitfieldInsert corner case. | Dave Airlie | 2017-11-14 | 1 | -1/+39
  This handles the bits >= 32 corner case in bitfieldInsert.

  Fixes: tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldInsert.shader_test.

  Signed-off-by: Dave Airlie <[email protected]>
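A small C reference model shows why bits >= 32 needs its own path: the usual mask (1 << bits) - 1 cannot be formed with a 32-bit shift when bits is 32. This sketch assumes offset + bits <= 32, as GLSL requires for a defined result. The function name is hypothetical, and this says nothing about the instruction sequence the driver actually emits.

```c
#include <stdint.h>

/* Reference model of a 32-bit bitfieldInsert.
 * Assumes offset + bits <= 32 (otherwise the GLSL result is undefined). */
uint32_t bitfield_insert_model(uint32_t base, uint32_t insert,
                               unsigned offset, unsigned bits)
{
    if (bits == 0)
        return base;        /* nothing to insert */
    if (bits >= 32)
        return insert;      /* whole word replaced; a 32-bit shift by 32 is invalid */

    /* 1 <= bits <= 31 here, so offset <= 31 and both shifts are valid. */
    uint32_t mask = ((UINT32_C(1) << bits) - 1) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```

With this model, inserting 0xAB at offset 8 with 8 bits into 0x12345678 gives 0x1234AB78, and a 32-bit insert returns the insert value unchanged.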
* r600: add gs tri strip adjacency fix. | Dave Airlie | 2017-11-14 | 4 | -5/+62
  Like radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation.

  Evergreen hw suffers from the same problem, so rotate the geometry inputs to fix this.

  This fixes:
  ./bin/glsl-1.50-geometry-primitive-types GL_TRIANGLE_STRIP_ADJACENCY
  on evergreen.

  Signed-off-by: Dave Airlie <[email protected]>
* r600: fix isoline tess factor component swapping. | Dave Airlie | 2017-11-14 | 1 | -0/+7
  As per radeonsi, the tess factor components for isolines are reversed.

  Fixes: tests/spec/arb_tessellation_shader/execution/isoline.shader_test

  Cc: <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: reserve first register of vertex shader. | Dave Airlie | 2017-11-14 | 1 | -2/+4
  r0 on input to vertex shaders contains things like vertexid, so we need to reserve it even if we have no inputs.

  This fixes a bunch of tessellation piglits.

  Cc: <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: don't emit atomic save if we have no atomic counters. | Dave Airlie | 2017-11-14 | 1 | -0/+3
  Otherwise we end up emitting the fence.

  Tested-By: Gert Wollny <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for hw atomic counters. (v3) | Dave Airlie | 2017-11-10 | 7 | -22/+480
  This adds support for the evergreen/cayman atomic counters.

  These are implemented using GDS append/consume counters. The values for each counter are loaded before drawing and saved after each draw using special CP packets.

  v2: move hw atomic assignment into driver.
  v3: fix messing up caps (Gert Wollny), only store ranges in driver, drop buffers.

  Signed-off-by: Dave Airlie <[email protected]>
  Acked-by: Nicolai Hähnle <[email protected]>
  Tested-By: Gert Wollny <[email protected]>
* gallium: add CAPs to support HW atomic counters. (v3) | Dave Airlie | 2017-11-10 | 1 | -0/+2
  This looks like an evergreen specific feature, but with atomic counters AMD have hw specific counters they use instead of operating on buffers directly. These are separate from the buffer atomics, so require different limits and code paths.

  I've left the CAP for atomic type extensible in case someone else has a variant on this sort of thing (freedreno maybe?) and needs to change it.

  This adds all the CAPs required to add support for those atomic counters, along with a related CAP for limiting the number of output resources.

  I'd like to land this and the st patch then I can start to upstream the evergreen support for these and other GL4.x features.

  v2: drop the ATOMIC_COUNTER_MODE cap, just use the return from the HW counters. If 0 we use the current mode.
  v3: fix some rebase errors (Gert Wollny)

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Tested-By: Gert Wollny <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/query: drop rest of vi workaround code. | Dave Airlie | 2017-11-10 | 2 | -37/+13
  This isn't needed in r600 anymore.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* util: move os_time.[ch] to src/util | Nicolai Hähnle | 2017-11-09 | 6 | -6/+6
  Reviewed-by: Marek Olšák <[email protected]>
* r600g: use SIMPLE_FLOAT for blending to enable some optimizations | Ilia Mirkin | 2017-11-08 | 2 | -0/+2
  Radeonsi also sets this flag. Seems to avoid pulling up the destination RT value when the dst blend factor is zero if it's not otherwise being loaded. Among other things, it allows blending to overwrite infinity/NaN values in the destination RT.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET | Marek Olšák | 2017-11-06 | 1 | -0/+1
* r600: add support for early depth/stencil. | Dave Airlie | 2017-11-03 | 1 | -0/+3
  This adds support for the early depth/stencil property found on image shaders.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for emitting RAT instructions to the assembler. | Dave Airlie | 2017-11-03 | 3 | -0/+35
  This adds support for emitting RAT instructions to the assembler. RAT instructions are used to implement image accessors.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for mark bit to the assembler. | Dave Airlie | 2017-11-03 | 3 | -0/+7
  This adds support to the assembler for the mark bit on the export word1.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for valid pixel mode on CF clauses | Dave Airlie | 2017-11-03 | 2 | -0/+2
  This just adds support to the assembler for setting the valid pixel mode on the CF clause.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for some ALU sources. | Dave Airlie | 2017-11-03 | 1 | -0/+9
  These special ALU sources provide the shader engine, simd and hw wave ids. These are required for images support.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* gallium: add cap for driver specified max combined shader resources. | Dave Airlie | 2017-11-01 | 1 | -0/+1
  Some hw (evergreen) has a limit on how many combined resources (images/buffers/mrts) a fragment shader can access.

  Reviewed-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling | Gert Wollny | 2017-11-01 | 2 | -20/+31
  It is possible that the optimizer ends up in an infinite loop in post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group() does not find a proper scheduling. This can be deduced from pending.count() being larger than zero and not getting smaller.

  This patch works around this problem by signalling this failure so that the optimizer bails out and the un-optimized shader is used.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
  Cc: <[email protected]>
  Signed-off-by: Gert Wollny <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
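The bail-out idea can be shown with a generic C sketch: if one scheduling pass leaves the pending count no smaller than before, report failure instead of looping forever. All names here (schedule_step_fn, schedule_all, the two sample steps) are hypothetical and stand in for the C++ scheduler, whose real code is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduling pass: takes the number of pending
 * instructions and returns how many are still pending afterwards. */
typedef size_t (*schedule_step_fn)(size_t pending);

/* Runs passes until nothing is pending. Returns false as soon as a
 * pass makes no progress, so the caller can fall back to the
 * un-optimized shader. */
bool schedule_all(size_t pending, schedule_step_fn step)
{
    while (pending > 0) {
        size_t remaining = step(pending);
        if (remaining >= pending)
            return false;   /* count did not shrink: bail out */
        pending = remaining;
    }
    return true;
}

/* Sample pass that always places one instruction. */
size_t step_places_one(size_t pending)
{
    return pending - 1;
}

/* Sample pass that can never place anything. */
size_t step_stuck(size_t pending)
{
    return pending;
}
```

Here schedule_all(5, step_places_one) finishes and returns true, while schedule_all(5, step_stuck) returns false after a single pass.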
* Android: fix build break from r600/radeon split | Rob Herring | 2017-10-10 | 1 | -0/+4
  Commit 06bfb2d28f7a ("r600: fork and import gallium/radeon") broke the Android build:

  external/mesa3d/src/gallium/drivers/radeon/r600_pipe_common.c:43:10: fatal error: 'llvm-c/TargetMachine.h' file not found
           ^~~~~~~~~~~~~~~~~~~~~~~~

  Update the Android makefiles so that drivers/radeon is only built when radeonsi (and therefore LLVM) is enabled.

  Fixes: 06bfb2d28f7a (r600: fork and import gallium/radeon)
  Acked-by: Marek Olšák <[email protected]>
  Signed-off-by: Rob Herring <[email protected]>
* r600: cleanup llvm ir target selection. | Dave Airlie | 2017-10-11 | 1 | -18/+2
  Only r600 target used now for compute IR.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: drop tc_L2_dirty bit, this was SI only. | Dave Airlie | 2017-10-11 | 3 | -15/+0
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* gallium: Create a new PIPE_CAP_TILE_RASTER_ORDER for vc4. | Eric Anholt | 2017-10-10 | 1 | -0/+1
  Because vc4 can control the order that tiles are rasterized in, we can use it to implement overlapping blits using normal drawing and GL_ARB_texture_barrier, as long as we can tell the kernel what order to render the tiles in.

  This commit introduces the core gallium support, vc4 changes will follow.

  v2: Fix on the simulator.
  v3: Add the cap (disabled) to other drivers, add rst docs for the cap.
  v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS
  v5: Drop vc4 changes from this commit, for clarity.

  Reviewed-by: Nicolai Hähnle <[email protected]> (v3)
* r600: drop a bunch of post-cayman code. (v2) | Dave Airlie | 2017-10-10 | 12 | -1251/+199
  Now that Marek has split the two drivers apart, drop a bunch of unnecessary code from the r600 half.

  There is probably a bunch more hiding in the video code.

  No piglit regressions on caicos.

  v2: fix HAVE_LLVM protected code

  Acked-by: Nicolai Hähnle <[email protected]>
  Acked-by: Marek Olšák <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* amd: move r600d_common.h into r600g | Marek Olšák | 2017-10-09 | 4 | -2/+139
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: shrink r600d_common.h and stop using it | Marek Olšák | 2017-10-09 | 5 | -8/+144
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_blitter: let drivers decide which VS to use for draw_rectangle | Marek Olšák | 2017-10-07 | 2 | -0/+3
  This approach allows drivers to set their own vertex shader and skip compilation of u_blitter vertex shaders.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_blitter: let drivers set the vertex elements state | Marek Olšák | 2017-10-07 | 2 | -0/+4
  radeonsi won't set it.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS | Marek Olšák | 2017-10-06 | 1 | -0/+1
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Remove util_format_s3tc_init() | Matt Turner | 2017-10-02 | 1 | -1/+0
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* gallium: Remove util_format_s3tc_enabled | Matt Turner | 2017-10-02 | 1 | -4/+0
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* gallium: add LDEXP TGSI instruction and corresponding cap | Nicolai Hähnle | 2017-09-29 | 1 | -0/+1
  Reviewed-by: Marek Olšák <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* tgsi: clarify the semantics of DFRACEXP | Nicolai Hähnle | 2017-09-29 | 1 | -6/+8
  The status quo is quite the mess:

  1. tgsi_exec will do a per-channel computation, and store the dst[0] result (significand) correctly for each channel. The dst[1] result (exponent) will be written to the first bit set in the writemask. So per-component calculation only works partially.

  2. r600 will only do a single computation. It will replicate the exponent but not the significand.

  3. The docs pretend that there's per-component calculation, but even get dst[0] and dst[1] confused.

  4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions, and kind-of assumes that everything is replicated, generating this for the dvec4 case:

       DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
       DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
       DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
       DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw

  Settle on the simplest behavior, which is single-component calculation with replication, document it, and adjust tgsi_exec and r600.

  Reviewed-by: Marek Olšák <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
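"Single-component calculation with replication" can be illustrated with a short C model built on the standard frexp(). The struct layout (two double slots for the xy and zw pairs, four integer channels) and all names are assumptions made for this sketch. They are not the TGSI register model or the tgsi_exec code.

```c
#include <math.h>

/* Hypothetical destination model: dst[0] holds up to two doubles
 * (the xy and zw pairs), dst[1] holds up to four integer channels. */
typedef struct {
    double significand[2];
    int exponent[4];
} dfracexp_model_result;

/* One computation from a single source double; the results are then
 * replicated to every destination slot. A real writemask would
 * select which of these slots are actually stored. */
dfracexp_model_result dfracexp_model(double src)
{
    dfracexp_model_result r;
    int e;
    double f = frexp(src, &e);   /* src == f * 2^e, 0.5 <= |f| < 1 */

    for (int i = 0; i < 2; i++)
        r.significand[i] = f;
    for (int i = 0; i < 4; i++)
        r.exponent[i] = e;
    return r;
}
```

For an input of 8.0, every significand slot holds 0.5 and every exponent channel holds 4, so it no longer matters which channels an instruction such as the dvec4 sequence above chooses to write.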
* r600: cleanup set_occlusion_query_state | Nicolai Hähnle | 2017-09-29 | 3 | -14/+3
  This fixes a warning caused by the fork (note the change in the function signature):

  ../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c: In function ‘r600_init_common_state_functions’:
  ../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c:2974:36: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
    rctx->b.set_occlusion_query_state = r600_set_occlusion_query_state;

  Reviewed-by: Marek Olšák <[email protected]>
* r600: fork and import gallium/radeon | Marek Olšák | 2017-09-26 | 25 | -12/+14508
  This marks the end of code sharing between r600 and radeonsi. It's getting difficult to work on radeonsi without breaking r600.

  A lot of functions had to be renamed to prevent linker conflicts.

  There are also minor cleanups.

  Acked-by: Dave Airlie <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Add PIPE_SHADER_CAP_INT64_ATOMICS | Jan Vesely | 2017-09-21 | 1 | -0/+1
  Denotes availability of 64bit int atomic instructions

  Signed-off-by: Jan Vesely <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add PIPE_SHADER_CAP_FP16 | Jan Vesely | 2017-09-18 | 1 | -0/+1
  Denotes native half precision float operations capability

  v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16
      fix indentation

  Signed-off-by: Jan Vesely <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: pass old_(perfect_)enable to set_occlusion_query_state | Nicolai Hähnle | 2017-09-18 | 1 | -1/+3
  The callee can derive the current enable state itself.

  Reviewed-by: Marek Olšák <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* r600: add .gitignore for egd_tables.h | Dave Airlie | 2017-09-15 | 1 | -0/+1
* gallium: introduce PIPE_CAP_LOAD_CONSTBUF | Timothy Arceri | 2017-09-15 | 1 | -0/+1
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/{r600, radeonsi}: Fix segfault with color format (v2) | Denis Pauk | 2017-09-14 | 1 | -0/+4
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102552

  v2: Patch cleanup proposed by Nicolai Hähnle.
      * deleted changes in si_translate_texformat.

  Cc: Nicolai Hähnle <[email protected]>
  Cc: Ilia Mirkin <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* ac/surface: add radeon_surf::has_stencil for convenience | Marek Olšák | 2017-09-07 | 3 | -3/+3
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: sort DBG shader flags according to pipe_shader_type | Marek Olšák | 2017-09-04 | 1 | -1/+1
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI opcode SCS | Marek Olšák | 2017-08-22 | 1 | -124/+3
  Use COS+SIN instead.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Acked-by: Jose Fonseca <[email protected]>