summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/r600
Commit message (Collapse)AuthorAgeFilesLines
* gallium: add separate PIPE_CAP_INT64_DIVMODIlia Mirkin2017-02-091-0/+1
| | | | | | | | | | | Nouveau does not currently have logic to implement this as a library function. Even though such a library could be written, there's no big advantage to do it that way for now given that int64 is a very uncommon use-case. Allow a driver to expose INT64 without supporting division and modulo operations. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* r600/sb: Fix memory leakBartosz Tomczyk2017-02-081-1/+7
| | | | Signed-off-by: Marek Olšák <[email protected]>
* gallium: turn PIPE_SHADER_CAP_DOUBLES into a screen capabilityNicolai Hähnle2017-02-021-1/+8
| | | | | | | | | | | | | | | | | | | Make the cap consistent with PIPE_CAP_INT64. Aside from the hypothetical case of using draw for vertex shaders (and actually caring about doubles...), every implementation supports doubles either nowhere or everywhere. Also, st/mesa didn't even check the cap correctly in all supported shader stages. While at it, add a missing LLVM version check for 64-bit integers in radeonsi. This is conservative: judging by the log, LLVM 3.8 might be sufficient, but there are probably bugs that have been fixed since then. v2: fix clover (Marek) Reviewed-by: Marek Olšák <[email protected]>
* r600: fix a compilation warning in r600_screen_create()Samuel Pitoiset2017-01-301-1/+1
| | | | | | | | Should be r600_common_screen instead of r600_screen. Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counterMarek Olšák2017-01-301-4/+4
| | | | | | to simplify things in draw_vbo a little Reviewed-by: Nicolai Hähnle <[email protected]>
* r600: Fix stack overflowBartosz Tomczyk2017-01-301-1/+1
| | | | | | | | | Commit 7b5878ee0491e7a93914389a8369cd6752b9757d increased number of outputs to 64, but left output array intact. This caused stack overflow when number of outputs is bigger then 32. Found by ASAN. Cc: "12.0 13.0 17.0" <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: clean up r600_query_init_backend_maskMarek Olšák2017-01-301-1/+1
| | | | | | | This just needs to be done for r600g in the screen. We don't need an IB submission for every new context created for GCN. Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: use ieee variants of multiplication instructionsIlia Mirkin2017-01-292-18/+19
| | | | | | | | This matches the behavior of most other drivers, including nouveau, radeonsi, and i965. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: add support for optionally using non-IEEE mul opsIlia Mirkin2017-01-282-4/+18
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Add integer 64 capabilityDave Airlie2017-01-271-0/+1
| | | | | | | | | v1.1: move to using a normal CAP. (Marek) v2: fill in the cap everywhere Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_TGSI_MUL_ZERO_WINSIlia Mirkin2017-01-231-0/+1
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Axel Davy <[email protected]>
* r600: implement DDIVNicolai Hähnle2017-01-231-0/+59
| | | | | | Tested-by: Glenn Kennard <[email protected]> Tested-by: James Harvey <[email protected]> Cc: 17.0 <[email protected]>
* r600: factor out cayman_emit_unary_double_rawNicolai Hähnle2017-01-231-20/+42
| | | | | | | | We will use it for DDIV. Tested-by: Glenn Kennard <[email protected]> Tested-by: James Harvey <[email protected]> Cc: 17.0 <[email protected]>
* r600: double multiply can handle only one multiply at a timeNicolai Hähnle2017-01-231-17/+19
| | | | | | | | | | It seems clear that trying to multiply two pairs of doubles would result in the temporary register getting overwritten by the second pair. So make the code more explicit. Tested-by: Glenn Kennard <[email protected]> Tested-by: James Harvey <[email protected]> Cc: 17.0 <[email protected]>
* gallium: add flags parameter to texture barrierIlia Mirkin2017-01-161-1/+1
| | | | | | | | This is so that we can differentiate between flushing any framebuffer reading caches from regular sampler caches. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_TGSI_FS_FBFETCHIlia Mirkin2017-01-161-1/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_spaceMarek Olšák2017-01-054-4/+0
| | | | | | | | Call r600_dma_emit_wait_idle only when there is a possibility of a read-after-write hazard. Buffers not yet used by the SDMA IB don't have to wait. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_SUBMarek Olšák2017-01-051-14/+0
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_ABSMarek Olšák2017-01-051-6/+3
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELYMarek Olšák2017-01-051-0/+1
| | | | | | Drivers with good compilers don't need aggressive optimizations before TGSI. Reviewed-by: Eric Anholt <[email protected]>
* r600/sb: Fix loop optimization related hangs on egHeiko Przybyl2017-01-036-30/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure unused ops and their references are removed, prior to entering the GCM (global code motion) pass, to stop GCM from breaking the loop logic and thus hanging the GPU. Turns out, that sb has problems with loops and node optimizations regarding associative folding: - the global code motion (gcm) pass moves ops up a loop level/basic block until they've fulfilled their total usage count - if there are ops folded into others, the usage count won't be fulfilled and thus the op moved way up to the top - within GCM the op would be visited and their deps would be moved alongside it, to fulfill the src constaints - in a loop, an unused op is moved out of the loop and GCM would move the src value ops up as well - now here arises the problem: if the loop counter is one of the src values it would get moved up as well, the loop break condition would never get hit and the shader turn into an endless loop, resulting in the GPU hanging and being reset A reduced (albeit nonsense) piglit example would be: [require] GLSL >= 1.20 [fragment shader] uniform int SIZE; uniform vec4 lights[512]; void main() { float x = 0; for(int i = 0; i < SIZE; i++) x += lights[2*i+1].x; } [test] uniform int SIZE 1 draw rect -1 -1 2 2 Which gets optimized to: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 42 dw ===== 1 gprs ===== 2 stack ========================================= ALU 3 @24 1 y: MOV R0.y, 0 t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z LOOP_START_DX10 @22 PUSH @6 ALU 1 @30 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 2 @32 3 x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x TEX 1 @36 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 1 @40 4 y: ADD R0.y, R0.y, R0.x LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Notice R0.z being the loop counter/break condition relevant register and being never incremented at all. Also some of the loop content has been moved out of it, to fulfill the requirements for the one unused op. With a debug build of mesa this would produce an error like error at : PRED_SETGE_INT __, __, EM.2, R1.x.2||[email protected], C0.x : operand value R1.x.2||[email protected] was not previously written to its gpr and the compilation would fail due to this. On a release build it gets passed to the GPU. When using this patch, the loop remains intact: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 48 dw ===== 1 gprs ===== 2 stack ========================================= ALU 2 @24 1 y: MOV R0.y, 0 z: MOV R0.z, 0 LOOP_START_DX10 @22 PUSH @6 ALU 1 @28 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 4 @30 3 t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z 4 x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x TEX 1 @40 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 2 @44 5 y: ADD R0.y, R0.y, R0.x z: ADD_INT R0.z, R0.z, 1 LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Piglit: ./piglit summary console -d results/*_gpu_noglx name: unpatched_gpu_noglx patched_gpu_noglx ---- ------------------- ----------------- pass: 18016 18021 fail: 748 743 crash: 7 7 skip: 1124 1124 timeout: 0 0 warn: 13 13 incomplete: 0 0 dmesg-warn: 0 0 dmesg-fail: 0 0 changes: 0 5 fixes: 0 5 regressions: 0 0 total: 19908 19908 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900 Tested-by: Heiko Przybyl <[email protected]> Tested-on: Barts PRO HD6850 Signed-off-by: Heiko Przybyl <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* gallium: support for native fence fd'sRob Clark2016-12-011-0/+1
| | | | | | | This enables gallium support for EGL_ANDROID_native_fence_sync, for drivers which support PIPE_CAP_NATIVE_FENCE_FD. Signed-off-by: Rob Clark <[email protected]>
* gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTSNicolai Hähnle2016-11-301-0/+1
| | | | | | | | | | | Drivers that support this benefit by saving one lowering pass in the GLSL-to-TGSI conversion. radeonsi already supports this because all outputs are stored in temporary variables before the export (except for TCS outputs, which have always been readable in TGSI anyway due to their special semantics). Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLDMarek Olšák2016-11-151-0/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove radeon_surf_level::pitch_bytesMarek Olšák2016-11-012-4/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: stop using PIPE_BIND_CUSTOMMarek Olšák2016-10-263-7/+5
| | | | | | it has no effect whatsoever Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: remove a redundant buffer_create helperMarek Olšák2016-10-261-23/+8
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: don't do (fmask.size && cmask.size)Marek Olšák2016-10-262-2/+2
| | | | | | fmask implies that cmask is present too. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove unnecessary fields from radeon_surf_levelMarek Olšák2016-10-262-8/+8
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: stop using some input fields from radeon_surfaceMarek Olšák2016-10-262-2/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERSIlia Mirkin2016-10-221-0/+1
| | | | | | | | | | | | | | This allows the driver to signal that it can't handle random interleaving of attributes across buffers. This is required for ARB_transform_feedback3, and it's initialized to whatever the previous value of PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME was except for nv50 where it is disabled. Note that the proprietary drivers never expose ARB_transform_feedback3 on any GT21x's (where nouveau previously did), and after some effort I was unable to get it to work. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: import all TGSI->LLVM code from gallium/radeonMarek Olšák2016-10-181-1/+0
| | | | | | Acked-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTSNicolai Hähnle2016-10-121-0/+1
| | | | | | | | This is a screen cap because drivers are expected to support it either for all shader types or for none of them. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* gallium/radeon: implement set_device_reset_callbackNicolai Hähnle2016-10-051-0/+3
| | | | | | | | | Check for device reset on flush. It would be nicer if the kernel just reported this as an error on the submit ioctl (and similarly for fences), but this will do for now. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: move r600_common_context::texture_buffers to r600gMarek Olšák2016-10-044-2/+8
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: add barrier_flags to r600_common_screenNicolai Hähnle2016-09-291-0/+6
| | | | | | | | | | | There are driver-specific context flags for barriers that are not covered by the Gallium barrier interfaces. The R600 settings of these flags may not be optimal, but we're not going to use them yet anyway. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600g: Add support for PK2H/UP2HGlenn Kennard2016-09-262-5/+104
| | | | | | | | Based off of Ilia's original patch, but with output values replicated so that it matches the TGSI semantics. Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* r600g/sb: fix struct/class declaration conflictsMartina Kollarova2016-09-181-5/+1
| | | | | | | | | A couple of forward-declarations were causing warnings in clang: 'value' defined as a class here but previously declared as a struct [-Wmismatched-tags] Signed-off-by: Martina Kollarova <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* android: add support for libmesa_amdgpu_addrlibMauro Rossi2016-09-131-0/+2
| | | | | | | | | | | Android porting of the following commits: f1f1ba3 "radeonsi: move sid.h/r600d_common.h to a common place." 69fca64 "amd/addrlib: move addrlib from amdgpu winsys to common code" This patch fixes android building errors Reviewed-by: Dave Airlie <[email protected]>
* gallium: remove PIPE_BIND_TRANSFER_READ/WRITEMarek Olšák2016-09-082-10/+0
| | | | | | | | not used in any useful way Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* radeonsi: move sid.h/r600d_common.h to a common place.Dave Airlie2016-09-061-2/+4
| | | | | | | | | | Step one to merging radv would be to move some files around. This only adds the include path to r600/radeonsi, because later we want to avoid having to add it to the generic target paths. Acked-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove VPORT_ZMIN/ZMAX from init config statesMarek Olšák2016-09-052-19/+1
| | | | | | | It's part of the viewport state now. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: set VPORT_ZMIN/MAX registers correctlyMarek Olšák2016-09-056-3/+7
| | | | | | | | | | | | Calculate depth ranges from viewport states and pipe_rasterizer_state::clip_halfz. The evergreend.h change is required to silence a warning. This fixes this recently updated piglit: arb_depth_clamp/depth-clamp-range Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: derive buffer placement and flags only at initializationMarek Olšák2016-09-051-3/+2
| | | | | | | | | | Invalidated buffers don't have to go through it. Split r600_init_resource into r600_init_resource_fields and r600_alloc_resource. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* Introduce .editorconfigEric Engestrom2016-08-311-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | A few weeks ago, Jose Fonseca suggested [0] we use .editorconfig files to try and enforce the formatting of the code, to which Michel Dänzer suggested [1] we start by importing the existing .dir-locals.el settings. The first draft was discussed in the RFC [2]. These .editorconfig are a first step, one that has the advantage of requiring little to no intervention from the devs once the settings files are in place, but the settings are very limited. This does have the advantage of applying while the code is being written. This doesn't replace the need for more comprehensive formatting tools such as clang-format & clang-tidy, but those reformat the code after the fact. [0] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121545.html [1] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121639.html [2] https://lists.freedesktop.org/archives/mesa-dev/2016-July/123431.html Acked-by: Nicolai Hähnle <[email protected]> Acked-by: Eric Anholt <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* r600g: Clean up defined magic numbers for TGSI opcodesRhys Kidd2016-08-291-7/+7
| | | | | | | | | | | | Small code clean up that removes magic numbers where a TGSI opcode has been defined. No functional change expected as each opcode is unsupported on the respective hardware. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-by: James Harvey <[email protected]>
* r600g: Avoid duplicated initialization of TGSI_OPCODE_DFMARhys Kidd2016-08-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Clang, TGSI_OPCODE_DFMA (defined magic number 118) is currently initialized twice for Cayman and Evergreen. When Jan Vesely added double precision FMA opcode it did make sense to locate it immediately after TGSI_OPCODE_DMAD, although this is out of order. This change cleans up the prior magic number definition and ensures any later reordering of this struct will not create problems. Prior change was: commit 015e2e0fce3eea7884f8df275c2fadc35143a324 Author: Jan Vesely <[email protected]> Date: Sat Jul 2 16:14:54 2016 -0400 r600g: Add double precision FMA ops Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782 Fixes: 54c4d525da7c7fc1e103d7a3e6db015abb132d5d ("r600g: Enable FMA on chips that support it") Signed-off-by: Jan Vesely <[email protected]> Tested-by: James Harvey <[email protected]> Signed-off-by: Marek Olšák <[email protected]> Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-by: James Harvey <[email protected]>
* gallium: Use enum pipe_shader_type in set_sampler_views()Kai Wasserbäch2016-08-291-1/+2
| | | | | Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Use enum pipe_shader_type in bind_sampler_states() (v2)Kai Wasserbäch2016-08-291-1/+1
| | | | | | | | | | | v1 → v2: - Fixed indentation (noted by Brian Paul) - Removed second assert from nouveau's switch statements (suggested by Brian Paul) Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* r600: increase performance for DRI PRIME offloading if 2nd GPU is Evergreen+Mario Kleiner2016-08-261-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a direct port of Marek Olšáks patch "radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI" to r600. It uses SDMA for the detiling blit from renderoffload VRAM to GTT, as SDMA is much faster for tiled->linear blits from VRAM to GTT. Testing on a dual Radeon HD-5770 setup reduced the time for the render offload gpu to get its rendering into system RAM from approximately 16 msecs for simple rendering at 1920x1080 pixel 32 bpp to 5 msecs, a > 3x speedup! This was measured using ftrace to trace the time the radeon kms driver waited on the dmabuf fence of the renderoffload gpu to complete. All in all this brought the time for a flip down from 20 msecs to 9 msecs, so the prime setup can display at full 60 fps instead of barely 30 fps vsync'ed. The current r600 implementation supports SDMA on Evergreen and later, but not R600/R700 due to some bugs apparently present in their SDMA implementation. Signed-off-by: Mario Kleiner <[email protected]> Cc: Marek Olšák <[email protected]> Signed-off-by: Marek Olšák <[email protected]>