aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* gallium: remove TGSI_OPCODE_ABSMarek Olšák2017-01-0512-62/+3
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: remove unneeded llvm version checkTim Rowley2017-01-051-4/+0
| | | | | | | Old test caused breakage with llvm-svn (4.0.0svn), and not needed as the minimum required llvm version for swr is 3.6. Reviewed-by: George Kyriazis <[email protected]>
* swr: fix windows build breakGeorge Kyriazis2017-01-051-4/+0
| | | | | | | | | | wrap lp_bld_type.h around extern "C". Windows decorates global variables, so when used from .cpp files, need to use an undecorated version. Also, removed related and unneeded code from swr_screen.cpp Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: update clip_regs if clip_disable changes to fix a hangMarek Olšák2017-01-051-0/+5
| | | | | | | | | | | | | | This seems to fix the GPU hangs caused by: commit ed3190b3f3a776fc8c75b1e6130a88079166d115 Author: Marek Olšák <[email protected]> Date: Sun Nov 13 18:41:43 2016 +0100 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219 Tested-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELYMarek Olšák2017-01-0515-0/+15
| | | | | | Drivers with good compilers don't need aggressive optimizations before TGSI. Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: capitalize VM hex addr when dumping buffer listSamuel Pitoiset2017-01-041-1/+1
| | | | | | | | Useful when debugging with R600_DEBUG=vm,check_vm to match addr in both outputs. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600/sb: Fix loop optimization related hangs on egHeiko Przybyl2017-01-036-30/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure unused ops and their references are removed, prior to entering the GCM (global code motion) pass, to stop GCM from breaking the loop logic and thus hanging the GPU. Turns out, that sb has problems with loops and node optimizations regarding associative folding: - the global code motion (gcm) pass moves ops up a loop level/basic block until they've fulfilled their total usage count - if there are ops folded into others, the usage count won't be fulfilled and thus the op moved way up to the top - within GCM the op would be visited and their deps would be moved alongside it, to fulfill the src constaints - in a loop, an unused op is moved out of the loop and GCM would move the src value ops up as well - now here arises the problem: if the loop counter is one of the src values it would get moved up as well, the loop break condition would never get hit and the shader turn into an endless loop, resulting in the GPU hanging and being reset A reduced (albeit nonsense) piglit example would be: [require] GLSL >= 1.20 [fragment shader] uniform int SIZE; uniform vec4 lights[512]; void main() { float x = 0; for(int i = 0; i < SIZE; i++) x += lights[2*i+1].x; } [test] uniform int SIZE 1 draw rect -1 -1 2 2 Which gets optimized to: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 42 dw ===== 1 gprs ===== 2 stack ========================================= ALU 3 @24 1 y: MOV R0.y, 0 t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z LOOP_START_DX10 @22 PUSH @6 ALU 1 @30 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 2 @32 3 x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x TEX 1 @36 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 1 @40 4 y: ADD R0.y, R0.y, R0.x LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Notice R0.z being the loop counter/break condition relevant register and being never incremented at all. Also some of the loop content has been moved out of it, to fulfill the requirements for the one unused op. With a debug build of mesa this would produce an error like error at : PRED_SETGE_INT __, __, EM.2, R1.x.2||[email protected], C0.x : operand value R1.x.2||[email protected] was not previously written to its gpr and the compilation would fail due to this. On a release build it gets passed to the GPU. When using this patch, the loop remains intact: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 48 dw ===== 1 gprs ===== 2 stack ========================================= ALU 2 @24 1 y: MOV R0.y, 0 z: MOV R0.z, 0 LOOP_START_DX10 @22 PUSH @6 ALU 1 @28 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 4 @30 3 t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z 4 x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x TEX 1 @40 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 2 @44 5 y: ADD R0.y, R0.y, R0.x z: ADD_INT R0.z, R0.z, 1 LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Piglit: ./piglit summary console -d results/*_gpu_noglx name: unpatched_gpu_noglx patched_gpu_noglx ---- ------------------- ----------------- pass: 18016 18021 fail: 748 743 crash: 7 7 skip: 1124 1124 timeout: 0 0 warn: 13 13 incomplete: 0 0 dmesg-warn: 0 0 dmesg-fail: 0 0 changes: 0 5 fixes: 0 5 regressions: 0 0 total: 19908 19908 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900 Tested-by: Heiko Przybyl <[email protected]> Tested-on: Barts PRO HD6850 Signed-off-by: Heiko Przybyl <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* freedreno/ir3: rework varying slots (maybe??)Rob Clark2016-12-301-4/+9
| | | | | | | | | | See: dEQP-GLES2.functional.shaders.swizzles.vector_swizzles.mediump_vec2_yyyy_fragment if we only access (in FS) varying.y then it ends up in slot zero.. I'm not sure the hw likes that.. Signed-off-by: Rob Clark <[email protected]>
* nir: Rename convert_to_ssa lower_regs_to_ssaJason Ekstrand2016-12-292-2/+2
| | | | This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
* vc4: Rework scheduling of thread switch to cut one more NOP.Eric Anholt2016-12-291-46/+75
| | | | | | | | | | | | | | Jonas's patch got us most of the benefit of scheduling instructions into the delay slots of thread switch, but if there had been nothing to pair the thrsw with, it would move the thrsw up and leave a NOP where the thrsw was. Instead, don't pair anything with thrsw through the normal scheduling path, and have a separate helper function that inserts the thrsw earlier if possible and inserts any necessary NOPs. total instructions in shared programs: 93027 -> 92643 (-0.41%) instructions in affected programs: 14952 -> 14568 (-2.57%)
* vc4: Fill thread switching delay slotsJonas Pfeil2016-12-291-7/+38
| | | | | | | | | | | | | | | Scan for instructions without a signal set in front of the switching instruction and move the signal up there. shader-db results: total instructions in shared programs: 94494 -> 93027 (-1.55%) instructions in affected programs: 23545 -> 22078 (-6.23%) v2: Fix re-emitting of the instruction in the loop trying to emit NOPs, drop a scheduling change from branch delay slots. (by anholt) Signed-off-by: Jonas Pfeil <[email protected]>
* vc4: Enable NIR-based loop unrolling.Eric Anholt2016-12-291-0/+5
| | | | | This successfully unrolls a new shader in GLB2.7, which also gets that shader to successfully compile in multithreaded mode.
* freedreno/ir3: fix linkage::var sizeRob Clark2016-12-271-1/+1
| | | | | | | It should actually be 32 for a4xx/a5xx.. we still only advertise 16 but for a5xx the linkage map includes position/psize. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: treat clipvertex like a normal varyingRob Clark2016-12-271-3/+1
| | | | | | | | We need this in case it is streamed out. Not sure why we were treating it specially before. Having it as a VS out is harmless if FS doesn't have a matching input. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: transform-feedback supportRob Clark2016-12-277-38/+209
| | | | | | | | | | | We'll need to revisit when adding hw binning pass support, whether we can still do this in main draw step, as we do w/ a3xx/a4xx, or if we needed to move it to the binning stage. Still some failing piglits but most tests pass and the common cases seem to work. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2016-12-277-43/+81
| | | | | | | Pull in a5xx streamout related regs. Also fixes a couple incorrect register definitions. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: UBO support for 64b GPUs (a5xx)Rob Clark2016-12-271-3/+24
| | | | | | Update address calculation to support 64b addresses. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: rework location of driver constantsRob Clark2016-12-276-53/+75
| | | | | | | | | | | Rework how we lay out driver constants (driver-params, UBO/TFBO buffer addresses, immediates) for more flexibility. For a5xx+ we need to deal with the fact that gpu ptrs are 64b instead of 32b, which makes the fixed offset scheme not work so well. While we are dealing with that we might also make the layout more dynamic to account for varying # of UBOs, etc. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix emit for bo addressesRob Clark2016-12-271-3/+9
| | | | | | Reloc for the buffer address is two dwords on 64b devices (a5xx+) Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: texture layoutRob Clark2016-12-272-2/+2
| | | | | | | Seems to be imilar to a4xx, and sampler state "array-pitch" needs to be aligned to page size. Signed-off-by: Rob Clark <[email protected]>
* swr: fix icc compile errorBruce Cherniak2016-12-231-1/+1
| | | | | | | | ICC doesn't like the use of nullptr (std::nullptr_t) argument in p_atomic_set. GCC and clang don't complain. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99119 Reviewed-by: Tim Rowley <[email protected]>
* radeonsi: Bugfix needed for hashcatChristian Inci2016-12-221-5/+7
| | | | | | | | | Hashcat needs MAX_GLOBAL_BUFFERS to be 21 or even 22 for some modes. It'll crash otherwise. I'm adding an assert to see if programs need it to be even higher. Signed-off-by: Christian Inci <[email protected]> [Handle first properly; should be NFC, since clover always uses first == 0.] Signed-off-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix gl_ClipDistance and gl_ClipVertex for pointsNicolai Hähnle2016-12-221-2/+10
| | | | | | | | | | The clipper hardware doesn't consider points as primitives that can be clipped. Simply setting the corresponding cull bits works, and should not have an adverse effect on other primitive types according to the hardware team. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: only set VS_OUT_MISC_SIDE_BUS_ENA when the misc vector is usedNicolai Hähnle2016-12-221-5/+6
| | | | | | | | Should have no effect (other than perhaps on power consumption), but Vulkan does this. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* llvmpipe: Link tests with CLOCK_LIB.Vinson Lee2016-12-211-1/+2
| | | | | | | | | | Fix linking error with 'make check'. CXXLD lp_test_format ../../../../src/gallium/auxiliary/.libs/libgallium.a(os_time.o): In function `os_time_get_nano': src/gallium/auxiliary/os/os_time.c:59: undefined reference to `clock_gettime' Signed-off-by: Vinson Lee <[email protected]>
* radeonsi: add Polaris12 support (v3)Junwei Zhang2016-12-214-1/+7
| | | | | | | | | | | v2: use gfxip names for llvm 4.0+ v3: use tonga for llvm <= 3.8, drop gfxip name, we can just change that we change the other asics. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Junwei Zhang <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Acked-by: Christian König <[email protected]>
* svga: Fix a strict-aliasing violation in shader dumperEdward O'Callaghan2016-12-211-1/+9
| | | | | | | | | | | | | | | As per the C spec, it is illegal to alias pointers to different types. This results in undefined behaviour after optimization passes, resulting in very subtle bugs that happen only on a full moon.. Use a memcpy() as a well defined coercion between the isomorphic bit-field interpretations of memory. V.2: Use C99 compat STATIC_ASSERT() over C11 static_assert(). Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* freedreno/a5xx: border color supportRob Clark2016-12-181-3/+160
| | | | | | | Not 100% sure it works if you have border color in VS.. but it might be right. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: use MRT0 to import linear zsRob Clark2016-12-181-5/+20
| | | | | | | | | | | A bit of a hack, but we need to do this until we can do tiled zs in sysmem (and associated tile/until blits for transfer_map). Fixes xonotic and glmark2 "refract", when reorder wasn't enabled. (reorder would paper over the issue by avoiding the extra round- trip to system memory and back to gmem. Signed-off-by: Rob Clark <[email protected]>
* freedreno: fdN_gmem_restore_format() is not gen specificRob Clark2016-12-188-50/+25
| | | | | | | | | | Refactor out into a common helper, since this is the same across generations when we need equiv z/s gmem restore format. Next patch needs this in a5xx, rather than creating yet another helper push this into core. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: cargo-cult end-batch sequence more faithfullyRob Clark2016-12-184-4/+39
| | | | | | | Fixes some issues at least with GMEM bypass mode, where we'd sometimes end up with some FS quads not hitting memory. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: misc fixRob Clark2016-12-181-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix (at least some) vtx formatsRob Clark2016-12-181-1/+1
| | | | | | | | | Swap/component-order doesn't seem to be quite what that is. At least blob was always setting it to XYZW ('11') but we weren't. Causing problems w/ formats like sint16.. Hard-coding this instead at least seems to get glamor working. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: more formatsRob Clark2016-12-181-25/+25
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fixup capsRob Clark2016-12-182-6/+11
| | | | | | | | | | Might not be 100% accurate, mostly just copy from a4xx to get started. We are defn lying about occlusion query at this point (not implemented yet) but need it to expose anything higher than gl1.4 (glamor needs gl2.1) Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix random faults on first sysmem drawRob Clark2016-12-181-0/+3
| | | | | | | | Not sure what this event is, but blob writes it.. and it seems to solve random write faults at mystery address that would sometimes happen on first BYPASS draw. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2016-12-186-17/+80
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix stride/size for mem->gmem blitsRob Clark2016-12-181-5/+7
| | | | | | <brownpaperbag>these should be the in-GMEM dimensions</brownpaperbag> Signed-off-by: Rob Clark <[email protected]>
* swr: Implement fence attached work queues for deferred deletion.Bruce Cherniak2016-12-169-54/+255
| | | | | | | Work can now be added to fences and triggered by fence completion. This allows for deferred resource deletion, and other asynchronous tasks. Reviewed-by: George Kyriazis <[email protected]>
* treewide: s/comparitor/comparator/Ilia Mirkin2016-12-122-2/+2
| | | | | | | | | | git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* swr: [rasterizer core/memory] StoreTile: AVX512 progressTim Rowley2016-12-122-222/+138
| | | | | | Fixes to 128-bit formats. Reviwed-by: Bruce Cherniak <[email protected]>
* radeonsi: shrink the GSVS ring to account for the reduced item sizesNicolai Hähnle2016-12-121-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: shrink each vertex stream to the actually required sizeNicolai Hähnle2016-12-122-25/+40
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use a single descriptor for the GSVS ringNicolai Hähnle2016-12-124-50/+67
| | | | | | | | | We can hardcode all of the fields for swizzling in the geometry shader. The advantage is that we use fewer descriptor slots and we no longer have to update any of the (ring) descriptors when the geometry shader changes. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pack GS output components for each vertex stream contiguouslyNicolai Hähnle2016-12-121-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that the memory layout of one vertex stream inside one "item" (= memory written by one GS wave) on the GSVS ring is: t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0 t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1 ... t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0 t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1 ... t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL ... t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0 t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1 ... t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL where tNN indicates the thread number, vNN the vertex number (in the order of EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and component, respectively). The vertex streams are laid out sequentially. The swizzling by 16 threads is hard-coded in the way the VGT generates the offset passed into the GS copy shader, and the jump every 16 threads is calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way that makes it difficult to deviate from this layout (at least that's what I've experimentally confirmed on VI after first trying to go the simpler route of just interleaving the vertex streams). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not write non-existent components through the GSVS ringNicolai Hähnle2016-12-121-2/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only write values belonging to the stream when emitting GS vertexNicolai Hähnle2016-12-121-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: generate an explicit switch instruction over vertex streamsNicolai Hähnle2016-12-121-8/+13
| | | | | | | | | | | | SimplifyCFG generates a switch instruction anyway when all four streams are present, but is simultaneously not smart enough to eliminate some redundant jumps that it generates. The generated assembly is still a bit silly, probably because the control flow annotation doesn't know how to handle a switch with uniform condition. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fetch only outputs of current vertex stream from the GSVS ringNicolai Hähnle2016-12-121-16/+25
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only export from GS copy shader for vertex stream 0Nicolai Hähnle2016-12-121-12/+19
| | | | | | | | When running the copy shader for vertex streams != 0, the SX does not need any data from us (there is no rasterization for the higher vertex streams, only streamout). Reviewed-by: Marek Olšák <[email protected]>