path: root/src/gallium
Commit message  Author  Age  Files  Lines
* radeonsi: also wait for SDMA in the clear_buffer CPU fallback  Marek Olšák  2017-01-05  1  -3/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify r600_resource typecasts in si_clear_buffer  Marek Olšák  2017-01-05  1  -5/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always use SDMA for big buffer clears and first buffer uses  Marek Olšák  2017-01-05  1  -0/+20
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use SDMA in rvid_buffer_clear on CIK-VI  Marek Olšák  2017-01-05  1  -2/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use SDMA for initial clearing of DCC/CMASK/HTILE on CIK-VI  Marek Olšák  2017-01-05  3  -8/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement SDMA-based buffer clearing for CIK-VI  Marek Olšák  2017-01-05  3  -0/+54
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: increase the vertex buffer size for text  Marek Olšák  2017-01-05  1  -1/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: add an option to sort items below graphs  Marek Olšák  2017-01-05  2  -5/+33
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: add an option to reset the color counter  Marek Olšák  2017-01-05  2  -3/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: allow more data sources per pane  Marek Olšák  2017-01-05  1  -13/+15
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: add an option to rename each data source  Marek Olšák  2017-01-05  1  -2/+35
| | | | | | | | useful for radeonsi performance counters v2: allow specifying both : and = Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_SUB  Marek Olšák  2017-01-05  29  -190/+64
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_ABS  Marek Olšák  2017-01-05  21  -88/+8
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
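The redundancy behind the SUB and ABS removals can be sketched with a small model. This is not Mesa code: the `Src` class and the `op_add`/`op_mov` helpers are hypothetical stand-ins for an instruction source that carries negate and absolute-value modifiers, and applying the absolute value before the negation is an assumption of this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Src:
    """Hypothetical source operand with per-operand modifiers."""
    value: float
    negate: bool = False
    absolute: bool = False

    def read(self) -> float:
        # Model assumption: absolute value is applied first, then negation.
        v = abs(self.value) if self.absolute else self.value
        return -v if self.negate else v


def op_add(a: Src, b: Src) -> float:
    return a.read() + b.read()


def op_mov(a: Src) -> float:
    return a.read()


def sub_via_add(a: float, b: float) -> float:
    # SUB dst, a, b  is the same as  ADD dst, a, -b
    return op_add(Src(a), Src(b, negate=True))


def abs_via_mov(a: float) -> float:
    # ABS dst, a  is the same as  MOV dst, |a|
    return op_mov(Src(a, absolute=True))
```

With modifiers available on every source, a dedicated subtract or absolute-value opcode adds nothing that ADD or MOV cannot already express.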
* st/nine: Remove all usage of ureg_SUB in nine_shader  Axel Davy  2017-01-05  1  -8/+8
| | | | | | | This is required to drop gallium SUB. Signed-off-by: Axel Davy <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* st/nine: Remove all usage of ureg_SUB in nine_ff  Axel Davy  2017-01-05  1  -20/+20
| | | | | | | This is required to remove gallium SUB. Signed-off-by: Axel Davy <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* st/nine: Do not map SUB and ABS to their gallium equivalent.  Axel Davy  2017-01-05  1  -2/+23
| | | | | | | This is required for gallium SUB and ABS to be removed. Signed-off-by: Axel Davy <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* st/va: fix incorrect argument in vl_compositor_cleanup  Nayan Deshmukh  2017-01-05  1  -1/+1
| | | | | | | | This fixes the mistake introduced in commit b6737a8bcd03ea68952799144c0c6e6e6679bee9 Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]>
* swr: remove unneeded llvm version check  Tim Rowley  2017-01-05  1  -4/+0
| | | | | | | Old test caused breakage with llvm-svn (4.0.0svn), and not needed as the minimum required llvm version for swr is 3.6. Reviewed-by: George Kyriazis <[email protected]>
* swr: fix windows build break  George Kyriazis  2017-01-05  2  -4/+7
| | | | | | | | | | Wrap lp_bld_type.h in extern "C". Windows decorates global variables, so an undecorated version is needed when they are used from .cpp files. Also removed related, unneeded code from swr_screen.cpp. Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: update clip_regs if clip_disable changes to fix a hang  Marek Olšák  2017-01-05  1  -0/+5
| This seems to fix the GPU hangs caused by:
|
|     commit ed3190b3f3a776fc8c75b1e6130a88079166d115
|     Author: Marek Olšák <[email protected]>
|     Date: Sun Nov 13 18:41:43 2016 +0100
|
|         radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219
| Tested-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY  Marek Olšák  2017-01-05  17  -0/+19
| | | | | | Drivers with good compilers don't need aggressive optimizations before TGSI. Reviewed-by: Eric Anholt <[email protected]>
* va: call texture_get_handle while the mutex is being held  Marek Olšák  2017-01-04  1  -2/+5
| | | | | | | The context may be used by texture_get_handle. Reviewed-by: Christian König <[email protected]> Cc: 13.0 <[email protected]>
* vdpau: call texture_get_handle while the mutex is being held  Marek Olšák  2017-01-04  2  -6/+13
| | | | | | | | | The context may be used by texture_get_handle. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99158 Reviewed-by: Christian König <[email protected]> Cc: 13.0 <[email protected]>
* radeonsi: capitalize VM hex addr when dumping buffer list  Samuel Pitoiset  2017-01-04  1  -1/+1
| | | | | | | | Useful when debugging with R600_DEBUG=vm,check_vm to match addr in both outputs. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/hud: add a path separator between dump directory and filename  Edmondo Tommasina  2017-01-03  1  -1/+2
| | | | | | | It's more user friendly and it avoids writing files in unexpected places. Signed-off-by: Marek Olšák <[email protected]>
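Why a missing separator sends files to unexpected places can be seen in a short sketch. The function names and the example paths are hypothetical and only illustrate the idea; the actual HUD code is C and is not reproduced here.

```python
import posixpath


def dump_path_concat(dump_dir, name):
    # Plain concatenation: the file lands next to the directory,
    # with the directory name fused into the file name.
    return dump_dir + name


def dump_path_joined(dump_dir, name):
    # Joining with a separator keeps the file inside the dump directory.
    return posixpath.join(dump_dir, name)
```

For a dump directory of `/tmp/hud` and a data source named `fps`, the first variant writes to `/tmp/hudfps`, while the second writes to `/tmp/hud/fps`.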
* r600/sb: Fix loop optimization related hangs on eg  Heiko Przybyl  2017-01-03  6  -30/+68
| Make sure unused ops and their references are removed, prior to entering
| the GCM (global code motion) pass, to stop GCM from breaking the loop
| logic and thus hanging the GPU.
|
| It turns out that sb has problems with loops and node optimizations
| regarding associative folding:
|
| - the global code motion (gcm) pass moves ops up a loop level/basic block
|   until they've fulfilled their total usage count
| - if there are ops folded into others, the usage count won't be
|   fulfilled and thus the op is moved way up to the top
| - within GCM the op would be visited and its deps would be moved
|   alongside it, to fulfill the src constraints
| - in a loop, an unused op is moved out of the loop and GCM would move
|   the src value ops up as well
| - now here arises the problem: if the loop counter is one of the src
|   values it would get moved up as well, the loop break condition would
|   never get hit and the shader turns into an endless loop, resulting in
|   the GPU hanging and being reset
|
| A reduced (albeit nonsense) piglit example would be:
|
|     [require]
|     GLSL >= 1.20
|
|     [fragment shader]
|
|     uniform int SIZE;
|     uniform vec4 lights[512];
|
|     void main()
|     {
|         float x = 0;
|         for(int i = 0; i < SIZE; i++)
|             x += lights[2*i+1].x;
|     }
|
|     [test]
|     uniform int SIZE 1
|     draw rect -1 -1 2 2
|
| Which gets optimized to:
|
|     ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
|     ===== 42 dw ===== 1 gprs ===== 2 stack =========================================
|     ALU 3 @24
|          1   y: MOV R0.y, 0
|              t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z
|     LOOP_START_DX10 @22
|         PUSH @6
|         ALU 1 @30 KC0[CB0:0-15]
|              2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x
|         JUMP @14 POP:1
|         LOOP_BREAK @20
|         POP @14 POP:1
|         ALU 2 @32
|              3   x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x
|         TEX 1 @36
|                  VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..]
|         ALU 1 @40
|              4   y: ADD R0.y, R0.y, R0.x
|     LOOP_END @4
|     EXPORT_DONE PIXEL 0 R0.____ EOP
|     ===== SHADER_END ===============================================================
|
| Notice R0.z being the loop counter/break condition relevant register
| and being never incremented at all. Also some of the loop content has
| been moved out of it, to fulfill the requirements for the one unused op.
|
| With a debug build of mesa this would produce an error like
|
|     error at : PRED_SETGE_INT __, __, EM.2, R1.x.2||[email protected], C0.x
|     : operand value R1.x.2||[email protected] was not previously written to its gpr
|
| and the compilation would fail due to this. On a release build it gets
| passed to the GPU.
|
| When using this patch, the loop remains intact:
|
|     ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
|     ===== 48 dw ===== 1 gprs ===== 2 stack =========================================
|     ALU 2 @24
|          1   y: MOV R0.y, 0
|              z: MOV R0.z, 0
|     LOOP_START_DX10 @22
|         PUSH @6
|         ALU 1 @28 KC0[CB0:0-15]
|              2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x
|         JUMP @14 POP:1
|         LOOP_BREAK @20
|         POP @14 POP:1
|         ALU 4 @30
|              3   t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z
|              4   x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x
|         TEX 1 @40
|                  VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..]
|         ALU 2 @44
|              5   y: ADD R0.y, R0.y, R0.x
|                  z: ADD_INT R0.z, R0.z, 1
|     LOOP_END @4
|     EXPORT_DONE PIXEL 0 R0.____ EOP
|     ===== SHADER_END ===============================================================
|
| Piglit: ./piglit summary console -d results/*_gpu_noglx
|
|     name:         unpatched_gpu_noglx  patched_gpu_noglx
|     ----          -------------------  -----------------
|     pass:                       18016              18021
|     fail:                         748                743
|     crash:                          7                  7
|     skip:                        1124               1124
|     timeout:                        0                  0
|     warn:                          13                 13
|     incomplete:                     0                  0
|     dmesg-warn:                     0                  0
|     dmesg-fail:                     0                  0
|     changes:                        0                  5
|     fixes:                          0                  5
|     regressions:                    0                  0
|     total:                      19908              19908
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900
| Tested-by: Heiko Przybyl <[email protected]>
| Tested-on: Barts PRO HD6850
| Signed-off-by: Heiko Przybyl <[email protected]>
| Signed-off-by: Marek Olšák <[email protected]>
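The idea of removing unused ops and their references before code motion can be illustrated with a toy model. This is a minimal sketch, not the sb implementation: the `Op` class, the `has_side_effect` flag, and the value names are all hypothetical, and the real pass works on a much richer IR.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical IR node: one destination value computed from sources."""
    dst: str
    srcs: list = field(default_factory=list)
    has_side_effect: bool = False  # e.g. an export or a loop-break predicate


def remove_unused_ops(ops):
    """Iteratively drop ops whose result is never read.

    Dropping an op also drops its references to its sources, so a source
    that was only kept alive by the dropped op becomes removable in the
    next round.
    """
    ops = list(ops)
    changed = True
    while changed:
        used = {s for op in ops for s in op.srcs}
        kept = [op for op in ops if op.has_side_effect or op.dst in used]
        changed = len(kept) != len(ops)
        ops = kept
    return ops


# Toy version of the loop body: two leftovers of folding still read "mul".
loop_body = [
    Op("mul", ["i"]),
    Op("idx", ["mul"]),
    Op("fetch", ["idx"]),
    Op("sum", ["fetch"]),
    Op("export", ["sum"], has_side_effect=True),
    Op("leftover", ["mul"]),
    Op("leftover2", ["leftover"]),
]
cleaned = [op.dst for op in remove_unused_ops(loop_body)]
```

After the cleanup only the ops that feed the export remain, so a later code-motion pass has no dangling op whose sources (possibly including the loop counter) it would try to hoist out of the loop.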
* vl/zscan: fix "Fix trivial sign compare warnings"  Christian König  2017-01-03  1  -1/+1
| | | | | | | | | | | The variable actually needs to be signed, otherwise converting it to a float doesn't work as expected. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98914 Signed-off-by: Christian König <[email protected]> Reviewed-by: Nayan Deshmukh <[email protected]> Cc: "13.0" <[email protected]> Fixes: 1fb4179f927 ("vl: Fix trivial sign compare warnings")
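The effect of storing a negative value in an unsigned variable before converting it to float can be reproduced in a few lines. This is a sketch under assumptions: the log does not name the variable or its width, so the 32-bit types and the helper names below are hypothetical.

```python
import ctypes


def to_float_signed(v):
    # A signed 32-bit variable keeps the sign through the float conversion.
    return float(ctypes.c_int32(v).value)


def to_float_unsigned(v):
    # An unsigned 32-bit variable wraps negative values to large positives
    # before the conversion happens.
    return float(ctypes.c_uint32(v).value)
```

Here `to_float_signed(-1)` yields `-1.0`, while `to_float_unsigned(-1)` yields `4294967295.0`, which is the kind of surprise that silencing a sign-compare warning by changing the type can introduce.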
* st/va: error handling  Nayan Deshmukh  2017-01-03  1  -3/+15
| | | | | | | | handle the cases when vl_compositor_set_csc_matrix(), vl_compositor_init_state() and vl_compositor_init() fail Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/vdpau: error handling  Nayan Deshmukh  2017-01-03  3  -15/+50
| | | | | | | | handle the cases when vl_compositor_set_csc_matrix(), vl_compositor_init_state() and vl_compositor_init() fail Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl/compositor: implement error handling  Nayan Deshmukh  2017-01-03  2  -3/+12
| | | | | | | pipe_buffer_map and pipe_buffer_create may return NULL Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]>
* gallium/hud: fix the windows build by disabling file dumping  Marek Olšák  2017-01-02  1  -0/+2
|
* gallium/hud: set filedescriptor for fps graph  Edmondo Tommasina  2017-01-01  1  -0/+2
| | | | Signed-off-by: Marek Olšák <[email protected]>
* gallium/hud: set filedescriptor for cpu graph  Edmondo Tommasina  2017-01-01  1  -0/+2
| | | | Signed-off-by: Marek Olšák <[email protected]>
* gallium/hud: move file initialization to a function  Edmondo Tommasina  2017-01-01  3  -11/+20
| | | | | | | The function will be used later to create the filedescriptor for other metrics. Signed-off-by: Marek Olšák <[email protected]>
* gallium/hud: dump hud_driver_query values to files  Edmondo Tommasina  2017-01-01  3  -0/+19
| | | | | | | | | | | | | | Dump values for every selected data source in GALLIUM_HUD. Every data source has its own file and the filename is equal to the data source identifier. Set GALLIUM_HUD_DUMP_DIR to dump values to files in this directory. No values are dumped if the environment variable is not set, the directory doesn't exist or the user doesn't have write access. Signed-off-by: Marek Olšák <[email protected]>
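The three conditions under which nothing is dumped can be written out as a small sketch. Only the `GALLIUM_HUD_DUMP_DIR` variable name comes from the commit message; the `open_dump_file` helper, its `env` parameter, and the `fps` source name are hypothetical, and the real implementation is C code inside the HUD.

```python
import os
import tempfile


def open_dump_file(source_id, env):
    """Return a writable file for one data source, or None if dumping is off.

    Hypothetical helper: `env` is any mapping standing in for the
    process environment.
    """
    dump_dir = env.get("GALLIUM_HUD_DUMP_DIR")
    if not dump_dir:
        return None  # variable not set
    if not os.path.isdir(dump_dir):
        return None  # directory doesn't exist
    if not os.access(dump_dir, os.W_OK):
        return None  # no write access
    # One file per data source, named after the source identifier.
    return open(os.path.join(dump_dir, source_id), "w")


# Usage: an existing, writable directory yields a file to write values to.
demo_dir = tempfile.mkdtemp()
dumped = open_dump_file("fps", {"GALLIUM_HUD_DUMP_DIR": demo_dir})
dumped.write("60.0\n")
dumped.close()
```

Returning `None` instead of raising keeps dumping strictly optional: a misconfigured directory disables the files without affecting the on-screen HUD.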
* freedreno/ir3: rework varying slots (maybe??)  Rob Clark  2016-12-30  1  -4/+9
| | | | | | | | | | See: dEQP-GLES2.functional.shaders.swizzles.vector_swizzles.mediump_vec2_yyyy_fragment if we only access (in FS) varying.y then it ends up in slot zero.. I'm not sure the hw likes that.. Signed-off-by: Rob Clark <[email protected]>
* nir: Rename convert_to_ssa lower_regs_to_ssa  Jason Ekstrand  2016-12-29  2  -2/+2
| | | | This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
* vc4: Rework scheduling of thread switch to cut one more NOP.  Eric Anholt  2016-12-29  1  -46/+75
| Jonas's patch got us most of the benefit of scheduling instructions into
| the delay slots of thread switch, but if there had been nothing to pair
| the thrsw with, it would move the thrsw up and leave a NOP where the
| thrsw was. Instead, don't pair anything with thrsw through the normal
| scheduling path, and have a separate helper function that inserts the
| thrsw earlier if possible and inserts any necessary NOPs.
|
| total instructions in shared programs: 93027 -> 92643 (-0.41%)
| instructions in affected programs: 14952 -> 14568 (-2.57%)
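The percentages quoted in these results follow from the instruction counts. A minimal check, with `percent_change` as a hypothetical helper name:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


shared = percent_change(93027, 92643)    # all shared programs
affected = percent_change(14952, 14568)  # only programs the patch touched
```

Both lines drop by the same 384 instructions; the percentage is larger for the affected programs only because the baseline is smaller.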
* vc4: Fill thread switching delay slots  Jonas Pfeil  2016-12-29  1  -7/+38
| Scan for instructions without a signal set in front of the switching
| instruction and move the signal up there.
|
| shader-db results:
|
| total instructions in shared programs: 94494 -> 93027 (-1.55%)
| instructions in affected programs: 23545 -> 22078 (-6.23%)
|
| v2: Fix re-emitting of the instruction in the loop trying to emit NOPs,
|     drop a scheduling change from branch delay slots. (by anholt)
|
| Signed-off-by: Jonas Pfeil <[email protected]>
* vc4: Enable NIR-based loop unrolling.  Eric Anholt  2016-12-29  1  -0/+5
| | | | | This successfully unrolls a new shader in GLB2.7, which also gets that shader to successfully compile in multithreaded mode.
* freedreno/ir3: fix linkage::var size  Rob Clark  2016-12-27  1  -1/+1
| | | | | | | It should actually be 32 for a4xx/a5xx.. we still only advertise 16 but for a5xx the linkage map includes position/psize. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: treat clipvertex like a normal varying  Rob Clark  2016-12-27  1  -3/+1
| | | | | | | | We need this in case it is streamed out. Not sure why we were treating it specially before. Having it as a VS out is harmless if FS doesn't have a matching input. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: transform-feedback support  Rob Clark  2016-12-27  7  -38/+209
| | | | | | | | | | | We'll need to revisit when adding hw binning pass support, whether we can still do this in main draw step, as we do w/ a3xx/a4xx, or if we needed to move it to the binning stage. Still some failing piglits but most tests pass and the common cases seem to work. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers  Rob Clark  2016-12-27  7  -43/+81
| | | | | | | Pull in a5xx streamout related regs. Also fixes a couple incorrect register definitions. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: UBO support for 64b GPUs (a5xx)  Rob Clark  2016-12-27  1  -3/+24
| | | | | | Update address calculation to support 64b addresses. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: rework location of driver constants  Rob Clark  2016-12-27  6  -53/+75
| | | | | | | | | | | Rework how we lay out driver constants (driver-params, UBO/TFBO buffer addresses, immediates) for more flexibility. For a5xx+ we need to deal with the fact that gpu ptrs are 64b instead of 32b, which makes the fixed offset scheme not work so well. While we are dealing with that we might also make the layout more dynamic to account for varying # of UBOs, etc. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix emit for bo addresses  Rob Clark  2016-12-27  1  -3/+9
| | | | | | Reloc for the buffer address is two dwords on 64b devices (a5xx+) Signed-off-by: Rob Clark <[email protected]>
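Why a buffer address takes two dwords on a 64-bit device can be shown with a short sketch. The helper name is hypothetical, and emitting the low dword first is an assumption of this example rather than something the commit message states.

```python
def address_dwords(addr, is_64bit):
    """Split a GPU address into the 32-bit words written to a command stream."""
    lo = addr & 0xFFFFFFFF
    if not is_64bit:
        return [lo]
    hi = (addr >> 32) & 0xFFFFFFFF
    return [lo, hi]  # assumed order: low dword, then high dword
```

Code that reserves a fixed one-dword slot per address, as is sufficient on 32-bit parts, would silently drop the high half on a5xx+, so every emit site has to account for the extra dword.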
* freedreno/a5xx: texture layout  Rob Clark  2016-12-27  2  -2/+2
| | | | | | | Seems to be similar to a4xx, and sampler state "array-pitch" needs to be aligned to page size. Signed-off-by: Rob Clark <[email protected]>
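Aligning a pitch to the page size is a round-up to the next multiple. The sketch below assumes a 4096-byte page, which the commit message does not state, and uses hypothetical names throughout.

```python
PAGE_SIZE = 4096  # assumption: the log does not give the page size


def align(value, alignment):
    """Round `value` up to the next multiple of a power-of-two `alignment`."""
    return (value + alignment - 1) & ~(alignment - 1)


def array_pitch(layer_size_bytes):
    # Distance between consecutive array layers, padded to a whole page.
    return align(layer_size_bytes, PAGE_SIZE)
```

A 5000-byte layer would get an 8192-byte pitch under these assumptions, while a layer that is already a multiple of the page size is left unchanged.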
* ttn: set ->info->num_ubos  Rob Clark  2016-12-27  1  -1/+4
| | | | | | | | | For dealing w/ 32b vs 64b gpu addresses, I need to rework how we pass UBO buffer addresses to shader, and knowing up front the # of UBOs is useful. But I noticed ttn wasn't setting this. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* clover: Use Clang's diagnostics  Vedran Miletić  2016-12-24  1  -1/+6
| | | | | | | | | | | | | | | | Presently errors from frontend are handled only if they occur in clang::CompilerInvocation::CreateFromArgs(). This patch uses clang::DiagnosticsEngine to detect errors such as invalid values for Clang frontend arguments. Fixes Piglit's cl/program/build/fail/invalid-version-declaration.cl test. v2: fix inconsistent code formatting Signed-off-by: Vedran Miletić <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Tested-by: Aaron Watry <[email protected]>