path: root/src
Commit message | Author | Age | Files | Lines
* nir: Remove a bunch of large stack arrays | Jason Ekstrand | 2019-07-22 | 4 | -6/+15
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Stop stack allocating large arrays | Jason Ekstrand | 2019-07-22 | 1 | -6/+12
  Normally, we haven't worried too much about stack sizes as Linux tends to be fairly friendly towards large stacks. However, when running DXVK apps under Wine, we're suddenly subject to Windows' more stringent stack limitations and can run out of space more easily. In particular, some of the shaders in Elite Dangerous: Horizons have quite a few registers, and the arrays in split_virtual_grfs are large enough to blow a 1 MiB stack, leading to crashes during shader compilation.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108662
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Cc: [email protected]
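A minimal C sketch of the pattern this commit applies: a scratch array whose size grows with the number of virtual GRFs is moved from the stack to the heap. The function `compute_split_offsets` and what it computes are hypothetical illustrations, not the actual `split_virtual_grfs` code.

```c
#include <stdlib.h>

/* Hypothetical sketch, not the actual split_virtual_grfs() code.
 *
 * For each virtual GRF, compute the index of its first register once every
 * VGRF is split into single-unit registers. The returned array has
 * num_vgrfs + 1 entries; the last entry is the total register count.
 *
 * A stack VLA such as `int offsets[num_vgrfs + 1];` grows with the shader
 * and can overflow a small thread stack (e.g. 1 MiB), so the array is
 * heap-allocated instead. The caller owns it and must free() it. */
int *compute_split_offsets(const int *vgrf_sizes, int num_vgrfs)
{
   int *offsets = malloc(((size_t)num_vgrfs + 1) * sizeof(*offsets));
   if (offsets == NULL)
      return NULL; /* a failed malloc() is detectable; a stack overflow is not */

   int next = 0;
   for (int i = 0; i < num_vgrfs; i++) {
      offsets[i] = next;      /* first split register of VGRF i */
      next += vgrf_sizes[i];  /* one register per unit of size */
   }
   offsets[num_vgrfs] = next; /* total number of split registers */
   return offsets;
}
```

Handing ownership to the caller is one option; a purely local scratch array would instead be freed before the function returns.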
* egl/android: Update color_buffers querying for buffer age | Nataraj Deshpande | 2019-07-22 | 2 | -6/+31
  color_buffers[] is currently hard-coded to 3 for Android, which fails in droid_window_dequeue_buffer when ANativeWindow creates more than 3 color_buffers while querying buffer age during dEQP partial_update tests on Chrome OS.
  The patch removes the static color_buffers[], queries MIN_UNDEQUEUED_BUFFERS, sets the native window buffer count, and allocates the correct number of color_buffers as per Android.
  Fixes dEQP-EGL.functional.partial_update* tests on Chromebooks with EGL_KHR_partial_update enabled.
  v2: update comment instead of removing (Eric Engestrom)
  v3: change static array to dynamically allocated color_buffers querying MIN_UNDEQUEUED_BUFFERS (Chia-I Wu [email protected])
  Fixes: 2acc69da8ce "EGL/Android: Add EGL_EXT_buffer_age extension"
  Signed-off-by: Nataraj Deshpande <[email protected]>
  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* intel/compiler: Use nir_opt_conditional_discard | Caio Marcelo de Oliveira Filho | 2019-07-22 | 1 | -0/+1
  anv vkpipeline-db results for SKL:
  total instructions in shared programs: 3622461 -> 3611281 (-0.31%)
  instructions in affected programs: 396452 -> 385272 (-2.82%)
  helped: 2062
  HURT: 1
  total cycles in shared programs: 1458144669 -> 1458105320 (<.01%)
  cycles in affected programs: 4171830 -> 4132481 (-0.94%)
  helped: 1874
  HURT: 180
  total loops in shared programs: 2437 -> 2437 (0.00%)
  loops in affected programs: 0 -> 0
  helped: 0
  HURT: 0
  total spills in shared programs: 8745 -> 8748 (0.03%)
  spills in affected programs: 8 -> 11 (37.50%)
  helped: 1
  HURT: 1
  total fills in shared programs: 23392 -> 23395 (0.01%)
  fills in affected programs: 8 -> 11 (37.50%)
  helped: 1
  HURT: 1
  LOST: 0
  GAINED: 1
  No changes to shader-db on i965 or iris. The glsl compiler already does a similar optimization.
  Improvement suggested by Daniel Schürmann.
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* pan/decode: Disable magic divisor debugging | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+2
  Memory corruption (for both legitimate and illegitimate reasons) causes this to hang pantrace.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Report spills:fills to shader-db | Alyssa Rosenzweig | 2019-07-22 | 3 | -2/+12
  Route this info through so we can track how we're doing on register spilling.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Reenable pipeline register creation | Alyssa Rosenzweig | 2019-07-22 | 1 | -10/+9
  This was disabled to permit regression-free RA work. Now that the spill code is in place, we can re-enable it, with some caveats about efficacy.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Report tls_size | Alyssa Rosenzweig | 2019-07-22 | 4 | -0/+13
  Pipe through the number of bytes of spilled memory used from the compiler into the main driver, where it will be used to allocate the Thread Local Storage buffer.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Set `initialized` in more cases | Alyssa Rosenzweig | 2019-07-22 | 2 | -10/+9
  Indirect linear writes were not being marked as initialized, causing the back blit to be dropped, breaking the listed tests.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/ci: Update expectations | Alyssa Rosenzweig | 2019-07-22 | 1 | -4/+0
  We've fixed some shader tests.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Promote to *move*, not rewrite for non-SSA | Alyssa Rosenzweig | 2019-07-22 | 1 | -2/+9
  Fixes promoted uniform loads to registers.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Dump MIR of RA failure | Alyssa Rosenzweig | 2019-07-22 | 1 | -1/+3
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Dump successor graph when printing MIR | Alyssa Rosenzweig | 2019-07-22 | 1 | -2/+12
  We just use the pointers of the midgard_block*, which is crude, but it gets the point across and will help debug successor-related issues.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Remove debug statement | Alyssa Rosenzweig | 2019-07-22 | 1 | -2/+0
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Implement register spilling | Alyssa Rosenzweig | 2019-07-22 | 4 | -54/+158
  Now that we run RA in a loop, before each iteration after a failed allocation we choose a spill node and spill it to Thread Local Storage using st_int4/ld_int4 instructions (for spills and fills respectively). This allows us to compile complex shaders that normally would not fit within the 16-work-register limit, although it comes at a fairly steep performance penalty.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
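The RA-with-spilling loop described above can be sketched as a toy model. Assumptions: nodes are live intervals, "allocation" succeeds when register pressure never exceeds `num_regs` (16 for the work registers in question), and the spill choice (the longest range live at the point of highest pressure) is a common heuristic assumed here, not necessarily what the Midgard compiler picks. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical toy model, not the Midgard allocator. A real allocator
 * colors an interference graph and must also allocate the short-lived
 * temporaries that each fill/spill instruction introduces; both are
 * ignored here. */
struct node {
   int start, end; /* live range [start, end) */
   bool spilled;   /* lives in Thread Local Storage instead of a register */
};

/* Returns true if register pressure never exceeds num_regs. Otherwise
 * stores the program point with the highest pressure in *worst. */
static bool try_allocate(const struct node *n, int count, int num_regs,
                         int *worst)
{
   int max_pressure = 0;
   for (int i = 0; i < count; i++) {
      if (n[i].spilled)
         continue;
      /* Pressure peaks at some interval start, so only check those. */
      int p = n[i].start, pressure = 0;
      for (int j = 0; j < count; j++)
         if (!n[j].spilled && n[j].start <= p && p < n[j].end)
            pressure++;
      if (pressure > max_pressure) {
         max_pressure = pressure;
         *worst = p;
      }
   }
   return max_pressure <= num_regs;
}

/* Assumed heuristic: spill the longest range live at the worst point. */
static int choose_spill_node(const struct node *n, int count, int point)
{
   int best = -1;
   for (int i = 0; i < count; i++) {
      if (n[i].spilled || !(n[i].start <= point && point < n[i].end))
         continue;
      if (best < 0 || n[i].end - n[i].start > n[best].end - n[best].start)
         best = i;
   }
   return best;
}

/* Run "RA" in a loop: after each failed attempt, spill one node and retry.
 * Returns the number of spilled nodes, or -1 if nothing can be spilled. */
int allocate_with_spilling(struct node *n, int count, int num_regs)
{
   int spills = 0, worst = 0;
   while (!try_allocate(n, count, num_regs, &worst)) {
      int victim = choose_spill_node(n, count, worst);
      if (victim < 0)
         return -1;
      n[victim].spilled = true;
      spills++;
   }
   return spills;
}
```

Each loop iteration mirrors the commit's flow: a failed allocation leads to exactly one spill before the next attempt, so the loop terminates once pressure fits or nothing is left to spill.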
* panfrost/midgard: Add mir_has_arg helper | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+12
  Helps scan the MIR for uses of an index.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Check write-before-read in liveness analysis | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+13
  If we write to an index before reading it, the old copy we're checking liveness for isn't live in this block, even if it does get read later. Fixes abnormally high register pressure in shaders with loops.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
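The write-before-read rule can be illustrated with a per-block check. In this sketch (hypothetical `struct ins` and `is_live_in`, not the Midgard liveness code), a value is live on entry to a block only if some instruction reads it before any instruction writes it; if the block never touches it, it is live-in exactly when it is live-out.

```c
#include <stdbool.h>

/* Hypothetical simplified instruction: one destination and two sources;
 * -1 marks an unused slot. Real MIR instructions carry more state. */
struct ins {
   int dest;
   int src[2];
};

/* Is `index` live on entry to this block? Scan forward:
 *  - read before any write: the incoming value is used, so it is live;
 *  - written before any read: the incoming value is overwritten before it
 *    is ever used, so it is NOT live here, even if the new value is read
 *    later in the block;
 *  - neither: it is live-in exactly when it is live-out. */
bool is_live_in(const struct ins *block, int count, int index, bool live_out)
{
   for (int i = 0; i < count; i++) {
      /* Sources are read before the destination is written, so an
       * instruction like `x = x + 1` counts as a read first. */
      for (int s = 0; s < 2; s++) {
         if (block[i].src[s] == index)
            return true;
      }
      if (block[i].dest == index)
         return false;
   }
   return live_out;
}
```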
* panfrost/midgard/disasm: Check for certain tag errors | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+18
  Midgard bundles contain a tag, as well as a copy of the tag of the next bundle to facilitate prefetch. Do some simple static analysis to detect certain tag errors (particularly on shaders without branching).
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
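A sketch of the kind of static check described above, assuming straight-line code (no branches). The `struct bundle_hdr` layout is hypothetical; the real disassembler decodes tags from the bundle encoding.

```c
/* Hypothetical bundle header: the bundle's own tag plus the tag of the
 * bundle that follows it, which the hardware uses for prefetch. */
struct bundle_hdr {
   unsigned tag;
   unsigned next_tag;
};

/* Without branches, bundles execute in order, so each bundle's next_tag
 * must equal the tag of the bundle after it. Returns the index of the
 * first inconsistent bundle, or -1 if all agree. The last bundle's
 * next_tag is not checked, since what it should hold at the end of the
 * program is outside the scope of this sketch. */
int find_tag_mismatch(const struct bundle_hdr *b, int count)
{
   for (int i = 0; i + 1 < count; i++) {
      if (b[i].next_tag != b[i + 1].tag)
         return i;
   }
   return -1;
}
```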
* pan/midgard: Add OP_IS_CSEL helper | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+7
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add mir_rewrite_index_src_single helper | Alyssa Rosenzweig | 2019-07-22 | 2 | -6/+13
  Rather than rewriting an index away across the whole block, we expose finer (per-instruction) granularity for rewrites.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
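A sketch of per-instruction versus block-wide rewriting, using a hypothetical `struct mir_ins`; the function names only echo the commit's helper and are not Mesa's signatures. One plausible use (an assumption, not stated in the commit) is spilling, where a fill produces a fresh index that only the instruction it feeds should read.

```c
/* Hypothetical MIR-like instruction with up to three source indices. */
struct mir_ins {
   unsigned src[3];
};

/* Per-instruction granularity: rewrite uses of `old_idx` in this one
 * instruction only, leaving every other instruction untouched. */
void rewrite_index_src_single(struct mir_ins *ins, unsigned old_idx,
                              unsigned new_idx)
{
   for (int i = 0; i < 3; i++) {
      if (ins->src[i] == old_idx)
         ins->src[i] = new_idx;
   }
}

/* The block-wide rewrite becomes a loop over the single-instruction helper. */
void rewrite_index_src_block(struct mir_ins *block, int count,
                             unsigned old_idx, unsigned new_idx)
{
   for (int i = 0; i < count; i++)
      rewrite_index_src_single(&block[i], old_idx, new_idx);
}
```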
* pan/midgard: Ignore inline_constant in liveness | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+3
  It doesn't make any sense to look at it.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Implement load/store scratch opcodes | Alyssa Rosenzweig | 2019-07-22 | 4 | -2/+52
  These are used to load/store from Thread Local Storage, which is memory allocated per-thread (corresponding to ctx->scratchpad in the command stream) and used for register spilling.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midg/disasm: Check for int varying ops | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+4
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Remove "aliasing" | Alyssa Rosenzweig | 2019-07-22 | 2 | -96/+0
  It was a crazy idea that didn't pan out. We're better served by a good copyprop pass. It's also unused now.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Promote uniform registers late | Alyssa Rosenzweig | 2019-07-22 | 6 | -82/+174
  Rather than creating either a load or a uniform register read with a fixed beginning offset, we always create a load and then promote to a uniform register later. This will allow us to promote in a register-pressure-aware manner.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Call scheduler/RA in a loop | Alyssa Rosenzweig | 2019-07-22 | 3 | -13/+27
  This will allow us to insert instructions as a result of register allocation, permitting spilling to be implemented. As a side effect, with the assert commented out this would fix a bunch of glamor crashes (due to RA failures), so MATE becomes usable. Ideally we'll have scheduling or RA actually sorted out before the branch point, but if not this gives us a one-line out to get X working...
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Remove custom register selection callback | Alyssa Rosenzweig | 2019-07-22 | 1 | -19/+0
  What we have is equivalent to the default callback; let's use that.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* radv: fix crash in vkCmdClearAttachments with unused attachment | Samuel Pitoiset | 2019-07-22 | 1 | -1/+1
  depth_stencil_attachment and/or ds_resolve attachment can be NULL.
  This fixes crashes with dEQP-VK.renderpass.suballocation.unused_clear_attachments.*
  Cc: 19.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965: free object labels when deleting | Sergii Romantsov | 2019-07-22 | 3 | -0/+3
  Some leaks detected with GL_KHR_debug on i965.
  CC: Timothy Arceri <[email protected]>
  Signed-off-by: Sergii Romantsov <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* radv/gfx10: update descriptors for inline uniform blocks | Samuel Pitoiset | 2019-07-22 | 1 | -3/+10
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: emit the GS NGG prologue before the nested barrier | Samuel Pitoiset | 2019-07-22 | 1 | -6/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: do not allocate space for the ZPASS_DONE bug | Samuel Pitoiset | 2019-07-22 | 1 | -6/+8
  GFX10 isn't affected.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors | Samuel Pitoiset | 2019-07-22 | 1 | -4/+4
  This field doesn't exist.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up fill_geom_tess_rings() | Samuel Pitoiset | 2019-07-22 | 1 | -25/+9
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: change a bunch of >= GFX9 to == GFX9 | Samuel Pitoiset | 2019-07-22 | 4 | -15/+15
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: do not clamp shadow reference on GFX10 | Samuel Pitoiset | 2019-07-22 | 1 | -2/+6
  RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures on GFX10, and RADV doesn't promote Z16 or Z24.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: move nir_opt_conditional_discard out of optimization loop | Daniel Schürmann | 2019-07-22 | 1 | -1/+1
  This late optimization pass is only affected by nir_opt_if() and handles all cases in a single pass. It's enough to call it once after the optimization loop.
  No changes on vkpipeline-db.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* v3d: fill logicop_func in the fragment shader key when precompiling shaders | Iago Toral Quiroga | 2019-07-22 | 1 | -0/+2
  Since logicop_func 0 is PIPE_LOGICOP_CLEAR, we were triggering lowering of logic ops on precompiled shaders, which we don't want to do.
  This also made shader-db crash: during this lowering we would try to read the color format swizzle information from the fragment shader key, which we don't populate for precompiled shaders because right now we only need it when logic operations are enabled.
  Reviewed-by: Eric Anholt <[email protected]>
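The bug pattern can be sketched with a hypothetical key: a zero-initialized key silently selects logic op 0 (CLEAR), which counts as an enabled logic op and triggers lowering. Only "CLEAR is 0" comes from the commit; the enum, struct, and functions below are illustrative assumptions, not the v3d code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum: only "CLEAR is 0" is taken from the commit message.
 * LOGICOP_COPY stands for "no logic op", i.e. write the source color. */
enum logicop {
   LOGICOP_CLEAR = 0,
   LOGICOP_COPY,
};

struct fs_key {
   enum logicop logicop_func;
   /* Other fields (e.g. color format swizzles) omitted in this sketch. */
};

/* Lowering is needed for any logic op other than a plain copy. */
bool needs_logicop_lowering(const struct fs_key *key)
{
   return key->logicop_func != LOGICOP_COPY;
}

/* Precompile path: start from a zeroed key, then set fields explicitly.
 * Without the last line, logicop_func would stay 0 (CLEAR) and trigger
 * lowering that precompiled shaders must not do. */
void init_precompile_key(struct fs_key *key)
{
   memset(key, 0, sizeof(*key));
   key->logicop_func = LOGICOP_COPY;
}
```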
* v3d: Avoid scheduling an instruction that stalls waiting for SFU retval | Jose Maria Casanova Crespo | 2019-07-22 | 1 | -4/+23
  If we detect that a scheduling candidate will stall because it has a register source that is written by the SFU unit in the previous instruction, we reduce its priority so that any non-stalling operation is chosen instead.
  The latency of SFU operations is set to 2, so they are scheduled earlier when other candidates have the same priority.
  Finally, we no longer merge a stalling instruction into a previously chosen one, as it would have to wait an extra cycle for the previous instruction's result.
  Although shader-db results show that instructions are hurt, with an increase of 0.35% in affected programs, the sum of instructions + stalls is reduced by 0.52%, and the total number of sfu-stalls is reduced by 63.51%.
  It also causes a small increase in the max-temps metric because SFU operations are scheduled earlier.
  total instructions in shared programs: 9102719 -> 9117851 (0.17%)
  instructions in affected programs: 4324628 -> 4339760 (0.35%)
  helped: 4162
  HURT: 12128
  helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1
  helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51%
  HURT stats (abs)   min: 1 max: 27 x̄: 1.69 x̃: 1
  HURT stats (rel)   min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68%
  95% mean confidence interval for instructions value: 0.90 0.96
  95% mean confidence interval for instructions %-change: 0.47% 0.50%
  Instructions are HURT.
  total max-temps in shared programs: 1327728 -> 1327812 (<.01%)
  max-temps in affected programs: 4730 -> 4814 (1.78%)
  helped: 61
  HURT: 134
  helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
  helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17%
  HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
  HURT stats (rel)   min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26%
  95% mean confidence interval for max-temps value: 0.28 0.58
  95% mean confidence interval for max-temps %-change: 1.80% 3.52%
  Max-temps are HURT.
  total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%)
  sfu-stalls in affected programs: 95029 -> 31802 (-66.53%)
  helped: 25882
  HURT: 0
  helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2
  helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00%
  95% mean confidence interval for sfu-stalls value: -2.47 -2.42
  95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54%
  Sfu-stalls are helped.
  total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%)
  inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%)
  helped: 22728
  HURT: 855
  helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1
  helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92%
  HURT stats (abs)   min: 1 max: 5 x̄: 1.25 x̃: 1
  HURT stats (rel)   min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86%
  95% mean confidence interval for inst-and-stalls value: -2.07 -2.01
  95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05%
  Inst-and-stalls are helped.
  v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
  Reviewed-by: Eric Anholt <[email protected]>
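A minimal sketch of the candidate-selection rule: a candidate that reads a register written by an SFU instruction in the previous instruction ranks below every non-stalling candidate. The types and functions are hypothetical; the two-level comparison stands in for the priority reduction the commit describes, and the sketch models neither the SFU latency of 2 in the critical-path estimate nor the no-merge rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction summary for scheduling; not the real
 * v3d_qpu_instr. dest/src are register numbers, -1 when unused. */
struct qinst {
   bool is_sfu;   /* result comes from the SFU */
   int dest;
   int src[2];
};

/* Would `cand` stall if issued right after `prev`? It does when it reads
 * a register that an SFU instruction in `prev` writes. */
static bool stalls_on_sfu(const struct qinst *prev, const struct qinst *cand)
{
   if (prev == NULL || !prev->is_sfu || prev->dest < 0)
      return false;
   return cand->src[0] == prev->dest || cand->src[1] == prev->dest;
}

/* Pick the candidate to schedule next. Any non-stalling candidate beats
 * every stalling one; within each group, higher priority wins.
 * Returns the chosen index, or -1 when there are no candidates. */
int choose_candidate(const struct qinst *prev, const struct qinst *cands,
                     const int *prio, int n)
{
   int best = -1;
   bool best_stalls = false;

   for (int i = 0; i < n; i++) {
      bool stalls = stalls_on_sfu(prev, &cands[i]);
      if (best < 0 ||
          (best_stalls && !stalls) ||
          (stalls == best_stalls && prio[i] > prio[best])) {
         best = i;
         best_stalls = stalls;
      }
   }
   return best;
}
```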
* v3d: add shader-db stat to count SFU stalls | Jose Maria Casanova Crespo | 2019-07-22 | 5 | -14/+74
  SFU operations have a latency of 2 cycles, so if an SFU instruction's result is used by the instruction that immediately follows it, the GPU stalls for an extra cycle until the result is available.
  This adds the number of stalls, and the sum of instructions + stalls, to the shader-db debug mode, to evaluate scheduling optimizations that avoid generating sfu-stalls.
  v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
  Reviewed-by: Eric Anholt <[email protected]>
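The stall metric can be sketched as a single pass over straight-line code, assuming a hypothetical instruction summary (not the real `v3d_qpu_instr`; register bookkeeping is simplified): count each instruction that reads the destination of the SFU instruction immediately before it, then add that count to the instruction count for the inst-and-stalls metric.

```c
#include <stdbool.h>

/* Hypothetical per-instruction summary; register numbers are -1 when
 * unused. */
struct qinst {
   bool is_sfu;
   int dest;
   int src[2];
};

/* SFU results take 2 cycles, so an instruction that reads an SFU result
 * in the very next instruction waits one extra cycle. Count those. */
int count_sfu_stalls(const struct qinst *prog, int count)
{
   int stalls = 0;
   for (int i = 1; i < count; i++) {
      const struct qinst *prev = &prog[i - 1];
      if (!prev->is_sfu || prev->dest < 0)
         continue;
      if (prog[i].src[0] == prev->dest || prog[i].src[1] == prev->dest)
         stalls++;
   }
   return stalls;
}

/* The combined metric reported alongside the stall count. */
int inst_and_stalls(const struct qinst *prog, int count)
{
   return count + count_sfu_stalls(prog, count);
}
```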
* radv: replace memset()+strcpy() with snprintf() | Eric Engestrom | 2019-07-21 | 1 | -3/+1
  Just like the next line :)
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: drop unnecessary memset() before snprintf() | Eric Engestrom | 2019-07-21 | 1 | -1/+0
  snprintf() always terminates the string.
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
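A small C sketch of why the memset() is redundant: for a buffer size greater than zero, snprintf() always NUL-terminates, truncating the output if needed (with a size of 0 it writes nothing). The helper name `format_name` is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: format a name into a fixed-size buffer without a
 * preceding memset(). For size > 0, snprintf() always writes a
 * terminating NUL. Returns true if the output was truncated (or on an
 * encoding error). */
bool format_name(char *buf, size_t size, const char *name)
{
   int n = snprintf(buf, size, "%s", name);
   /* snprintf() returns the length the full output would have had. */
   return n < 0 || (size_t)n >= size;
}
```

Checking the return value is the usual way to detect truncation, since the buffer contents alone look like a valid (shortened) string.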
* radv: Fix uninitialized warning. | Bas Nieuwenhuizen | 2019-07-21 | 1 | -1/+2
  For es_vgpr_comp_cnt.
  Fixes: 795adbbadd4 "radv/gfx10: Add pipeline state support for tess."
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* virgl: fix a sync issue in virgl_buffer_transfer_extend | Chia-I Wu | 2019-07-19 | 1 | -62/+15
  In virgl_buffer_transfer_extend, when no flush is needed, it tries to extend a previously queued transfer instead if it can find one. Compared to virgl_resource_transfer_prepare, it fails to check if the resource is busy.
  The existence of a previously queued transfer normally implies that the resource is not busy, except perhaps when the transfer is PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a lengthy comment, and potential concerns over breaking it as the transfer code evolves, this commit makes the valid_buffer_range check the only condition to take the fast path.
  In the real world, we hit the fast path almost only because of the valid_buffer_range check. In micro benchmarks, the condition should always be true; otherwise the benchmarks are not very representative of meaningful workloads. I think this fix is justified.
  The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the fast path. This commit re-enables it as well.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Gurchetan Singh <[email protected]>
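The fast-path condition can be sketched as a range-overlap test. This assumes the valid_buffer_range check asks whether the transfer's byte range overlaps the part of the buffer that may already hold data; `struct byte_range` and both functions are hypothetical stand-ins, not virgl's actual helpers.

```c
#include <stdbool.h>

/* Hypothetical byte range [start, end); empty when start >= end. Stands
 * in for a buffer's valid range, i.e. the bytes that may already hold
 * previously written data. */
struct byte_range {
   unsigned start, end;
};

/* Does [start, end) overlap the range? */
static bool range_intersects(const struct byte_range *r,
                             unsigned start, unsigned end)
{
   unsigned lo = start > r->start ? start : r->start;
   unsigned hi = end < r->end ? end : r->end;
   return lo < hi;
}

/* Assumed meaning of the check: a write that touches none of the bytes
 * already holding valid data cannot conflict with earlier uses of that
 * data, so it can skip synchronization. */
bool can_take_fast_path(const struct byte_range *valid,
                        unsigned offset, unsigned size)
{
   return !range_intersects(valid, offset, offset + size);
}
```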
* virgl: rework virgl_transfer_queue_extend | Chia-I Wu | 2019-07-19 | 3 | -25/+24
  Do not take a transfer and do the memcpy. Add a _buffer suffix to the function name to make it clear that it is only for buffers.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: fix virgl_buffer_transfer_extend | Chia-I Wu | 2019-07-19 | 1 | -0/+1
  Without setting hw_res, virgl_transfer_queue_extend never finds a match and always returns NULL.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Gurchetan Singh <[email protected]>
* radeonsi: initialize scissor registers etc. without clear state | Marek Olšák | 2019-07-19 | 1 | -1/+1
  Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
  Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: return success from vi_dcc_clear_level to simplify callers | Marek Olšák | 2019-07-19 | 3 | -28/+26
  Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
  Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: fix compute-based culling regression in 1ce52c1e373 | Marek Olšák | 2019-07-19 | 1 | -1/+1
  Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
  Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: fix VGT_PRIMITIVE_TYPE programming | Marek Olšák | 2019-07-19 | 1 | -1/+3
  Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>