aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* pan/midgard: Ignore inline_constant in livenessAlyssa Rosenzweig2019-07-221-0/+3
| | | | | | It doesn't make any sense to look at it. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Implement load/store scratch opcodesAlyssa Rosenzweig2019-07-224-2/+52
| | | | | | | | These are used to load/store from Thread Local Storage, which is memory allocated per-thread (corresponding to ctx->scratchpad in the command stream) and used for register spilling. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midg/disasm: Check for int varying opsAlyssa Rosenzweig2019-07-221-0/+4
| | | | Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Remove "aliasing"Alyssa Rosenzweig2019-07-222-96/+0
| | | | | | | It was a crazy idea that didn't pan out. We're better served by a good copyprop pass. It's also unused now. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Promote uniform registers lateAlyssa Rosenzweig2019-07-226-82/+174
| | | | | | | | | Rather than creating either a load or a uniform register read with a fixed beginning offset, we always create a load and then promote to a uniform register later. This will allow us to promote in a register pressure aware manner. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Call scheduler/RA in a loopAlyssa Rosenzweig2019-07-223-13/+27
| | | | | | | | | | | | This will allow us to insert instructions as a result of register allocation, permitting spilling to be implemented. As a side effect, with the assert commented out this would fix a bunch of glamor crashes (due to RA failures) so MATE becomes useable. Ideally we'll have scheduling or RA actually sorted out before the branch point but if not this gives us a one-line out to get X working... Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Remove custom register selection callbackAlyssa Rosenzweig2019-07-221-19/+0
| | | | | | What we have is equivalent to the default callback; let's use that. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* radv: fix crash in vkCmdClearAttachments with unused attachmentSamuel Pitoiset2019-07-221-1/+1
| | | | | | | | | | | depth_stencil_attachment and/or ds_resolve attachment can be NULL. This fixes crashes with dEQP-VK.renderpass.suballocation.unused_clear_attachments.* Cc: 19.1 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965: free object labels when deletingSergii Romantsov2019-07-223-0/+3
| | | | | | | | Some leaks detected with GL_KHR_debug on i965. CC: Timothy Arceri <[email protected]> Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* radv/gfx10: update descriptors for inline uniform blocksSamuel Pitoiset2019-07-221-3/+10
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: emit the GS NGG prologue before the nested barrierSamuel Pitoiset2019-07-221-6/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: do not allocate space for the ZPASS_DONE bugSamuel Pitoiset2019-07-221-6/+8
| | | | | | | GFX10 isn't affected. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: do not set ELEMENT_SIZE for buffer descriptorsSamuel Pitoiset2019-07-221-4/+4
| | | | | | | This field doesn't exist. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up fill_geom_tess_rings()Samuel Pitoiset2019-07-221-25/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: change a bunch of >= GFX9 to == GFX9Samuel Pitoiset2019-07-224-15/+15
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: do not clamp shadow reference on GFX10Samuel Pitoiset2019-07-221-2/+6
| | | | | | | | RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures on GFX10 and RADV doesn't promotes Z16 or Z24. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: move nir_opt_conditional_discard out of optimization loopDaniel Schürmann2019-07-221-1/+1
| | | | | | | | This late optimization pass is only affected by nir_opt_if() and handles all cases in a single pass. It's enough to call it once after the optimization loop. No changes on vkpipeline-db. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* v3d: fill logicop_func in the fragment shader key when precompiling shadersIago Toral Quiroga2019-07-221-0/+2
| | | | | | | | | | | Since logicop_func 0 is PIPE_LOGIOP_CLEAR, we were trigger lowerinng of logic ops on precompiled shaders, which we don't want to do. Also, this had the side effect of making shader-db crash, as during this lowering we would try to read the color format swizzle information from the fragment shader key that we don't populate in precompiled shaders because right now we only need it when logic operations are enabled. Reviewed-by: Eric Anholt <[email protected]>
* v3d: Avoid scheduling an instruction that stalls waiting for SFU retvalJose Maria Casanova Crespo2019-07-221-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we detect that a scheduling candidate will stall because having a register source that is the written by the SFU unit in the previous instruction we reduce its priority so any non stalling operation would be chosen. The latency of SFU operations is defined as 2. So they would be scheduled earlier if other candidates have the same priority. Finally we won't merge instructions that stall to a previously chosen one. As the result of the previous one would be waiting for an extra cycle. Although shader-db result show that instruction are hurt with an increase of 0.35% the sum of instructions + stalls is reduced a 0.52%. And the total of sfu-stalls is reduced a 63.51%. It implies also a small increase in the max-temps metric because of scheduling earlier SFU operations. total instructions in shared programs: 9102719 -> 9117851 (0.17%) instructions in affected programs: 4324628 -> 4339760 (0.35%) helped: 4162 HURT: 12128 helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1 helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51% HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1 HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68% 95% mean confidence interval for instructions value: 0.90 0.96 95% mean confidence interval for instructions %-change: 0.47% 0.50% Instructions are HURT. total max-temps in shared programs: 1327728 -> 1327812 (<.01%) max-temps in affected programs: 4730 -> 4814 (1.78%) helped: 61 HURT: 134 helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1 helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26% 95% mean confidence interval for max-temps value: 0.28 0.58 95% mean confidence interval for max-temps %-change: 1.80% 3.52% Max-temps are HURT. total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%) sfu-stalls in affected programs: 95029 -> 31802 (-66.53%) helped: 25882 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2 helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00% 95% mean confidence interval for sfu-stalls value: -2.47 -2.42 95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54% Sfu-stalls are helped. total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%) inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%) helped: 22728 HURT: 855 helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1 helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92% HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86% 95% mean confidence interval for inst-and-stalls value: -2.07 -2.01 95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05% Inst-and-stalls are helped. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <[email protected]>
* v3d: add shader-db stat to count SFU stallsJose Maria Casanova Crespo2019-07-225-14/+74
| | | | | | | | | | | | | | SFU operations have a latency of 2 cicles, so if their results are used in the following cycle to a SFU instruction, the GPU stalls for an extra cycle until the result is available. This adds the number of stalls to the shader-db debug mode and sum of instruction + stalls to evaluate optimizations to schedule instructions that avoid generating sfu-stalls. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <[email protected]>
* radv: replace memset()+strcpy() with snprintf()Eric Engestrom2019-07-211-3/+1
| | | | | | | Just like the next line :) Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: drop unnecessary memset() before snprintf()Eric Engestrom2019-07-211-1/+0
| | | | | | | snprintf() always terminates the string. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix uninitialized warning.Bas Nieuwenhuizen2019-07-211-1/+2
| | | | | | | | For es_vgpr_comp_cnt. Fixes: 795adbbadd4 "radv/gfx10: Add pipeline state support for tess." Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* virgl: fix a sync issue in virgl_buffer_transfer_extendChia-I Wu2019-07-191-62/+15
| | | | | | | | | | | | | | | | | | | | | | | | | In virgl_buffer_transfer_extend, when no flush is needed, it tries to extend a previously queued transfer instead if it can find one. Comparing to virgl_resource_transfer_prepare, it fails to check if the resource is busy. The existence of a previously queued transfer normally implies that the resource is not busy, maybe except for when the transfer is PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a lengthy comment, and potential concerns over breaking it as the transfer code evolves, this commit makes the valid_buffer_range check the only condition to take the fast path. In real world, we hit the fast path almost only because of the valid_buffer_range check. In micro benchmarks, the condition should always be true, otherwise the benchmarks are not very representative of meaningful workloads. I think this fix is justified. The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the fast path. This commit re-enables it as well. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: rework virgl_transfer_queue_extendChia-I Wu2019-07-193-25/+24
| | | | | | | | Do not take a transfer and do the memcpy. Add a _buffer suffix to the function name to make it clear that it is only for buffers. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: fix virgl_buffer_transfer_extendChia-I Wu2019-07-191-0/+1
| | | | | | | | Without setting hw_res, virgl_transfer_queue_extend never finds a match and always returns NULL. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* radeonsi: initialize scissor registers etc. without clear stateMarek Olšák2019-07-191-1/+1
| | | | | Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: return success from vi_dcc_clear_level to simplify callersMarek Olšák2019-07-193-28/+26
| | | | | Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: fix compute-based culling regression in 1ce52c1e373Marek Olšák2019-07-191-1/+1
| | | | | Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: fix VGT_PRIMITIVE_TYPE programmingMarek Olšák2019-07-191-1/+3
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: enable Wave32 for vertex, geometry, and tessellation shadersMarek Olšák2019-07-191-0/+5
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: add debug options to enable/disable Wave32Marek Olšák2019-07-192-1/+35
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64Marek Olšák2019-07-194-15/+32
| | | | | | | Legacy GS has to use Wave64, so TES before GS has to use Wave64 too. Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: implement Wave32Marek Olšák2019-07-1915-71/+144
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: use 32-bit wavemasks for Wave32Marek Olšák2019-07-194-16/+43
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: create the LLVM builder in ac_llvm_context_initMarek Olšák2019-07-194-17/+15
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: create the LLVM module for Wave32 or Wave64 in ac_llvm_context_initMarek Olšák2019-07-194-7/+10
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/rtld: add support for Wave32Marek Olšák2019-07-198-5/+21
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add Wave32 LLVM target machineMarek Olšák2019-07-192-1/+20
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: initial Wave32 support in LLVM build helpersMarek Olšák2019-07-194-13/+21
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: assume that selector != NULL for compute shadersMarek Olšák2019-07-191-14/+6
| | | | | Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: remove what appears to be legacy compute codeMarek Olšák2019-07-191-35/+6
| | | | | Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: remove si_program::use_code_object_v2Marek Olšák2019-07-192-6/+3
| | | | | Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: add si_shader_selector into si_computeMarek Olšák2019-07-194-81/+57
| | | | | | | | Now we can assume that shader->selector is always set. This will simplify some code. Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: set threadgroup size to 0 for threadgroups with only 1 waveMarek Olšák2019-07-191-3/+3
| | | | | | | This has no effect on Wave64. Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: set as_ngg for GS prologMarek Olšák2019-07-192-5/+9
| | | | | | | as_ngg is required by Wave32. Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: remove the disable_ngg optionMarek Olšák2019-07-193-6/+2
| | | | | | | because legacy VS hangs. Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: combine hw edgeflags with user edgeflags for correct behaviorMarek Olšák2019-07-194-19/+73
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: deduplicate code for esvert_lds_sizeMarek Olšák2019-07-191-6/+16
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx10: simplify a streamout loop in gfx10_emit_ngg_epilogueMarek Olšák2019-07-191-7/+6
| | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>