aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi
Commit message (Collapse)AuthorAgeFilesLines
* nir: Initialize lower_flrp_progress everywhereIan Romanick2019-05-091-1/+1
| | | | | | | | | | | | | | | | I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989
* radeonsi: add an AMD_TEX_ANISO environment variableTimothy Arceri2019-05-081-0/+4
| | | | | | | This brings it inline with the recently added AMD_DEBUG. Reviewed-by: Marek Olšák <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109619
* intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5Ian Romanick2019-05-061-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously lower_flrp32 was only set for vertex shaders. Fragment shaders performed a(1-c)+bc lowering during code generation. The shaders with loops hurt are SIMD8 and SIMD16 shaders for a text-identical fragment shader. v2: Rebase on 26391cceaa1 ("intel/compiler: Lower ffma on Gen4 and Gen5"). v3: Rebase on a004e95dd73 ("radeonsi/nir: create si_nir_opts() helper") Iron Lake total instructions in shared programs: 8211385 -> 8185974 (-0.31%) instructions in affected programs: 2503898 -> 2478487 (-1.01%) helped: 9936 HURT: 921 helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2 helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11% HURT stats (abs) min: 1 max: 12 x̄: 3.24 x̃: 2 HURT stats (rel) min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89% 95% mean confidence interval for instructions value: -2.43 -2.25 95% mean confidence interval for instructions %-change: -1.41% -1.33% Instructions are helped. total cycles in shared programs: 188523186 -> 188401198 (-0.06%) cycles in affected programs: 71541604 -> 71419616 (-0.17%) helped: 11649 HURT: 1871 helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6 helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 13.38 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17% 95% mean confidence interval for cycles value: -9.42 -8.63 95% mean confidence interval for cycles %-change: -0.54% -0.50% Cycles are helped. total loops in shared programs: 852 -> 856 (0.47%) loops in affected programs: 0 -> 4 helped: 0 HURT: 4 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00% 95% mean confidence interval for loops value: 1.00 1.00 95% mean confidence interval for loops %-change: 0.00% 0.00% Loops are HURT. LOST: 3 GAINED: 12 GM45 total instructions in shared programs: 5046407 -> 5033694 (-0.25%) instructions in affected programs: 1303584 -> 1290871 (-0.98%) helped: 5010 HURT: 464 helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2 helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08% HURT stats (abs) min: 1 max: 75 x̄: 3.39 x̃: 2 HURT stats (rel) min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87% 95% mean confidence interval for instructions value: -2.45 -2.20 95% mean confidence interval for instructions %-change: -1.40% -1.28% Instructions are helped. total cycles in shared programs: 128889476 -> 128812366 (-0.06%) cycles in affected programs: 44845402 -> 44768292 (-0.17%) helped: 6079 HURT: 940 helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8 helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 16.01 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17% 95% mean confidence interval for cycles value: -11.63 -10.34 95% mean confidence interval for cycles %-change: -0.58% -0.52% Cycles are helped. total loops in shared programs: 633 -> 635 (0.32%) loops in affected programs: 0 -> 2 helped: 0 HURT: 2 total spills in shared programs: 60 -> 69 (15.00%) spills in affected programs: 54 -> 63 (16.67%) helped: 0 HURT: 1 total fills in shared programs: 92 -> 105 (14.13%) fills in affected programs: 80 -> 93 (16.25%) helped: 0 HURT: 1 LOST: 15 GAINED: 15 Reviewed-by: Jason Ekstrand <[email protected]> [v2] Reviewed-by: Matt Turner <[email protected]> [v2]
* nir: Use the flrp lowering pass instead of nir_opt_algebraicIan Romanick2019-05-061-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <[email protected]>
* nir: nir_shader_compiler_options: drop native_integersChristian Gmeiner2019-05-071-1/+0
| | | | | | | | Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: set sampler state and view functions for compute-only contextsMarek Olšák2019-05-013-9/+12
|
* radeonsi: use new atomic LLVM helpersMarek Olšák2019-05-011-8/+4
| | | | This depends on "ac,ac/nir: use a better sync scope for shared atomics"
* radeonsi/nir: call radeonsi nir opts before the scan passTimothy Arceri2019-05-012-0/+2
| | | | | | | | | | | | | | Some of the opts are not called in the general optimastion loop in the state trackers glsl -> nir conversion. We need to call the radeonsi specific optimisation once before scanning over the nir otherwise we can end up gathering info on code that is later removed. Fixes an assert in the piglit test: ./bin/varying-struct-centroid_gles3 Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: create si_nir_opts() helperTimothy Arceri2019-05-012-36/+43
| | | | | | We will make use of this in the following commit. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: implement resource_get_infoJulien Isorce2019-04-301-8/+35
| | | | | | | | Re-use existing si_texture_get_offset. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* delete autotools .gitignore filesEric Engestrom2019-04-292-4/+0
| | | | | | | | One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* radeonsi: don't ignore PIPE_FLUSH_ASYNCMarek Olšák2019-04-261-1/+1
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* ac/nir: Add support for planes.Bas Nieuwenhuizen2019-04-251-0/+7
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: remove dirty slot masks from scissor and viewport statesMarek Olšák2019-04-256-93/+40
| | | | | | All registers in the array need to be updated if any of them is changed. Only apps writing gl_ViewportIndex were affected by this bug.
* radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)Marek Olšák2019-04-258-48/+68
| | | | | | | | | | | Needed to track context rolls caused by streamout and ACQUIRE_MEM. ACQUIRE_MEM can occur outside of draw calls. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355 v2: squashed patches and done more rework Cc: 19.0 <[email protected]>
* radeonsi/gfx9: set that window_rectangles always roll the contextMarek Olšák2019-04-251-1/+2
| | | | Cc: 19.0 <[email protected]>
* radeonsi: add radeonsi_sync_compile optionNicolai Hähnle2019-04-252-3/+11
| | | | | | | | | Force the driver thread to sync immediately with a compiler thread (but compilation still happens in a separate thread). This can be useful to simplify debugging compiler issues. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add radeonsi_aux_debug option for aux context debug dumpsNicolai Hähnle2019-04-253-1/+33
| | | | | | | | Enabling this option will create ddebug-style dumps for the aux context, except that instead of intercepting the pipe_context layer we just dump the IB contents on flush. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add si_debug_options for convenient adding/removing of optionsNicolai Hähnle2019-04-256-16/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the definition of radeonsi_clear_db_cache_before_clear there, as well as radeonsi_enable_nir. This removes the AMD_DEBUG=nir option. We currently still have two places for options: the driconf machinery and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options, then the driconf machinery should be preferred since it's more flexible. The only downside of the driconf machinery was that adding new options was quite inconvenient. With this change, a simple boolean option can be added with a single line of code, same as for AMD_DEBUG. One technical limitation of this particular implementation is that while almost all driconf features are available, the translation machinery doesn't pick up the description strings for options added in si_debvug_options. In practice, translations haven't been provided anyway, and this is intended for developer options, so I'm not too worried. It could always be added later if anybody really cares. v2: - use bool instead of uint8_t for options - si_debug_options.inc -> si_debug_options.h Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add BOs after need_cs_spaceMarek Olšák2019-04-242-6/+6
| | | | | | | | need_cs_space may clear the buffer list. Fixes: 951d60f8cdc88 "radeonsi: delay adding BOs at the beginning of IBs until the first draw" Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium: add PIPE_CAP_PREFER_COMPUTE_BLIT_FOR_MULTIMEDIAMarek Olšák2019-04-241-0/+1
|
* gallium: set PIPE_CAP_MAX_FRAMES_IN_FLIGHT to 2 for all driversMarek Olšák2019-04-241-3/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* gallium: replace DRM_CONF_THROTTLE with PIPE_CAP_MAX_FRAMES_IN_FLIGHTMarek Olšák2019-04-231-0/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa/radeonsi: fix race between destruction of types and shader compilationTimothy Arceri2019-04-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 624789e3708c moved the destruction of types out of atexit() and made use of a ref count instead. This is useful for avoiding a crash where drivers such as radeonsi are still compiling in a thread when the app exits and has not called MakeCurrent to change from the current context. While the above scenario is technically an app bug we shouldn't crash. However that change caused another race condition between the shader compilation tread in radeonsi and context teardown functions. This patch makes two changes to fix this new problem: First we explicitly call _mesa_destroy_shader_compiler_types() when destroying the st context rather than calling it indirectly via _mesa_free_context_data(). We do this as we must call it after st_destroy_context_priv() so that we don't destory the glsl types before the compilation threads finish. Next wait for the shader threads to finish in si_destroy_context() this also means we need to call context destroy before destroying the queues in si_destroy_screen(). Fixes: 624789e3708c ("compiler/glsl: handle case where we have multiple users for types") Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: delay adding BOs at the beginning of IBs until the first drawMarek Olšák2019-04-236-9/+46
| | | | | | | so that bound compute shader resources won't be added when they are not needed and same for graphics. Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: add helper si_get_minimum_num_gfx_cs_dwordsMarek Olšák2019-04-232-7/+12
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: add si_cp_copy_dataMarek Olšák2019-04-235-41/+44
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* ac: add radeon_info::marketing_name, replacing the winsys callbackMarek Olšák2019-04-231-12/+3
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: add uses_drawidMarek Olšák2019-04-231-0/+3
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: use CP DMA for the null const buffer clear on CIKMarek Olšák2019-04-225-10/+16
| | | | | | | | | | This is a workaround for a thread deadlock that I have no idea why it occurs. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879 Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaroundMarek Olšák2019-04-181-4/+4
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/nir: fix scanning of bindless imagesTimothy Arceri2019-04-171-38/+37
| | | | Fixes: d62d434fe920 ("ac/nir_to_llvm: add image bindless support")
* nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possibleMarek Olšák2019-04-161-0/+1
| | | | Acked-by: Timothy Arceri <[email protected]>
* Delete autotoolsDylan Baker2019-04-152-80/+0
| | | | | | | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Matt Turner <[email protected]>
* radeonsi: enable GL_EXT_shader_image_load_formattedMarek Olšák2019-04-151-0/+1
| | | | | | no changes - the driver doesn't use the format Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: set AC_FUNC_ATTR_READNONE for image opcodes where it was missingMarek Olšák2019-04-121-0/+4
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add ac_build_load_helper_invocation() helperSamuel Pitoiset2019-04-121-6/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add ac_build_ddxy_interp() helperSamuel Pitoiset2019-04-121-26/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir/radv: remove restrictions on opt_if_loop_last_continue()Timothy Arceri2019-04-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <[email protected]> (v1) Acked-by: Samuel Pitoiset <[email protected]>
* radeonsi: fix a crash when unbinding sampler statesMarek Olšák2019-04-081-1/+1
| | | | Acked-by: James Zhu <[email protected]>
* radeonsi: set exact shader buffer read/write usage in CSMarek Olšák2019-04-046-24/+41
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* gallium: add writable_bitmask parameter into set_shader_buffersMarek Olšák2019-04-044-7/+10
| | | | | | | to indicate write usage per buffer. This is just a hint (it will be used by radeonsi). Reviewed-by: Timothy Arceri <[email protected]>
* simplify LLVM version string printingEric Engestrom2019-04-041-5/+2
| | | | | | | Figure it out once in the build system, then just use that all over the place. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add support for displayable DCC for multi-RB chipsMarek Olšák2019-04-047-4/+253
| | | | A compute shader is used to reorder DCC data from aligned to unaligned.
* radeonsi: add support for displayable DCC for 1 RB chipsMarek Olšák2019-04-041-4/+70
| | | | This is the simpler codepath - just disable RB and pipe alignment for DCC.
* radeonsi: add ability to bind images as image buffersMarek Olšák2019-04-042-3/+8
| | | | so that we can bind DCC (texture) as an image buffer.
* radeonsi/gfx9: add support for PIPE_ALIGNED=0Marek Olšák2019-04-044-12/+30
| | | | | | Needed by displayable DCC. We need to flush L2 after rendering if PIPE_ALIGNED=0 and DCC is enabled.
* radeonsi: don't use PFP_SYNC_ME with compute-only contextsMarek Olšák2019-04-021-1/+1
| | | | | | | | | | Compute rings don't have PFP. Fixes: a1378639ab1 "radeonsi: always use compute rings for clover on CI and newer (v2)" Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Jan Vesely <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: implement ARB/KHR_parallel_shader_compile callbacksMarek Olšák2019-04-011-0/+31
|
* radeonsi: fix assertion failure by using the correct typeMarek Olšák2019-04-011-1/+1
| | | | | | | | | | src/gallium/drivers/radeonsi/si_state_viewport.c:196: si_emit_guardband: Assertion `vp_as_scissor.maxx <= max_viewport_size[vp_as_scissor.quant_mode] && vp_as_scissor.maxy <= max_viewport_size[vp_as_scissor.quant_mode]' failed. The comparison was unsigned, so negative maxx or maxy would fail. Fixes: 3c540e0a7488 "radeonsi: Fix guardband computation for large render targets"