path: root/src/gallium/drivers/radeonsi
Commit message | Author | Age | Files | Lines
* radeonsi: optimize TCS epilog when invocation 0 writes tess factors | Marek Olšák | 2017-09-11 | 4 | -28/+89
    This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though.
    In one shader, it removes 17 * 4 bytes from the shader binary.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move the guts of ARB_shader_group_vote emission to ac | Connor Abbott | 2017-09-08 | 1 | -21/+3
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move si_emit_ballot() to ac | Connor Abbott | 2017-09-08 | 1 | -32/+6
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move emit_optimization_barrier() to ac | Connor Abbott | 2017-09-08 | 1 | -43/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move llvm_get_type_size() to ac | Connor Abbott | 2017-09-08 | 1 | -34/+9
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* ac/surface: add radeon_surf::has_stencil for convenience | Marek Olšák | 2017-09-07 | 3 | -6/+6
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read tcs_out_lds_layout.patch_stride from an SGPR | Marek Olšák | 2017-09-07 | 1 | -6/+14
    Same as before, writing TCS outputs to LDS is rare.
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read tcs_out_lds_layout.vertex_size from an SGPR | Marek Olšák | 2017-09-07 | 3 | -6/+20
    TCS outputs are usually not written to LDS, so no stats here.
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS | Marek Olšák | 2017-09-07 | 2 | -1/+11
    -44 bytes in a monolithic LS-HS binary.
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read the LS output vertex stride from an SGPR in LS | Marek Olšák | 2017-09-07 | 1 | -4/+21
    Now it's able to generate ds_write2_b64 instead of ds_write2_b32.
    -20 bytes in one shader binary. (having only 1 output)
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read the number of TCS out vertices from an SGPR in TCS | Marek Olšák | 2017-09-07 | 1 | -2/+15
    -16 bytes in one shader binary.
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't always apply the PrimID instancing bug workaround on SI | Marek Olšák | 2017-09-07 | 1 | -1/+1
    It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should have fixed the perf regression didn't really change much if anything.
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove 2 callbacks from si_shader_context | Marek Olšák | 2017-09-07 | 3 | -17/+13
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug | Nicolai Hähnle | 2017-09-06 | 5 | -24/+85
    When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog.
    According to the hardware team, this will be fixed in future chips, so take that into account already.
    Note that this is not a bug fix, as the bug was already worked around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better.
    v2: add workaround code to shader only when necessary
    v3: clarify the prefer_mono comment
    Reviewed-by: Marek Olšák <[email protected]>
* amd/common: pass chip_class to ac_dump_reg | Nicolai Hähnle | 2017-09-06 | 1 | -15/+30
    Acked-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: always flush DB metadata on framebuffer changes | Nicolai Hähnle | 2017-09-06 | 3 | -4/+14
    This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.
    Cc: [email protected]
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: implement primitive binning | Marek Olšák | 2017-09-05 | 8 | -7/+485
    This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform, hopefully not worse.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add more state flags into si_state_dsa | Marek Olšák | 2017-09-05 | 2 | -1/+23
    3 flags for primitive binning, 2 flags for out-of-order rasterization (but that will be done some other time)
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't use BREAK_BATCH and FLUSH_DFSM if DFSM is disabled | Marek Olšák | 2017-09-05 | 2 | -3/+4
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate PS color outputs when colormask kills them | Marek Olšák | 2017-09-04 | 3 | -0/+6
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: sort DBG shader flags according to pipe_shader_type | Marek Olšák | 2017-09-04 | 1 | -3/+2
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: ensure cache flushes happen before SET_PREDICATION packets | Nicolai Hähnle | 2017-09-04 | 1 | -5/+10
    The data is read when the render_cond_atom is emitted, so we must delay emitting the atom until after the flush.
    Fixes: 0fe0320dc074 ("radeonsi: use optimal packet order when doing a pipeline sync")
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix ARB_transform_feedback_overflow_query on <= VI | Nicolai Hähnle | 2017-09-04 | 1 | -1/+3
    The result written by the shader workaround needs to be written back, or the CP may read stale data.
    Fixes: 78476cfe071a ("radeonsi: enable ARB_transform_feedback_overflow_query")
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix compute shader state dumping | Nicolai Hähnle | 2017-09-04 | 1 | -6/+11
    Fixes: 420c438589c8 ("radeonsi: log draw and compute state into log context")
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add an assertion that only two-dimensional constant references are used | Nicolai Hähnle | 2017-09-04 | 1 | -2/+3
    v2: remove some redundant checks
    Acked-by: Roland Scheidegger <[email protected]> (v1)
    Tested-by: Dieter Nützel <[email protected]> (v1)
    Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: move si_vm_fault_occured() to AMD common code | Samuel Pitoiset | 2017-09-01 | 1 | -102/+4
    For radv, in order to report VM faults when detected.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: update dirty_level_mask before dispatching | Samuel Pitoiset | 2017-08-30 | 1 | -0/+5
    This fixes a rendering issue with Hitman when bindless textures are enabled.
    Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* ac/debug: Support multiple trace ids for nested IBs. | Bas Nieuwenhuizen | 2017-08-29 | 1 | -9/+10
    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: stop leaking nir | Timothy Arceri | 2017-08-29 | 1 | -0/+1
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rewrite late alloc VS limit computation | Marek Olšák | 2017-08-28 | 1 | -12/+25
    This is still very simple, but it's better than before.
    Loosely ported from Vulkan.
* radeonsi: correct maximum wave count per SIMD | Marek Olšák | 2017-08-28 | 1 | -1/+12
    v2: don't special-case Tonga and Iceland.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "radeonsi: get the raster config from AMDGPU on SI" | Marek Olšák | 2017-08-27 | 1 | -17/+0
    This reverts commit fc99cb3c9edee3af773700cf7ebdc60dc02fcaba.
    "The performance went down from 64.7 to 51.4 fps in Valley and from 30.8 to 25.1 fps in Heaven on Radeon HD 7970. Other games seem to have also a 10-25% performance decrease."
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102429
    It looks like we can't use the raster config values from the kernel.
* radeonsi: set IF_THRESHOLD to 4 | Timothy Arceri | 2017-08-25 | 1 | -1/+1
    In 74e39de9324d it was set to 3 and it was reported that 4 caused tesseract to start spilling VGPRs. This no longer seems to be the case.

    Totals:
    SGPRS: 2787844 -> 2787764 (-0.00 %)
    VGPRS: 1713121 -> 1712717 (-0.02 %)
    Spilled SGPRs: 7532 -> 7532 (0.00 %)
    Spilled VGPRs: 49 -> 33 (-32.65 %)
    Private memory VGPRs: 2060 -> 2060 (0.00 %)
    Scratch size: 2200 -> 2180 (-0.91 %) dwords per thread
    Code Size: 79265520 -> 79248360 (-0.02 %) bytes
    LDS: 436 -> 436 (0.00 %) blocks
    Max Waves: 670535 -> 670608 (0.01 %)
    Wait states: 0 -> 0 (0.00 %)

    Before:
    VGPR SPILLING APPS    Shaders  SpillVGPR  PrivVGPR  ScratchSize
    EffectsCaveDemo           301          0       256          264
    ReflectionsSubwayDemo     264          0       256          264
    VehicleGame               295          0       128          132
    bioshock-infinite        1140          0       448          516
    dirt-showdown             453         33         0           28
    gang-beasts               364          0       500          496
    kerbal-space-program     1228          0       472          480
    tomb-raider-ultra        1199         16         0           20

    After:
    VGPR SPILLING APPS    Shaders  SpillVGPR  PrivVGPR  ScratchSize
    EffectsCaveDemo           301          0       256          264
    ReflectionsSubwayDemo     264          0       256          264
    VehicleGame               295          0       128          132
    bioshock-infinite        1140          0       448          516
    dirt-showdown             453         33         0           28
    gang-beasts               364          0       500          496
    kerbal-space-program     1228          0       472          480

    The only change in VGPR spills is the elimination of all spills in Tomb Raider at Ultra settings. Closer examination shows that the shaders go over the limit because they contain three expressions: a mul, rcp and ubo load. The ubo load is actually used elsewhere and is therefore already stored in a temp in an IR such as tgsi, but glsl ir counts it against the if cost.
    Acked-by: Nicolai Hähnle <[email protected]>
    Acked-by: Marek Olšák <[email protected]>
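The percentages in the totals of this commit message are plain relative changes between the before and after values. A minimal sketch in C of that arithmetic, assuming a helper named `percent_change` (a hypothetical name for illustration, not part of the shader-db tooling):

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Hypothetical helper for illustration only. */
static double percent_change(double before, double after)
{
    assert(before != 0.0); /* a zero baseline has no relative change */
    return (after - before) / before * 100.0;
}
```

For the "Spilled VGPRs" row, `percent_change(49, 33)` evaluates to roughly -32.65, and for the "Scratch size" row, `percent_change(2200, 2180)` evaluates to roughly -0.91, matching the figures quoted in the message.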
* glsl: pass shader source keys to the disk cache | Timothy Arceri | 2017-08-25 | 1 | -1/+1
    We don't actually write them to disk here. That will happen in the following commit.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: get the raster config from AMDGPU on SI | Marek Olšák | 2017-08-24 | 1 | -0/+17
    Not sure yet if we wanna do this on CIK and VI too.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up setting GRBM_GFX_INDEX | Marek Olšák | 2017-08-24 | 1 | -19/+22
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move PA_SC_RASTER_CONFIG emission into a separate function | Marek Olšák | 2017-08-24 | 1 | -70/+73
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix wrong assertion in si_init_bindless_descriptors() | Samuel Pitoiset | 2017-08-23 | 1 | -1/+1
    Bad mistake, sorry.
    Signed-off-by: Samuel Pitoiset <[email protected]>
* radeonsi: update comment describing indices into sctx->descriptors | Nicolai Hähnle | 2017-08-23 | 1 | -6/+5
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not assert when reserving bindless slot 0 | Samuel Pitoiset | 2017-08-23 | 1 | -1/+4
    When assertions were disabled, the compiler removed the call to util_idalloc_alloc() and the first allocated bindless slot was 0 which is invalid per the spec.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
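The failure described in this commit message is the usual pitfall of placing a call with side effects inside `assert()`: when `NDEBUG` is defined, the whole expression is compiled out. A minimal sketch in C of the fragile and the robust pattern, assuming a hypothetical `id_pool` allocator (a stand-in for illustration, not the real `util_idalloc` API and not the actual patch):

```c
#include <assert.h>

/* Hypothetical stand-in for an ID allocator; not the util_idalloc API. */
struct id_pool {
    unsigned next; /* lowest ID that has not been handed out yet */
};

/* Returns the lowest unused ID and marks it as used. */
static unsigned id_pool_alloc(struct id_pool *pool)
{
    return pool->next++;
}

/* Fragile: with NDEBUG defined, assert() expands to nothing, so the
 * allocation inside it never runs and slot 0 stays available. */
static void reserve_slot0_fragile(struct id_pool *pool)
{
    assert(id_pool_alloc(pool) == 0);
}

/* Robust: the call with the side effect always executes; only the
 * check itself disappears in builds without assertions. */
static void reserve_slot0_robust(struct id_pool *pool)
{
    unsigned reserved = id_pool_alloc(pool);
    assert(reserved == 0);
    (void)reserved; /* silences the unused-variable warning under NDEBUG */
}
```

With assertions enabled both functions behave the same. Built with `-DNDEBUG`, only `reserve_slot0_robust()` still consumes ID 0, so the next caller of `id_pool_alloc()` receives 1 rather than the reserved slot.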
* radeonsi: rename some bindless-related helper functions | Samuel Pitoiset | 2017-08-23 | 1 | -21/+21
    I think it makes more sense.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: minor cleanups in si_make_{texture,image}_handle_resident() | Samuel Pitoiset | 2017-08-23 | 1 | -12/+12
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: emit VGT_REUSE_OFF in the right place | Marek Olšák | 2017-08-22 | 2 | -8/+9
    clip_regs aren't marked dirty when writes_viewport_index is changed.
    Cc: 17.2 <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add support for TGSI opcodes DCEIL, DFLR, DROUND, DSSG, DTRUNC | Marek Olšák | 2017-08-22 | 2 | -1/+15
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a faster version of PK2H | Marek Olšák | 2017-08-22 | 1 | -21/+8
    + 4 piglit regressions, but it's correct according to the GL spec and performance is more important than piglit.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't decompress Z/S if there is no HTILE | Marek Olšák | 2017-08-22 | 1 | -12/+15
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add helpers for whether HTILE is enabled | Marek Olšák | 2017-08-22 | 3 | -12/+11
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't flush L2 metadata for DB if not needed | Marek Olšák | 2017-08-22 | 3 | -11/+26
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't flush L2 metadata for CB if not needed | Marek Olšák | 2017-08-22 | 4 | -17/+38
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't flush TC L2 between rendering and texturing if not needed | Marek Olšák | 2017-08-22 | 3 | -34/+47
    Reviewed-by: Nicolai Hähnle <[email protected]>