summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi
Commit message (Collapse)AuthorAgeFilesLines
* gallium: Add PIPE_SHADER_CAP_FP16Jan Vesely2017-09-181-0/+1
| | | | | | | | | Denotes native half precision float operations capability v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16 fix indentation Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: allow out-of-order rasterization in commutative blending casesNicolai Hähnle2017-09-185-4/+68
| | | | | | | | | | | | We do not enable this by default for additive blending, since it slightly breaks OpenGL invariance guarantees due to non-determinism. Still, there may be some applications can benefit from white-listing via the radeonsi_commutative_blend_add drirc setting without any real visible artifacts. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: add drirc option "radeonsi_assume_no_z_fights"Nicolai Hähnle2017-09-184-4/+8
| | | | | | | | | | | | | | This option enables a performance optimization where typical non-blending draws with depth buffer may be rasterized out-of-order (on VI+, multi-SE chips). This optimization can lead to incorrect results when an applications renders multiple objects with the same Z value at the same pixel, so we will never enable it by default. But there may be applications that could benefit from white-listing. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUsNicolai Hähnle2017-09-185-5/+191
| | | | | | | | | This does not take commutative blending into account yet. R600_DEBUG=nooutoforder disables it. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium/radeon: pass old_(perfect_)enable to set_occlusion_query_stateNicolai Hähnle2017-09-181-1/+3
| | | | | | | The callee can derive the current enable state itself. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* amd/common: remove has_ds_bpermute argument from ac_build_ddxyNicolai Hähnle2017-09-183-4/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* amd/common: add chip_class to ac_llvm_contextNicolai Hähnle2017-09-181-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* amd/common: round cube array slice in ac_prepare_cube_coordsNicolai Hähnle2017-09-181-0/+1
| | | | | | | | | | | | The NIR-to-LLVM pass already does this; now the same fix covers radeonsi as well. Fixes various tests of dEQP-GLES31.functional.texture.filtering.cube_array.combinations.* Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: workaround for gather4 on integer cube mapsNicolai Hähnle2017-09-181-6/+100
| | | | | | | | | | This is the same workaround that radv already applied in commit 3ece76f03dc0 ("radv/ac: gather4 cube workaround integer"). Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.* Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable STD430 packing of UBOs by defaultTimothy Arceri2017-09-151-1/+1
| | | | | | | Before this change we were defaulting to STD140 which is slightly less efficient at packing arrays. Reviewed-by: Marek Olšák <[email protected]>
* gallium: introduce PIPE_CAP_LOAD_CONSTBUFTimothy Arceri2017-09-151-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: make use of LOAD for UBOsTimothy Arceri2017-09-151-10/+21
| | | | | | v2: always set can_speculate and allow_smem to true Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: move si_get_wave_info() to AMD common codeSamuel Pitoiset2017-09-141-93/+3
| | | | | | | | This will allow us to use it from radv. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium/{r600, radeonsi}: Fix segfault with color format (v2)Denis Pauk2017-09-141-1/+9
| | | | | | | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102552 v2: Patch cleanup proposed by Nicolai Hähnle. * deleted changes in si_translate_texformat. Cc: Nicolai Hähnle <[email protected]> Cc: Ilia Mirkin <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: hard-code pixel center for interpolateAtSample without multisample ↵Nicolai Hähnle2017-09-133-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | buffers The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fallback to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: apply a mask to gl_SampleMaskIn in the PS prologNicolai Hähnle2017-09-133-5/+76
| | | | | | | | | | | | | gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename variable to clarify its meaningNicolai Hähnle2017-09-131-10/+10
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: make si_init_shader_selector_async staticNicolai Hähnle2017-09-132-2/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix segfault in descriptor dumpingNicolai Hähnle2017-09-131-0/+18
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: optimize TCS epilog when invocation 0 writes tess factorsMarek Olšák2017-09-114-28/+89
| | | | | | | | | | This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move the guts of ARB_shader_group_vote emission to acConnor Abbott2017-09-081-21/+3
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move si_emit_ballot() to acConnor Abbott2017-09-081-32/+6
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move emit_optimization_barrier() to acConnor Abbott2017-09-081-43/+2
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move llvm_get_type_size() to acConnor Abbott2017-09-081-34/+9
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/surface: add radeon_surf::has_stencil for convenienceMarek Olšák2017-09-073-6/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read tcs_out_lds_layout.patch_stride from an SGPRMarek Olšák2017-09-071-6/+14
| | | | | | | Same as before, writing TCS outputs to LDS is rare. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read tcs_out_lds_layout.vertex_size from an SGPRMarek Olšák2017-09-073-6/+20
| | | | | | | TCS outputs are usually not written to LDS, so no stats here. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HSMarek Olšák2017-09-072-1/+11
| | | | | | | -44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read the LS output vertex stride from an SGPR in LSMarek Olšák2017-09-071-4/+21
| | | | | | | | | Now it's able to generate ds_write2_b64 instead of ds_write2_b32. -20 bytes in one shader binary. (having only 1 output) Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read the number of TCS out vertices from an SGPR in TCSMarek Olšák2017-09-071-2/+15
| | | | | | | -16 bytes in one shader binary. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't always apply the PrimID instancing bug workaround on SIMarek Olšák2017-09-071-1/+1
| | | | | | | | It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should have fixed the perf regression didn't really change much if anything. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove 2 callbacks from si_shader_contextMarek Olšák2017-09-073-17/+13
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bugNicolai Hähnle2017-09-065-24/+85
| | | | | | | | | | | | | | | | | | | When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better. v2: add workaround code to shader only when necessary v3: clarify the prefer_mono comment Reviewed-by: Marek Olšák <[email protected]>
* amd/common: pass chip_class to ac_dump_regNicolai Hähnle2017-09-061-15/+30
| | | | Acked-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: always flush DB metadata on framebuffer changesNicolai Hähnle2017-09-063-4/+14
| | | | | | | This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: implement primitive binningMarek Olšák2017-09-058-7/+485
| | | | | | | This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform, hopefully not worse. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add more state flags into si_state_dsaMarek Olšák2017-09-052-1/+23
| | | | | | | 3 flags for primitive binning, 2 flags for out-of-order rasterization (but that will be done some other time) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't use BREAK_BATCH and FLUSH_DFSM if DFSM is disabledMarek Olšák2017-09-052-3/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate PS color outputs when colormask kills themMarek Olšák2017-09-043-0/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: sort DBG shader flags according to pipe_shader_typeMarek Olšák2017-09-041-3/+2
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: ensure cache flushes happen before SET_PREDICATION packetsNicolai Hähnle2017-09-041-5/+10
| | | | | | | | The data is read when the render_cond_atom is emitted, so we must delay emitting the atom until after the flush. Fixes: 0fe0320dc074 ("radeonsi: use optimal packet order when doing a pipeline sync") Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix ARB_transform_feedback_overflow_query on <= VINicolai Hähnle2017-09-041-1/+3
| | | | | | | | The result written by the shader workaround needs to be written back, or the CP may read stale data. Fixes: 78476cfe071a ("radeonsi: enable ARB_transform_feedback_overflow_query") Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix compute shader state dumpingNicolai Hähnle2017-09-041-6/+11
| | | | | Fixes: 420c438589c8 ("radeonsi: log draw and compute state into log context") Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add an assertion that only two-dimensional constant references are ↵Nicolai Hähnle2017-09-041-2/+3
| | | | | | | | | | used v2: remove some redundant checks Acked-by: Roland Scheidegger <[email protected]> (v1) Tested-by: Dieter Nützel <[email protected]> (v1) Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: move si_vm_fault_occured() to AMD common codeSamuel Pitoiset2017-09-011-102/+4
| | | | | | | | For radv, in order to report VM faults when detected. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: update dirty_level_mask before dispatchingSamuel Pitoiset2017-08-301-0/+5
| | | | | | | | | This fixes a rendering issue with Hitman when bindless textures are enabled. Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/debug: Support multiple trace ids for nested IBs.Bas Nieuwenhuizen2017-08-291-9/+10
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: stop leaking nirTimothy Arceri2017-08-291-0/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rewrite late alloc VS limit computationMarek Olšák2017-08-281-12/+25
| | | | | | This is still very simple, but it's better than before. Loosely ported from Vulkan.
* radeonsi: correct maximum wave count per SIMDMarek Olšák2017-08-281-1/+12
| | | | | | v2: don't special-case Tonga and Iceland. Reviewed-by: Nicolai Hähnle <[email protected]>