path: root/src/gallium
Commit message | Author | Age | Files | Lines
* llvmpipe, draw: increase shader cache limits | Roland Scheidegger | 2017-09-07 | 2 | -4/+2
  We're not particularly concerned with memory usage if the tradeoff is shader recompiles. And it's common for apps to have a lot of shaders nowadays (and since our shaders include a LOT of context state, we may of course end up creating quite a few more shaders than that).
  So quadruple the number of shaders draw will cache (from 128 to 512). For llvmpipe (fs shaders), quadruple the number of instructions but keep the number of variants the same for now (only with very simple, non-texturing shaders could the variant limit really be reached), and simplify the definition; it's probably easier to just have one different definition per branch...
  Reviewed-by: Jose Fonseca <[email protected]>
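The following toy fixed-capacity variant cache makes the memory-versus-recompile tradeoff concrete. It is a minimal sketch, not draw's actual cache. `SHADER_CACHE_LIMIT`, `struct variant_cache` and `cache_get()` are hypothetical names. Least-recently-used eviction is an assumption about how a full cache would make room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit mirroring the new draw cache size (was 128). */
#define SHADER_CACHE_LIMIT 512

struct variant_cache {
   uint64_t keys[SHADER_CACHE_LIMIT];      /* state key of each cached variant */
   uint64_t last_used[SHADER_CACHE_LIMIT]; /* logical timestamp for LRU */
   unsigned count;                         /* number of occupied slots */
   uint64_t clock;                         /* incremented on every lookup */
   unsigned compiles;                      /* simulated (re)compiles so far */
};

static void
cache_init(struct variant_cache *c)
{
   memset(c, 0, sizeof(*c));
}

/* Returns true on a cache hit, false if a (simulated) compile was needed. */
static bool
cache_get(struct variant_cache *c, uint64_t key)
{
   unsigned victim = 0;

   c->clock++;
   for (unsigned i = 0; i < c->count; i++) {
      if (c->keys[i] == key) {
         c->last_used[i] = c->clock;
         return true;
      }
   }

   /* Miss: "compile", then insert, evicting the LRU entry if full. */
   c->compiles++;
   if (c->count < SHADER_CACHE_LIMIT) {
      victim = c->count++;
   } else {
      for (unsigned i = 1; i < SHADER_CACHE_LIMIT; i++)
         if (c->last_used[i] < c->last_used[victim])
            victim = i;
   }
   assert(victim < SHADER_CACHE_LIMIT);
   c->keys[victim] = key;
   c->last_used[victim] = c->clock;
   return false;
}
```

Suppose an app cycles through 300 distinct variants. This 512-entry cache compiles each one once. A 128-entry LRU cache would miss on every lookup of the same pattern, and that recompile cost is what the commit trades memory against.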
* radeon/uvd: fix the assertion check for YUYV format | Leo Liu | 2017-09-06 | 1 | -3/+5
  Fixes: 7319ff87 ("radeon/uvd: add YUYV format support for target buffer")
  Signed-off-by: Leo Liu <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* swr/rast: FE/Clipper - unify SIMD8/16 functions using simdlib types | Tim Rowley | 2017-09-06 | 3 | -1189/+446
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Remove use of C++14 template variable | Tim Rowley | 2017-09-06 | 2 | -6/+14
  SWR rasterizer must remain C++11 compliant.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: SIMD16 FE remove templated immediates workaround | Tim Rowley | 2017-09-06 | 1 | -90/+20
  Fixed properly in a gcc-compatible fashion.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: SIMD16 PA - rename Assemble_simd16 to Assemble | Tim Rowley | 2017-09-06 | 3 | -31/+15
  For consistency and to support overloading.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: FE/Binner - unify SIMD8/16 functions using simdlib types | Tim Rowley | 2017-09-06 | 5 | -1739/+696
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Removed some trailing whitespace caught during review | Tim Rowley | 2017-09-06 | 3 | -10/+10
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: set caps for VB 4-byte alignment | Tim Rowley | 2017-09-06 | 1 | -3/+6
  Needed to compensate for a change to the fetch JIT that requires alignment.
  Fixes regressions in piglit: vertex-buffer-offsets and about a hundred of the vs-input*byte* tests.
  Reviewed-by: Bruce Cherniak <[email protected]>
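The alignment requirement can be pictured as a per-binding check. This is a hypothetical sketch: `struct vb_binding` and `vb_needs_translation()` are illustrative names. It assumes that, with the caps set, the state tracker presumably repacks any binding that fails such a check before it reaches the fetch JIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one vertex buffer binding. */
struct vb_binding {
   uint32_t buffer_offset; /* byte offset of the first vertex */
   uint32_t stride;        /* bytes between consecutive vertices */
   uint32_t src_offset;    /* byte offset of an attribute within a vertex */
};

static bool
is_4byte_aligned(uint32_t v)
{
   return (v & 3u) == 0;
}

/* True if the binding would have to be repacked into an aligned buffer
 * before a fetch path that requires 4-byte alignment could consume it. */
static bool
vb_needs_translation(const struct vb_binding *vb)
{
   assert(vb != NULL);
   return !is_4byte_aligned(vb->buffer_offset) ||
          !is_4byte_aligned(vb->stride) ||
          !is_4byte_aligned(vb->src_offset);
}
```

A binding with offset 16 and stride 12 passes. One with offset 2 or stride 6 would need repacking.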
* swr/rast: Allow gather of floats from fetch shader with 2-4GB offsets | Tim Rowley | 2017-09-06 | 2 | -1/+7
  Reviewed-by: Bruce Cherniak <[email protected]>
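The commit message does not say how the fix works. One common way offsets between 2 GB and 4 GB go wrong is sign extension: a 32-bit offset with bit 31 set becomes a negative displacement when widened as a signed value. The sketch contrasts sign extension with zero extension. The function names are hypothetical, and this is an assumption about the failure mode, not a description of the swr change.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 32-bit offset as if it were signed: values >= 2^31 become
 * negative. Written without an int32_t cast to stay well-defined C. */
static int64_t
sign_extend32(uint32_t offset)
{
   return (offset & 0x80000000u) ? (int64_t)offset - INT64_C(0x100000000)
                                 : (int64_t)offset;
}

/* Address computed with a sign-extended offset (wrong at 2 GiB and above). */
static uint64_t
addr_sign_extended(uint64_t base, uint32_t offset)
{
   return base + (uint64_t)sign_extend32(offset);
}

/* Address computed with a zero-extended offset (valid up to 4 GiB - 1). */
static uint64_t
addr_zero_extended(uint64_t base, uint32_t offset)
{
   return base + (uint64_t)offset;
}
```

Take a base of 4 GiB and an offset of exactly 2 GiB. The zero-extended address is 6 GiB, while the sign-extended one lands at 2 GiB, 4 GiB too low.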
* radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug | Nicolai Hähnle | 2017-09-06 | 5 | -24/+85
  When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Work around this by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog.
  According to the hardware team, this will be fixed in future chips, so take that into account already.
  Note that this is not a bug fix, as the bug was already worked around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces that workaround with one that should be better.
  v2: add workaround code to shader only when necessary
  v3: clarify the prefer_mono comment
  Reviewed-by: Marek Olšák <[email protected]>
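The fix-up described above amounts to moving a run of registers up by two slots. The sketch models the VGPRs as a plain C array. `fixup_ls_vgprs()`, the register count and the in-place copy are illustrative assumptions; the real workaround is generated shader code in the LS prolog, not C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LS_VGPR_FIRST 2 /* LS inputs are expected from v2 onwards */

/* vgprs: simulated register file with at least LS_VGPR_FIRST + n entries.
 * If the hardware wrote the n LS inputs starting at v0 (the empty-HS-wave
 * case), move them up so they start at v2. Copy from the top down so no
 * source register is overwritten before it has been read. */
static void
fixup_ls_vgprs(uint32_t *vgprs, unsigned n, bool hs_wave_empty)
{
   assert(vgprs != NULL);
   if (!hs_wave_empty)
      return;
   for (unsigned i = n; i-- > 0;)
      vgprs[LS_VGPR_FIRST + i] = vgprs[i];
}
```

Suppose four LS inputs land in v0..v3. The call moves them to v2..v5, and copying from the highest register down keeps v2 and v3 intact until they have been read.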
* amd/common: pass chip_class to ac_dump_reg | Nicolai Hähnle | 2017-09-06 | 1 | -15/+30
  Acked-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: always flush DB metadata on framebuffer changes | Nicolai Hähnle | 2017-09-06 | 3 | -4/+14
  This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.
  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
* svga: move index buffer bind flag assertion | Charmaine Lee | 2017-09-05 | 1 | -3/+3
  The buffer bind flags can be promoted in svga_buffer_handle(), so move the assertion after it. This was already done for the vertex buffer in commit 6b4bf7e8be, but that commit missed the one for the index buffer.
  Fixes an assertion failure when running WarThunder.
  Reviewed-by: Neha Bhende <[email protected]>
* svga: avoid emitting redundant SetShaderResources and SetVertexBuffers | Charmaine Lee | 2017-09-05 | 2 | -18/+116
  Minor performance improvement from not rebinding the same shader resource or the same vertex buffer to the same slot.
  Tested with MTT glretrace.
  v2: Per Brian's suggestion, add a helper function to do the vertex buffer comparison.
  v3: Change the helper function to vertex_buffers_equal().
  Reviewed-by: Brian Paul <[email protected]>
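Redundant-bind elimination boils down to comparing each slot against what was last emitted. The sketch uses a helper in the spirit of the `vertex_buffers_equal()` named above. The struct layout, the signature and `update_vertex_buffers()` are hypothetical, not the svga driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-slot vertex buffer binding. */
struct vbuf_binding {
   uint32_t handle; /* buffer object id */
   uint32_t stride;
   uint32_t offset;
};

static bool
vertex_buffers_equal(const struct vbuf_binding *a,
                     const struct vbuf_binding *b)
{
   return a->handle == b->handle &&
          a->stride == b->stride &&
          a->offset == b->offset;
}

/* Copy changed bindings into the "last emitted" array and return how many
 * slots differ; 0 means the SetVertexBuffers command can be skipped. */
static unsigned
update_vertex_buffers(struct vbuf_binding *emitted,
                      const struct vbuf_binding *next, unsigned count)
{
   unsigned changed = 0;

   assert(count == 0 || (emitted && next));
   for (unsigned i = 0; i < count; i++) {
      if (!vertex_buffers_equal(&emitted[i], &next[i])) {
         emitted[i] = next[i];
         changed++;
      }
   }
   return changed;
}
```

Calling it twice with the same new bindings reports one changed slot the first time and zero the second, so the second command could be skipped.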
* radeonsi/gfx9: implement primitive binning | Marek Olšák | 2017-09-05 | 10 | -7/+489
  This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform; hopefully not worse.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add more state flags into si_state_dsa | Marek Olšák | 2017-09-05 | 2 | -1/+23
  3 flags for primitive binning, 2 flags for out-of-order rasterization (but that will be done some other time)
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't use BREAK_BATCH and FLUSH_DFSM if DFSM is disabled | Marek Olšák | 2017-09-05 | 2 | -3/+4
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate PS color outputs when colormask kills them | Marek Olšák | 2017-09-04 | 3 | -0/+6
  Reviewed-by: Nicolai Hähnle <[email protected]>
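Deciding which color outputs the colormask kills can be sketched as a bitmask computation. The sketch assumes the colormask is packed as 4 RGBA bits per render target, and `live_color_outputs()` is a hypothetical helper. A real implementation presumably also has to keep outputs that other state still reads, for example COLOR0 alpha when alpha-to-coverage is enabled.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_COLOR_BUFFERS 8

/* colormask: 4 bits (RGBA) per render target, RT i in bits [4i, 4i+3].
 * written_outputs: bit i set if the fragment shader writes COLOR[i].
 * Returns the written outputs whose render target has any channel enabled;
 * the rest can be eliminated from the shader. */
static uint32_t
live_color_outputs(uint32_t colormask, uint32_t written_outputs)
{
   uint32_t live = 0;

   for (unsigned i = 0; i < MAX_COLOR_BUFFERS; i++) {
      if ((colormask >> (4 * i)) & 0xfu)
         live |= 1u << i;
   }
   return live & written_outputs;
}
```

Suppose only render target 2 has a channel enabled (mask `0x100`). A shader writing COLOR0..COLOR2 then keeps just COLOR2.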
* gallium/radeon: sort DBG shader flags according to pipe_shader_type | Marek Olšák | 2017-09-04 | 4 | -35/+17
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: ensure cache flushes happen before SET_PREDICATION packets | Nicolai Hähnle | 2017-09-04 | 3 | -9/+18
  The data is read when the render_cond_atom is emitted, so we must delay emitting the atom until after the flush.
  Fixes: 0fe0320dc074 ("radeonsi: use optimal packet order when doing a pipeline sync")
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix ARB_transform_feedback_overflow_query on <= VI | Nicolai Hähnle | 2017-09-04 | 3 | -1/+12
  The result written by the shader workaround needs to be written back, or the CP may read stale data.
  Fixes: 78476cfe071a ("radeonsi: enable ARB_transform_feedback_overflow_query")
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix compute shader state dumping | Nicolai Hähnle | 2017-09-04 | 1 | -6/+11
  Fixes: 420c438589c8 ("radeonsi: log draw and compute state into log context")
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add an assertion that only two-dimensional constant references are used | Nicolai Hähnle | 2017-09-04 | 1 | -2/+3
  v2: remove some redundant checks
  Acked-by: Roland Scheidegger <[email protected]> (v1)
  Tested-by: Dieter Nützel <[email protected]> (v1)
  Reviewed-by: Timothy Arceri <[email protected]>
* gallium/radeon: always use two-dimensional constant references | Nicolai Hähnle | 2017-09-04 | 1 | -18/+18
  Acked-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* gallium/tests: always use two-dimensional constant references | Nicolai Hähnle | 2017-09-04 | 3 | -10/+10
  Acked-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* pp: always use two-dimensional constant references | Nicolai Hähnle | 2017-09-04 | 1 | -10/+10
  Acked-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* gallium/hud: always use two-dimensional constant references | Nicolai Hähnle | 2017-09-04 | 1 | -4/+4
  Acked-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* nine: always generate two-dimensional constant file accesses | Nicolai Hähnle | 2017-09-04 | 2 | -7/+5
  Acked-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* tgsi/build: always generate two-dimensional constant file accesses | Nicolai Hähnle | 2017-09-04 | 2 | -31/+45
  Reviewed-by: Timothy Arceri <[email protected]>
* tgsi/ureg: always emit constants (and their decls) as 2D | Nicolai Hähnle | 2017-09-04 | 1 | -15/+7
  Acked-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* gallium: all drivers should accept two-dimensional constant buffer indexing | Nicolai Hähnle | 2017-09-04 | 2 | -9/+4
  Most older drivers seem to just ignore the Dimension setting, so virtually no changes should be needed.
  Acked-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
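In TGSI, a one-dimensional constant reference such as `CONST[i]` addresses constant buffer 0, which is what this series relies on when treating it as equivalent to `CONST[0][i]`. The sketch shows that rewrite on a simplified operand. `struct const_ref` and `const_ref_to_2d()` are hypothetical, not Gallium APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified constant-file operand. */
struct const_ref {
   bool has_dimension; /* true for CONST[buffer][index] */
   unsigned buffer;    /* constant buffer slot, meaningful only if 2D */
   unsigned index;     /* vec4 slot inside the buffer */
};

/* Rewrite a one-dimensional reference CONST[i] into the equivalent
 * two-dimensional form CONST[0][i]; 2D references are left unchanged. */
static struct const_ref
const_ref_to_2d(struct const_ref ref)
{
   if (!ref.has_dimension) {
      ref.has_dimension = true;
      ref.buffer = 0;
   }
   return ref;
}
```

A driver that ignores the dimension simply reads buffer 0 in both cases. That is consistent with the expectation above that older drivers need virtually no changes.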
* radeon/uvd: add Define Restart Interval to MJPEG bitstream reconstruction | Leo Liu | 2017-09-02 | 1 | -0/+11
  This adds the ability to decode MJPEG streams that contain a DRI marker.
  Signed-off-by: Leo Liu <[email protected]>
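Reconstructing a bitstream with restart intervals means emitting a DRI segment. Per the JPEG standard (ITU-T T.81), the segment consists of:

- the marker `FF DD`;
- a 16-bit big-endian length (always 4);
- a 16-bit big-endian restart interval counted in MCUs.

The sketch writes one such segment. `write_dri_segment()` is a hypothetical helper, and where the interval value comes from in the UVD code is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes a JPEG DRI (Define Restart Interval) segment into buf, which must
 * hold at least 6 bytes, and returns the number of bytes written. Layout:
 *   FF DD        marker
 *   00 04        segment length Lr (counts the length field and Ri)
 *   Ri_hi Ri_lo  restart interval in MCUs, big-endian */
static size_t
write_dri_segment(uint8_t *buf, uint16_t restart_interval)
{
   assert(buf != NULL);
   buf[0] = 0xFF;
   buf[1] = 0xDD;
   buf[2] = 0x00;
   buf[3] = 0x04;
   buf[4] = (uint8_t)(restart_interval >> 8);
   buf[5] = (uint8_t)(restart_interval & 0xFF);
   return 6;
}
```

For example, an interval of 320 MCUs (`0x0140`) produces the bytes `FF DD 00 04 01 40`.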
* radeon/uvd: fix MJPEG quantization table index | Leo Liu | 2017-09-02 | 1 | -1/+1
  Fixes: 130d1f456b8 ("radeon/uvd: reconstruct MJPEG bitstream")
  Signed-off-by: Leo Liu <[email protected]>
* freedreno: skip batch-cache for compute shaders | Rob Clark | 2017-09-02 | 1 | -7/+1
  It is kind of pointless for compute, and skipping it avoids issues with apps kicking off more than 32 compute shaders at once.
  Signed-off-by: Rob Clark <[email protected]>
  Cc: "17.2" <[email protected]>
* swr: Report format max_samples=1 to maintain support for "fake" msaa. | Cherniak, Bruce | 2017-09-01 | 1 | -11/+11
  The accompanying patch "st/mesa: only try to create 1x msaa surfaces for 'fake' msaa" requires the driver to report max_samples=1 to enable "fake" msaa. Previously, 0 and 1 were treated equivalently in st_init_extensions() and either could enable "fake" msaa.
  This patch raises the swr default msaa_max_count from 0 to 1, so that swr_is_format_supported will report max_samples=1. Real msaa can still be enabled by exporting SWR_MSAA_MAX_COUNT with a power-of-two value between 2 and 16.
  This patch is necessary to prevent an OpenSWR regression resulting from the st/mesa patch.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102038
  Acked-by: Brian Paul <[email protected]>
  Reviewed-By: George Kyriazis <[email protected]>
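The environment-variable rule above can be sketched as a small parser. `parse_msaa_max_count()` is hypothetical. Falling back to 1 for values outside the power-of-two range 2..16 is an assumption; the commit only states which values enable real MSAA and that the default is now 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: interpret a SWR_MSAA_MAX_COUNT-style string.
 * Returns the value if it is a power of two in [2, 16]; otherwise 1,
 * which keeps only "fake" (single-sample) msaa. */
static unsigned
parse_msaa_max_count(const char *value)
{
   char *end;
   long n;

   if (value == NULL || *value == '\0')
      return 1;
   n = strtol(value, &end, 10);
   if (*end != '\0' || n < 2 || n > 16 || (n & (n - 1)) != 0)
      return 1;
   return (unsigned)n;
}
```

It would be called as `parse_msaa_max_count(getenv("SWR_MSAA_MAX_COUNT"))`. An unset variable yields 1 ("fake" msaa), "4" yields 4, and "3" or "32" fall back to 1.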
* radeonsi: move si_vm_fault_occured() to AMD common code | Samuel Pitoiset | 2017-09-01 | 1 | -102/+4
  For radv, in order to report VM faults when detected.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nvc0/ir: propagate immediates to CALL input MOVs | Tobias Klausmann | 2017-08-31 | 1 | -2/+19
  When using builtin functions we have to move the inputs to registers $0 and $1. If one of the input values is an immediate, we fail to propagate the immediate:
      ...
      mov u32 $r477 0x00000003 (0)
      ...
      mov u32 $r0 %r473 (0)
      mov u32 $r1 $r477 (0)
      call abs BUILTIN:0 (0)
      mov u32 %r495 $r1 (0)
      ...
  With this patch the immediate is propagated, potentially making the first MOV superfluous, in which case we remove it:
      ...
      mov u32 $r0 %r473 (0)
      mov u32 $r1 0x00000003 (0)
      call abs BUILTIN:0 (0)
      mov u32 %r495 $r1 (0)
      ...
  Shaderdb stats:
      total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
      total gprs used in shared programs    : 582972 -> 582881 (-0.02%)
      total local used in shared programs   : 17960 -> 17960 (0.00%)

                 local    gpr   inst  bytes
      helped         0     91    112    112
        hurt         0      0      0      0
  v2: implement some changes proposed by imirkin. The manual deletion of the dead mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call to division function"): the potentially dead mov is unlinked properly, so later passes do not notice the mov op at all and thus do not clean it up. That makes up a big chunk of the regression the above commit caused. Keep the deletion of the op where it is; deleting it later unnecessarily blows up the size of the change.
  Signed-off-by: Tobias Klausmann <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: write 0 to pipeline_statistics.cs_invocations | Karol Herbst | 2017-08-31 | 1 | -0/+1
  cs_invocations are currently unsupported, but leaving the field uninitialized is even worse.
  Fixes on nvc0:
    * KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
    * KHR-GL45.pipeline_statistics_query_tests_ARB.functional_non_rendering_commands_do_not_affect_queries
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* llvmpipe: lp_build_gather_elem_vec BE fix for 3x16 load | Ben Crocker | 2017-09-01 | 1 | -2/+28
  Fix loading of a 3x16 vector as a single 48-bit load on big-endian systems (PPC64, S390).
  Roland Scheidegger's commit e827d9175675aaa6cfc0b981e2a80685fb7b3a74 plus Ray Strode's patch reduce pre-Roland Piglit failures from ~4000 to ~2000. This patch fixes three of the four regressions observed by Ray:
    - draw-vertices
    - draw-vertices-half-float
    - draw-vertices-half-float_gles2
  One regression remains:
    - draw-vertices-2101010
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
  Cc: "17.2" "17.1" <[email protected]>
  Signed-off-by: Ben Crocker <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: correct channel shift logic on big endian | Ray Strode | 2017-09-01 | 1 | -1/+7
  lp_build_fetch_rgba_soa fetches a texel from a texture. Part of that process involves first gathering the element together from memory into a packed format, and then breaking out the individual color channels into separate, parallel arrays.
  The code fails to account for endianness when reading the packed values.
  This commit attempts to correct the problem by reversing the order in which the packed values are read on big-endian systems.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
  Cc: "17.2" "17.1" <[email protected]>
  Signed-off-by: Ray Strode <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
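Both big-endian fixes come down to where a channel lands after a wide load. The sketch loads three 16-bit channels as one 48-bit integer and then extracts them. Endianness is passed as a parameter so both layouts can be shown on any host. `load48()` and `extract_chan()` are hypothetical helpers, not gallivm code, which emits LLVM IR rather than C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble 6 bytes into one 48-bit value the way a CPU of the given
 * endianness would load them. */
static uint64_t
load48(const uint8_t bytes[6], bool big_endian)
{
   uint64_t v = 0;

   for (unsigned i = 0; i < 6; i++) {
      unsigned pos = big_endian ? 5 - i : i; /* byte significance */
      v |= (uint64_t)bytes[i] << (8 * pos);
   }
   return v;
}

/* Extract 16-bit channel `chan` (0 = first in memory) from the packed
 * value. On little-endian the first channel is in the low bits; on
 * big-endian it is in the high bits, so the shift order is reversed. */
static uint16_t
extract_chan(uint64_t packed, unsigned chan, bool big_endian)
{
   unsigned shift;

   assert(chan < 3);
   shift = big_endian ? 16 * (2 - chan) : 16 * chan;
   return (uint16_t)(packed >> shift);
}
```

With memory bytes `11 22 33 44 55 66` on a big-endian machine, channel 0 is `0x1122` and sits in the top 16 bits of the loaded value. Applying the little-endian shift of 0 would wrongly return `0x5566`.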
* winsys/amdgpu: set AMDGPU_GEM_CREATE_VM_ALWAYS_VALID if possible v2 | Christian König | 2017-08-31 | 3 | -5/+27
  When the kernel supports it, set the local flag and stop adding those BOs to the BO list.
  Can probably be optimized much more.
  v2: rename new flag to AMDGPU_GEM_CREATE_VM_ALWAYS_VALID
  Reviewed-by: Marek Olšák <[email protected]>
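Skipping always-valid BOs when building the per-submission list can be sketched as a filter. `struct bo` and `build_bo_list()` are hypothetical stand-ins for the winsys structures. The sketch assumes the kernel keeps such per-VM ("local") BOs resident, so they need not be named on each submission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object descriptor. */
struct bo {
   unsigned handle;
   bool vm_always_valid; /* created with the per-VM ("local") flag */
};

/* Write the handles that must accompany a command submission into out
 * (room for at least n entries), skipping always-valid BOs. Returns the
 * number of handles written. */
static size_t
build_bo_list(const struct bo *bos, size_t n, unsigned *out)
{
   size_t count = 0;

   assert(n == 0 || (bos && out));
   for (size_t i = 0; i < n; i++) {
      if (!bos[i].vm_always_valid)
         out[count++] = bos[i].handle;
   }
   return count;
}
```

A shorter list is the source of the lower CS ioctl overhead that the related radeonsi commit mentions.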
* radeonsi: set a per-buffer flag that disables inter-process sharing (v4) | Marek Olšák | 2017-08-31 | 4 | -28/+56
  For lower overhead in the CS ioctl. Winsys allocators are not used with interprocess-sharable resources.
  v2: It shouldn't crash anymore, but the kernel will reject the new flag.
  v3 (christian): Rename the flag, avoid sending those buffers in the BO list.
  v4 (christian): Remove setting the kernel flag for now
  Reviewed-by: Marek Olšák <[email protected]>
* svga: include sample count in surface_size() computation | Brian Paul | 2017-08-30 | 1 | -1/+1
  Use MAX2() because sampleCount will be zero for non-MSAA surfaces.
  No Piglit regressions.
  Reviewed-by: Charmaine Lee <[email protected]>
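A simplified size formula shows why `MAX2()` is needed. The sketch is hypothetical: it ignores mipmaps, depth and array layers. It also defines a local `MAX2` equivalent to Mesa's two-argument maximum macro, and `surface_size_bytes()` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for Mesa's MAX2() macro. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Simplified surface size: sample_count is 0 for non-MSAA surfaces, so
 * clamp it to at least 1 before multiplying. */
static uint64_t
surface_size_bytes(uint32_t width, uint32_t height,
                   uint32_t bytes_per_pixel, uint32_t sample_count)
{
   return (uint64_t)width * height * bytes_per_pixel *
          MAX2(sample_count, 1u);
}
```

A 64x64 RGBA8 surface comes out at 16384 bytes with `sampleCount` 0 or 1, and 65536 bytes with 4 samples. Without the clamp, the non-MSAA case would compute 0.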
* winsys/amdgpu: add BO to the global list only when RADEON_ALL_BOS is set | Samuel Pitoiset | 2017-08-30 | 4 | -11/+17
  Only useful when that debug option is enabled.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: update dirty_level_mask before dispatching | Samuel Pitoiset | 2017-08-30 | 2 | -0/+6
  This fixes a rendering issue with Hitman when bindless textures are enabled.
  Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: initialize llvmpipe->dirty with LP_NEW_SCISSOR | Brian Paul | 2017-08-29 | 1 | -0/+6
  If llvmpipe_set_scissor_states() is never called, we still need to be sure that derived scissor/clip state is updated. As of commit 743ad599a97d09b1, that function might not be called.
  Fixes the regressed Piglit gl-1.0-scissor-offscreen -fbo -auto test.
  Reviewed-by: Roland Scheidegger <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101709
  Fixes: 743ad599a97 ("st/mesa: don't set 16 scissors and 16 viewports if they're unused")
  Cc: "17.2" <[email protected]>
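The pattern here is to seed the dirty mask at context creation. The first state validation then computes derived scissor state even if no scissor is ever set. In this sketch `LP_NEW_SCISSOR` has a made-up value. `struct ctx`, `ctx_init()` and `update_derived_state()` are hypothetical stand-ins for the llvmpipe context and its derived-state update.

```c
#include <assert.h>
#include <stdbool.h>

#define LP_NEW_SCISSOR (1u << 0) /* made-up value for this sketch */

struct ctx {
   unsigned dirty;             /* pending state-change bits */
   bool derived_scissor_valid; /* stand-in for derived scissor/clip state */
};

static void
ctx_init(struct ctx *ctx)
{
   assert(ctx);
   /* Seed the dirty mask so derived scissor state is computed on the
    * first validation even if no scissor state is ever set. */
   ctx->dirty = LP_NEW_SCISSOR;
   ctx->derived_scissor_valid = false;
}

static void
update_derived_state(struct ctx *ctx)
{
   if (ctx->dirty & LP_NEW_SCISSOR) {
      /* A real driver would recompute scissor/clip state here. */
      ctx->derived_scissor_valid = true;
   }
   ctx->dirty = 0;
}
```

Had `dirty` started at 0, the first `update_derived_state()` call would leave the derived scissor state uncomputed.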
* ac/debug: Support multiple trace ids for nested IBs. | Bas Nieuwenhuizen | 2017-08-29 | 1 | -9/+10
  Signed-off-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: stop leaking nir | Timothy Arceri | 2017-08-29 | 1 | -0/+1
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rewrite late alloc VS limit computation | Marek Olšák | 2017-08-28 | 1 | -12/+25
  This is still very simple, but it's better than before.
  Loosely ported from Vulkan.