summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* virgl: delete commented out fprintf-callErik Faye-Lund2018-08-281-1/+0
| | | | | | | This is just debug-cruft left over. Let's just get rid of it. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
* nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+Rhys Perry2018-08-272-10/+36
| | | | | | | | | | | | | | | | | Gives a +7.79% increase in FPS with Hitman on lowest quality settings on my GTX 1060. total instructions in shared programs : 5787979 -> 5748677 (-0.68%) total gprs used in shared programs : 669901 -> 669373 (-0.08%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21064 (-0.02%) local shared gpr inst bytes helped 1 0 152 274 274 hurt 0 0 0 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize multiplication by 16-bit immediates into two xmadsRhys Perry2018-08-271-0/+10
| | | | | | | | | | | | | | | | Rather than the usual three that would be created. total instructions in shared programs : 5796385 -> 5786560 (-0.17%) total gprs used in shared programs : 670103 -> 669968 (-0.02%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21068 (-0.45%) local shared gpr inst bytes helped 1 0 64 1040 1040 hurt 0 0 27 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize near power-of-twos into shladdRhys Perry2018-08-271-0/+27
| | | | | | | | | | | | | | total instructions in shared programs : 5819319 -> 5796385 (-0.39%) total gprs used in shared programs : 670571 -> 670103 (-0.07%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 318 1758 1758 hurt 0 0 63 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: move a * b -> a << log2(b) code into createMul()Rhys Perry2018-08-271-15/+30
| | | | | | | | | | | | | | | | | | | | With this commit, OP_MAD is handled on nv50 too. This commit is also useful for later commits. Also, instead of creating a shladd, it relies on LateAlgebraicOpt to create one. This simplifies the code and helps shader-db slightly overall. total instructions in shared programs : 5820882 -> 5819319 (-0.03%) total gprs used in shared programs : 670595 -> 670571 (-0.00%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 18 230 230 hurt 0 0 8 263 263 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize imul/imad to xmadsRhys Perry2018-08-272-1/+56
| | | | | | | | | | | | | | | | | | | This hits the shader-db numbers a good bit, though a few xmads is way faster than an imul or imad and the cost is mitigated by the next commit, which optimizes many multiplications by immediates into shorter and less register heavy instructions than the xmads. total instructions in shared programs : 5768871 -> 5820882 (0.90%) total gprs used in shared programs : 669919 -> 670595 (0.10%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21164 (0.46%) local shared gpr inst bytes helped 0 0 38 0 0 hurt 1 0 365 3076 3076 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* gm107/ir: add support for OP_XMAD on GM107+Rhys Perry2018-08-273-1/+71
| | | | | | | | v4: make the immediate field 16 bits v5: don't ever emit h1 flags for immediates Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: add preliminary support for OP_XMADRhys Perry2018-08-277-5/+85
| | | | | | | | v4: remove uint16_t(...) v4: don't allow immediates outside [0,65535] in insnCanLoad() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE.Kenneth Graunke2018-08-2418-2/+21
| | | | | | | | | | | | | Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp. This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).
* Revert "configure: allow building with python3"Emil Velikov2018-08-245-5/+5
| | | | | | | | | | | | | | This reverts commit ae7898dfdbe5c8dab7d11c71862353f1ae43feb0. Turns out the python scripts are _not_ fully python 3 compatible. As Ilia reported using get_xmlpool.py with LANG=C produces some weird output - see the link for details. Even though the issue was spotted with the autoconf build, it exposes a genuine problem with the script (and lack of lang handling of the meson build.) https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
* gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0Roland Scheidegger2018-08-241-27/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader, rather llvm will just do a call to a null function in the jitted shaders making it difficult to detect when intrinsics vanish.) Luckily the signed ones are still there, I helped convincing llvm removing them is a bad idea for now, since while the unsigned ones have sort of agreed-upon simplest patterns to replace them with, this is not the case for the signed ones, and they require _significantly_ more complex patterns - to the point that the recognition is IMHO probably unlikely to ever work reliably in practice (due to other optimizations interfering). (Even for the relatively trivial unsigned patterns, llvm already added test cases where recognition doesn't work, unsaturated add followed by saturated add may produce atrocious code.) Nevertheless, it seems there's a serious quest to squash all cpu-specific intrinsics going on, so I'd expect patches to nuke them as well to resurface. Adapt the existing fallback code to match the simple patterns llvm uses and hope for the best. I've verified with lp_test_blend that it does produce the expected saturated assembly instructions. Though our cmp/select build helpers don't use boolean masks, but it doesn't seem to interfere with llvm's ability to recognize the pattern. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231 Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: increase the maximum UBO size to 2 GBMarek Olšák2018-08-231-1/+1
| | | | | | | | | Same as the closed driver. This causes a failure in GL45-CTS.compute_shader.max, which has a trivial bug. Tested-by: Dieter Nützel <[email protected]>
* radeonsi: bump MAX_GS_INVOCATIONSMarek Olšák2018-08-232-3/+3
| | | | | | same as the closed driver Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZEMarek Olšák2018-08-2318-0/+37
| | | | Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_GS_INVOCATIONSMarek Olšák2018-08-2318-1/+38
| | | | Tested-by: Dieter Nützel <[email protected]>
* tgsi/ureg: don't call tgsi_sanity when it's too slowMarek Olšák2018-08-231-1/+12
| | | | Tested-by: Dieter Nützel <[email protected]>
* configure: allow building with python3Emil Velikov2018-08-235-5/+5
| | | | | | | | | | | | Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python3 chosen prior to python2 Signed-off-by: Emil Velikov <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* meson: Run the install script with Python 3Mathieu Bridon2018-08-234-0/+4
| | | | | | | | The script was being run directly as an executable, and it has a Python 2 shebang. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* llvmpipe: add cc clobber to inline asmGrazvydas Ignotas2018-08-231-1/+2
| | | | | | | The bsr instruction modifies flags, so that needs to be indicated to the compiler. No effect on generated code, but still needed for correctness. Reviewed-by: Roland Scheidegger <[email protected]>
* ac: fix WAITCNT flags for GFX9Marek Olšák2018-08-221-5/+0
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac,radeonsi: use ac_build_gather_values moreMarek Olšák2018-08-213-33/+17
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac,radeonsi: use ac_build_fmadMarek Olšák2018-08-211-12/+5
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: use ac_build_imadMarek Olšák2018-08-213-57/+29
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add ac_build_s_barrierMarek Olšák2018-08-211-3/+1
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: print the shader stage name when printing LLVM IRMarek Olšák2018-08-211-1/+2
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: use is_merged shader in si_prolog_get_rw_buffersMarek Olšák2018-08-211-18/+14
| | | | | | needed to change the input type to si_shader_context Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: completely remove +auto-waitcnt-before-barrierMarek Olšák2018-08-211-1/+0
| | | | | | | it causes corruption on several different GPU generations. Cc: 18.2 <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* r600/eg: rework atomic counter emission with flushesDave Airlie2018-08-216-31/+54
| | | | | | | | | | | | | | | | | | | | With the current code, we didn't do the space checks prior to atomic counter setup emission, but we also didn't add atomic counters to the space check so we could get a flush later as well. These flushes would be bad, and lead to problems with parallel tests. We have to ensure the atomic counter copy in, draw emits and counter copy out are kept in the same command submission unit. This reworks the code to drop some useless masks, make the counting separate to the emits, and make the space checker handle atomic counter space. [airlied: want this in 18.2] Fixes: 06993e4ee (r600: add support for hw atomic counters. (v3))
* virgl: ARB_enhanced_layouts supportDave Airlie2018-08-224-3/+8
| | | | | | | We need to handle the gaps in the streamout bindings on the guest side and enable if it the host has the rest enabled. Reviewed-by: Jakob Bornecrantz <[email protected]>
* mesa: remove unused dri config option disable_shader_bit_encodingTimothy Arceri2018-08-214-5/+0
| | | | | | | This was added as a workaround for Heaven 3.0 but was later removed by 5ead448719f3 to allow Heaven 4.0 to work correctly. Reviewed-by: Kenneth Graunke <[email protected]>
* swr: bump minimum supported LLVM version to 6.0Juan A. Suarez Romero2018-08-202-3/+3
| | | | | | | | | | | | | | | | | RADV now requires LLVM 6.0 or greater, and thus we can't build dist tarball because swr requires LLVM 5.0. Let's bump required LLVM to 6.0 in swr too. v2: bump also in meson.build (Eric) Fixes: fd1121e839 ("amd: remove support for LLVM 5.0") Cc: Tim Rowley <[email protected]> Cc: Emil Velikov <[email protected]> Cc: Dylan Baker <[email protected]> Cc: Eric Engestrom <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* freedreno: fix context teardown raceRob Clark2018-08-204-8/+8
| | | | | | | | We could still have batches queued up to flush, so fd_context_destroy() (which will kill and sync on the flush_queue) before deleting buffers that might be referenced from fdN_gmem() from context of flush_queue. Signed-off-by: Rob Clark <[email protected]>
* gallium/winsys/kms: don't unmap what wasn't mappedRay Strode2018-08-171-5/+13
| | | | | | | | | | | | | | | | | | | | | | At the moment, depending on pipe transfer flags, the dumb buffer map address can end up at either kms_sw_dt->ro_mapped or kms_sw_dt->mapped. When it's time to unmap the dumb buffer, both locations get unmapped, even though one is probably initialized to 0. That leads to the code segment getting unmapped at runtime and crashes when trying to call into unrelated code. This commit addresses the problem by using MAP_FAILED instead of NULL for ro_mapped and mapped when the dumb buffer is unmapped, and only unmapping mapped addresses at unmap time. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107098 Signed-off-by: Ray Strode <[email protected]> Fixes: d891f28df9a ("gallium/winsys/kms: Fix possible leak in map/unmap.") Cc: Lepton Wu <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* xmlconfig: add kernel_driver device attributeQiang Yu2018-08-172-2/+2
| | | | | | | | | This attribute can be used by loader to apply different option to device use specific kernel driver. Signed-off-by: Qiang Yu <[email protected]> Acked-by: Michel Dänzer <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* freedreno/a6xx: streamoutRob Clark2018-08-173-45/+62
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: fragz fixesRob Clark2018-08-171-7/+3
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: scissor fixesRob Clark2018-08-172-4/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2018-08-179-27/+32
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: fix srgbRob Clark2018-08-171-7/+13
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix dEQP-GLES3.functional.fence_sync.*Rob Clark2018-08-171-0/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: Add a6xx backendKristian H. Kristensen2018-08-1638-17/+6368
| | | | | | | | | | This adds a freedreno backend for the a6xx generation GPUs, which at the time of this commit is about 98% GLES2 conformant. Much remains to be done - both performance work and feature work towards more recent GLES versions, but this is a good start. Signed-off-by: Kristian H. Kristensen <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2018-08-167-66/+4928
| | | | | | pull in a6xx registers Signed-off-by: Rob Clark <[email protected]>
* freedreno: Fix warningsKristian H. Kristensen2018-08-165-15/+9
| | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* svga: simplify Mesa version stringEric Engestrom2018-08-161-1/+1
| | | | | | Suggested-by: Emil Velikov <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* bin: always define MESA_GIT_SHA1 to make it directly usable in codeEric Engestrom2018-08-163-15/+3
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* virgl: report actual max-texture sizesErik Faye-Lund2018-08-152-0/+10
| | | | | | | | Instead of doing conservative guesses, we should report the max levels based on the max sizes we get from GL on the host. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]>
* virgl: do not use SP_MAX_TEXTURE_*_LEVELS definesErik Faye-Lund2018-08-151-7/+3
| | | | | | | | | | | | | These macro-names are also used for softpipe, so let's avoid confusion by avoiding them. Besides, they are just used in one place in virgl, so let's just inline them into the place they are used instead. While we're at it, fixup an error in the comment for the 3D version. Mesa subtracts computes max-size by doing by 2^(n-1), which means this should be 256 cubed, not 512 cubed. The other comments are correct. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]>
* radv: disable the auto-waitcnt-before-barrier LLVM optionSamuel Pitoiset2018-08-151-0/+1
| | | | | | | | | | | | | | This option allows us to remove additional s_waitcnt instructions because s_barrier internally does s_waitcnt 0. Though, apparently there is a problem with LDS accesses that causes rendering issues with FFXV and DXVK. Disable this optimization for now (RadeonSI still uses it). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107460 CC: 18.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: enable 1 missing PS_SU perf counter on PolarisMarek Olšák2018-08-141-1/+1
|
* radeonsi: use radeon_info::nameMarek Olšák2018-08-143-40/+12
| | | | Reviewed-by: Timothy Arceri <[email protected]>