| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This is just debug-cruft left over. Let's just get rid of it.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.
total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs : 669901 -> 669373 (-0.08%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21064 (-0.02%)
local shared gpr inst bytes
helped 1 0 152 274 274
hurt 0 0 0 0 0
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than the usual three that would be created.
total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs : 670103 -> 669968 (-0.02%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21068 (-0.45%)
local shared gpr inst bytes
helped 1 0 64 1040 1040
hurt 0 0 27 0 0
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs : 670571 -> 670103 (-0.07%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21164 (0.00%)
local shared gpr inst bytes
helped 0 0 318 1758 1758
hurt 0 0 63 0 0
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.
Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.
total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs : 670595 -> 670571 (-0.00%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21164 (0.00%)
local shared gpr inst bytes
helped 0 0 18 230 230
hurt 0 0 8 263 263
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This hits the shader-db numbers a good bit, though a few xmads is way
faster than an imul or imad and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register heavy instructions than the xmads.
total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs : 669919 -> 670595 (0.10%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21164 (0.46%)
local shared gpr inst bytes
helped 0 0 38 0 0
hurt 1 0 365 3076 3076
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
| |
v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
| |
v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.
Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.
This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit ae7898dfdbe5c8dab7d11c71862353f1ae43feb0.
Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.
Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)
https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These have been removed. Unfortunately auto-upgrade doesn't work for
jit. (Worse, it seems we don't get a compilation error anymore when
compiling the shader, rather llvm will just do a call to a null
function in the jitted shaders making it difficult to detect when
intrinsics vanish.)
Luckily the signed ones are still there, I helped convincing llvm
removing them is a bad idea for now, since while the unsigned ones have
sort of agreed-upon simplest patterns to replace them with, this is not
the case for the signed ones, and they require _significantly_ more
complex patterns - to the point that the recognition is IMHO probably
unlikely to ever work reliably in practice (due to other optimizations
interfering). (Even for the relatively trivial unsigned patterns, llvm
already added test cases where recognition doesn't work, unsaturated
add followed by saturated add may produce atrocious code.)
Nevertheless, it seems there's a serious quest to squash all
cpu-specific intrinsics going on, so I'd expect patches to nuke them as
well to resurface.
Adapt the existing fallback code to match the simple patterns llvm uses
and hope for the best. I've verified with lp_test_blend that it does
produce the expected saturated assembly instructions. Though our
cmp/select build helpers don't use boolean masks, but it doesn't seem
to interfere with llvm's ability to recognize the pattern.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Same as the closed driver.
This causes a failure in GL45-CTS.compute_shader.max, which has a trivial
bug.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
same as the closed driver
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.
Note:
- python3.4 is used as it's the earliest supported version
- python3 chosen prior to python2
Signed-off-by: Emil Velikov <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
The script was being run directly as an executable, and it has a
Python 2 shebang.
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
The bsr instruction modifies flags, so that needs to be indicated to the
compiler. No effect on generated code, but still needed for correctness.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
needed to change the input type to si_shader_context
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
it causes corruption on several different GPU generations.
Cc: 18.2 <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the current code, we didn't do the space checks prior
to atomic counter setup emission, but we also didn't add
atomic counters to the space check so we could get a flush
later as well.
These flushes would be bad, and lead to problems with
parallel tests. We have to ensure the atomic counter copy in,
draw emits and counter copy out are kept in the same command
submission unit.
This reworks the code to drop some useless masks, make the
counting separate to the emits, and make the space checker
handle atomic counter space.
[airlied: want this in 18.2]
Fixes: 06993e4ee (r600: add support for hw atomic counters. (v3))
|
|
|
|
|
|
|
| |
We need to handle the gaps in the streamout bindings on the guest
side and enable if it the host has the rest enabled.
Reviewed-by: Jakob Bornecrantz <[email protected]>
|
|
|
|
|
|
|
| |
This was added as a workaround for Heaven 3.0 but was later removed
by 5ead448719f3 to allow Heaven 4.0 to work correctly.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RADV now requires LLVM 6.0 or greater, and thus we can't build dist
tarball because swr requires LLVM 5.0.
Let's bump required LLVM to 6.0 in swr too.
v2: bump also in meson.build (Eric)
Fixes: fd1121e839 ("amd: remove support for LLVM 5.0")
Cc: Tim Rowley <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Dylan Baker <[email protected]>
Cc: Eric Engestrom <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
We could still have batches queued up to flush, so fd_context_destroy()
(which will kill and sync on the flush_queue) before deleting buffers
that might be referenced from fdN_gmem() from context of flush_queue.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the moment, depending on pipe transfer flags, the dumb
buffer map address can end up at either kms_sw_dt->ro_mapped
or kms_sw_dt->mapped.
When it's time to unmap the dumb buffer, both locations get unmapped,
even though one is probably initialized to 0.
That leads to the code segment getting unmapped at runtime and
crashes when trying to call into unrelated code.
This commit addresses the problem by using MAP_FAILED instead of
NULL for ro_mapped and mapped when the dumb buffer is unmapped,
and only unmapping mapped addresses at unmap time.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107098
Signed-off-by: Ray Strode <[email protected]>
Fixes: d891f28df9a ("gallium/winsys/kms: Fix possible leak in map/unmap.")
Cc: Lepton Wu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This attribute can be used by loader to apply different
option to device use specific kernel driver.
Signed-off-by: Qiang Yu <[email protected]>
Acked-by: Michel Dänzer <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This adds a freedreno backend for the a6xx generation GPUs, which at
the time of this commit is about 98% GLES2 conformant. Much remains to
be done - both performance work and feature work towards more recent
GLES versions, but this is a good start.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
pull in a6xx registers
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Suggested-by: Emil Velikov <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead of doing conservative guesses, we should report the max levels
based on the max sizes we get from GL on the host.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These macro-names are also used for softpipe, so let's avoid confusion
by avoiding them. Besides, they are just used in one place in virgl, so
let's just inline them into the place they are used instead.
While we're at it, fixup an error in the comment for the 3D version.
Mesa subtracts computes max-size by doing by 2^(n-1), which means this
should be 256 cubed, not 512 cubed. The other comments are correct.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option allows us to remove additional s_waitcnt instructions
because s_barrier internally does s_waitcnt 0.
Though, apparently there is a problem with LDS accesses that
causes rendering issues with FFXV and DXVK. Disable this
optimization for now (RadeonSI still uses it).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107460
CC: 18.2 <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
| |
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|