| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SAMPLEMASK semantic should only return the bits set covered by the
current invocation. However we were always retrieving the covmask, which
returns the covered samples of the whole pixel.
When not doing per-sample invocation, this is precisely what we want.
However when doing per-sample invocation, we have to select the
sampleid'th bit and only return that. Furthermore, this means that we
have to have a 1:1 correlation for invocations and samples.
This fixes most
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*
tests. A few failures remain due to disagreements about nr_samples==1
logic as well as what happens with MSAA x2 RTs when the shading fraction
is 0.5.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The goal is to allow the pipe driver to request something other than
TGSI, but detect whether what is getting is TGSI vs what it requested.
The pipe drivers will always have to support TGSI (and convert that into
whatever it is that they prefer), but in some cases we should be able to
skip the TGSI intermediate step (such as glsl->nir vs glsl->tgsi->nir).
I think pipe_compute_state should get similar treatment. Currently,
afaict, it has one user and one consumer, which has allowed it to be
sloppy wrt. supporting alternative IR's.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The use of transfer_inline_write() in TexSubImage path (see fb9fe352ea4)
exposed a bug for "layer_first" resources (ie. a4xx) not setting correct
layer_stride.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode.
Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple
modules can be dumped (albeit it might still overwrite previous modules,
particularly the modules from draw tend to always have the same name).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes this build error.
CXX rasterizer/memory/libswrAVX_la-ClearTile.lo
In file included from rasterizer/memory/ClearTile.cpp:34:0:
./rasterizer/memory/Convert.h: In function ‘uint16_t Convert32To16Float(float)’:
./rasterizer/memory/Convert.h:170:9: error: ‘__builtin_isnan’ is not a member of ‘std’
if (std::isnan(val))
^
./rasterizer/memory/Convert.h:170:9: note: suggested alternative:
<built-in>: note: ‘__builtin_isnan’
./rasterizer/memory/Convert.h:176:14: error: ‘__builtin_isinf_sign’ is not a member of ‘std’
else if (std::isinf(val))
^
./rasterizer/memory/Convert.h:176:14: note: suggested alternative:
<built-in>: note: ‘__builtin_isinf_sign’
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95180
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The calculated limit gave problems on SI as it was > 32 KiB
and the hardware LDS size on SI is only 32 KiB. It isn't
correct anyway when processing multiple patches in a threadgroup.
As we potentially have any number of patches such that the
used LDS is at most the hardware LDS size, and exact size
per patch is not known at compile time, this seems like
the only valid bound.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We index into these based on var->data.driver_location, which might have
gaps (ie. two inputs, one w/ drvloc 0 and other 2). This shows up in
(for example) 'bin/copyteximage 1D', but was only noticed recently due
to additional asserts.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Compute support seems to be pretty stable now, and according to piglit
it doesn't seem to break 3D state.
As a side effect, this will expose ARB_compute_shader on GK110/GK208.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
This prevents IB rejections due to insane memory usage from
many concecutive texture uploads.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements:
- Linear-to-linear partial copies. (unaligned)
- Tiled-to-linear and linear-to-tiled partial copies.
(unaligned except 1-2 Bpp)
- Tiled-to-tiled partial copies aligned to 8x8.
v2: Extend the SDMA L2T VM fault workaround to T2L.
- Same algorithm, just applied to T2L.
(and using a 0-based address and surface.bo_size instead of buf->size)
Reviewed-by: Alex Deucher <[email protected]> (v1)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Most of this has never worked according to the new test.
The new code will be radically different.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
just normalizing the interfaces
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: - adjustments for exercising all important SDMA code paths
- decrease the probability of getting huge sizes (faster testing)
- increase the probability of getting power-of-two dimensions
- change the memory cap to 128MB (faster testing)
- better detect which engine has been used
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
v2: simplify the conditionals
Reviewed-by: Alex Deucher <[email protected]> (v1)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
this is more robust and probably fixes some bugs already
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
because it doesn't decompress
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
a staging cube texture with array_size % 6 != 0 doesn't work very well
just use 2D_ARRAY or 2D for all staging textures
Cc: 11.1 11.2 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
It's for the buffer cache.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Those aren't really interesting, however outputting them is helpful when
trying to feed the IR to llvm llc (or opt) for debugging.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
At least with MCJIT the disassembler will crash otherwise when trying to
disassemble such functions.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't target this yet, and some llvm versions incorrectly enable it based
on cpu string, causing crashes.
(Albeit this is a losing battle, it is pretty much guaranteed when the next
new feature comes along llvm will mistakenly enable it on some future cpu,
thus we would have to proactively disable all new features as llvm adds them.)
This should fix https://bugs.freedesktop.org/show_bug.cgi?id=94291 (untested)
Tested-by: Timo Aaltonen <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]
CC: <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Lower lrp when operating with double operands because float version of
lrp is also lowered.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
We don't need them for compute shaders.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Since this is potentially modifying the block structure of the shader,
it needs the _safe() version of the iterator.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We request more than 32KB of LDS here, which SI doesn't have. Since LLVM
recently started checking the size of declared LDS allocations, all shaders
involved in tesselation fail to compile on SI.
Note that the entire calculation here seems wrong, given how we calculate
indices for generic attributes, so the number ends up wrong on CI+ as well.
A proper solution is clearly needed, but this patch should serve as a band-aid
for SI in the meantime.
Also note that the real size of the LDS allocation in hardware is independent
from what we tell LLVM, so this is really more of a "cosmetic" change.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95198
Cc: "11.2" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Experiments with framebuffer-no-attachments type draw calls have shown that
NULL exports stall terribly unless we ensure that export memory is allocated
by the SPI.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This is useful for shader-related counters, since they tend to quickly
exceed 32 bits.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
LLVM when configured with "intel jitevents" enabled can inform
VTune about dynamic code, so individual shaders are attributed
profiling data and the resulting assembly can be examined.
Acked-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Missed a switch break in query stat collection when refactoring queries.
Reviewed-by: George Kyriazis <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a total of four possible currently, rather than 2. So we need
to be prepared for the input array to grow by 16 components. We could
get away with less if we could pack sysval inputs.. and the way this is
handled currently isn't really the nicest thing. But it's a tactical
fix for an issue hit in:
GL31-CTS.gtf30.GL3Tests.transform_feedback.transform_feedback_vertex_id
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
And set endian swap for packed formats the way it should be done
in theory.
This allows big endian to work again, but it can still be buggy.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71789
Cc: 11.1 11.2 <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
One of these is an unsigned bitfield, which I suspect is a false positive, but
gcc 5.3.1 complains about it with -fsanitize=undefined.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shifting into the sign bit of a signed int is undefined behavior.
Unfortunately, there are potentially many places where this happens using
the register macros.
This commit is the result of running
sed -ie "s/(((\(\w\+\)) & 0x\(\w\+\)) << \(\w\+\))/(((unsigned)(\1) \& 0x\2) << \3)/g"
on all header files in gallium/{r600,radeon,radeonsi}.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Funnily enough, some of these were turned into a compile-time error by gcc
with -fsanitize=undefined ("initializer is not a constant").
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
No sure where the 36 came from, but we clearly need at least
48 bytes per attribute per primitive.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This will be used for resetting the uniform stream in the presence of
branching, but may also be useful as an optimization to reduce how many
uniforms we have to copy out per draw call (in exchange for increasing
icache pressure).
|
|
|
|
|
|
| |
Seeing the expansion of a QPU_GET_FIELD in an assert isn't very
informative, and it's hard find what's going wrong without getting a dump
of the instruction that failed.
|