| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Function arguments do not have an "origin" instruction, causing a
NULL-pointer dereference without this check.
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7f12.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7f12.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
pipe_mutex_destroy() was made unnecessary with fd33a6bcd7f12.
Replace was done with:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_destroy(\([^)]*\)):mtx_destroy(\&\1):g' {} \;
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
pipe_mutex_init() was made unnecessary with fd33a6bcd7f12.
Replace was done using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \;
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
pipe_mutex was made unnecessary with fd33a6bcd7f12.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See detailed explanation of why this is needed in commit eb60a89bc3a.
This spot was missed/overlooked. Basically as a result of the fact
that BEGIN_* ends up calling PUSH_SPACE, which in turn adds an extra 8
to the requested amount, we have to be mindful of that when doing bare
nouveau_pushbuf_space calls.
Reportedly this fixes some crashes when replaying a hitman trace taken
on radeonsi.
Fixes: eb60a89bc3a ("nouveau: take extra push space into account for pushbuf_space calls")
Cc: "13.0 17.0" <[email protected]>
Reported-by: Karol Herbst <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When binding as textures, the alignment can be 16. However when binding
as an image, the address has to be aligned to 256. (Also when binding as
an RT, but that can't happen with GL or current gallium APIs.)
Reported-by: Roy Spliet <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
all drivers support it
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Tested-by: Brian Paul <[email protected]> (VMware driver only)
|
|
|
|
| |
Signed-off-by: Ben Skeggs <[email protected]>
|
|
|
|
|
|
|
| |
Not used and not widely supported. Use MIN+MAX instead.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Notes:
- make sure the default size is large enough to handle all state trackers
- pipe wrappers don't receive transfer calls from stream_uploader, because
pipe_context::stream_uploader points directly to the underlying driver's
stream_uploader (to keep it simple for now)
v2: add error handling to nv50, nvc0, noop
v3: set const_uploader
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Edmondo Tommasina <[email protected]> (v1)
Tested-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Empirically, this makes things work. Presumably this was originally
copied from the blob, which does make use of linked tsc mode.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99532
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
The blob uses these, and it fixes a bunch of dEQP stencil sampling tests
involving border colors. Probably the Z-based samplers work somehow
differently wrt border colors when using the stencil swizzle.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Fixes GL45-CTS.compute_shader.conditional-dispatching
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
Fixes GL45-CTS.compute_shader.atomic-case1 on Maxwell
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
On SM35 there does not appear to be a way to emit a ATOM.EXCH with a
null destination. This should be functionally equivalent to a plain
store however, so just do that.
Fixes GL45-CTS.compute_shader.atomic-case2 on SM35.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
The former logic just plain didn't work at all. We need to write the
subsequent dword to the next buffer location.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We have logic to short-circuit such retrievals to zero. However "zero"
was an immediate, and some logic expected to get registers (to later be
propagated). Fix this by using loadImm.
Fixes GL45-CTS.gpu_shader5.images_array_indexing
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Recently broken during int64 addition.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We just increased the max UBO, so we should also increase the clamp that
we do for robustness. Similarly, as we're including the fileIndex in the
new indirect value, we should reset fileIndex to 0 so that it is not
added in a second time.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many many many compute shaders only define a 1- or 2-dimensional block,
but then continue to use system values that take the full 3d into
account (like gl_LocalInvocationIndex, etc). So for the special case
that a dimension is exactly 1, we know that the thread id along that
axis will always be 0, so return it as such and allow constant folding
to fix things up.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Kepler and up unfortunately only support up to 8 constbufs. We work
around this by loading from constbufs as if they were storage buffers.
However we were not consistently applying limits to loads from these
buffers. Make sure to do the same thing we do for storage buffers.
Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but
it's unclear what it refers to). We need to expose an extra binding
point for the "program parameters", which means this must be 15. Remove
the last vestige of the "use c14 for immediates" idea.
Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
There's all kinds of logic that doesn't like there being holes in defs
or srcs lists. Avoid them. This also fixes the sched logic for maxwell.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have
to do a bit more work to get the job done.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few thoughts:
- Some of that LegalizeSSA logic should really live much earlier and be
subject to the likes of DCE and other useful passes
- Some of the "lowering" done in from_tgsi should be done later so that
proper optimization might be done.
However this all works and the above can be improved upon later.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Hardware does not support 64-bit integers MAD and MUL operations, so we need
to transform them in 32-bit operations.
Signed-off-by: Pierre Moreau <[email protected]>
|
|
|
|
|
|
| |
Note that this is not available for SM20/SM30.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
We were never emitting a .X flag for consuming condition code on SET,
and weren't emitting a signed type for SLCT comparison. Discovered while
working on int64 logic.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These operations allow you to compute min/max on arbitrary-width
integers, 32 bits at a time.
Note that the low/med ops implicitly set the condition code, and the
med/high ops implicitly consume it.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Nouveau does not currently have logic to implement this as a library
function. Even though such a library could be written, there's no big
advantage to do it that way for now given that int64 is a very uncommon
use-case. Allow a driver to expose INT64 without supporting division and
modulo operations.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the cap consistent with PIPE_CAP_INT64.
Aside from the hypothetical case of using draw for vertex shaders (and
actually caring about doubles...), every implementation supports doubles
either nowhere or everywhere.
Also, st/mesa didn't even check the cap correctly in all supported
shader stages.
While at it, add a missing LLVM version check for 64-bit integers in
radeonsi. This is conservative: judging by the log, LLVM 3.8 might be
sufficient, but there are probably bugs that have been fixed since then.
v2: fix clover (Marek)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Already handled by the build.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v1.1: move to using a normal CAP. (Marek)
v2: fill in the cap everywhere
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This is simply keyed off the vertex shader, as that's guaranteed to be
present in any pipeline.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This sets the dnz flag on all the relevant multiplication operations. At
emission time, this will only be supported by nvc0+, so nv50 will need a
different solution.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
| |
No point in having the extra argument considering that it's effectively
unused since the function was introduced.
Cc: Ilia Mirkin <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Address loading can often end up as shl + shr + shl combinations. The
latter two are equal shifts, which get converted into an and mask.
However if the previous shl is more than the mask is trying to remove
(in terms of low bits), we can just remove the and entirely. This
reduces some large shaders by as many as 3% of instructions (out of 2K).
total instructions in shared programs : 6495509 -> 6491076 (-0.07%)
total gprs used in shared programs : 954621 -> 954623 (0.00%)
local gpr inst bytes
helped 0 0 1014 1014
hurt 0 2 0 0
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't need to support all the color buffers for advanced blend, just
cb0. For Fermi, we use the special binding slots so that we don't
overlap with user textures, while Kepler+ gets a dedicated position for
the fb handle in the driver constbuf.
This logic is only triggered when a FBFETCH is actually present so it
should be a no-op most of the time.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This is so that we can differentiate between flushing any framebuffer
reading caches from regular sampler caches.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This had been updated in one place but not the other.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
The existing lowering is in place to lower that to RCP + MUL, or fancier
things down the line if necessary.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|