| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Mostly test code, plus one spot I noticed in r600.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
We will only hit this with multi-planar YUV external images, so we would
probably never hit this code path in the first place. But if we did, it
wouldn't do the right thing so just bail.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
If enabled clip-planes have changed, we need to mark program state
dirty.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have to be careful to not smash the value they're clearing to, but
other than that we're fine. Avoids quad clears in Processing, which likes
to do glClear(Z|S); glClear(Z).
Improves performance of Processing's QuadRendering demo at 5000 quads by
5.46507% +/- 1.35576% (n=15 before, 32 after)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were incrementing the count at the end of vc4_start_draw(), except that
that function returns immediately if we've already started drawing on this
batch. It also failed to count the statechanges from the GFXH-515
workaround.
This incidentally allows repeated glClear() to be coalesced, because the
fast clears aren't counted in draw_calls_queued any more. Fixes most of
the extra flushes in Processing, which emits glClear(Z|S); glClear(Z);
glClear(C) during its frame setup.
Improves performance of Processing's QuadRendering demo at 5000 quads by
3.33538% +/- 2.05846% (n=21 before, 15 after)
|
| |
|
|
|
|
|
| |
The fix in the vc4-jobs series ended up triggering the fallback path on
GLX apps that use depth but not stencil.
|
|
|
|
|
| |
I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format
of "0" didn't tell me much.
|
| |
|
|
|
|
|
|
|
| |
This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should have always
trivially colored these nodes, before. This commit is just to make QIR
code failing more intelligible when register allocation fails.
|
|
|
|
|
|
|
|
|
| |
If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexcuted channels, we don't care what's in there).
Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
|
|
|
|
|
|
| |
We would assertion fail in setting up the simulator the second time
around. This at least postpones the assertion failure until we've closed
all of the first set of screens and started opening a new set.
|
|
|
|
|
| |
Fixes 100 piglit tests since the assertions were added to nir.h. What's
amazing is that these tests used to pass, even when casting garbage.
|
|
|
|
|
|
|
|
|
| |
Checking if MAD is supported is definitely wrong, and it's
more likely a typo I introduced few days ago which breaks
NV50 because SHLADD is not supported there.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
When the chipset is forced with NV50_PROG_CHIPSET, we actually
only want to output the binary if NV50_PROG_DEBUG is also
enabled. Otherwise, this pollutes the shader-db output.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Only expose 512 threads/block on Fermi to not be limited by
32 GPRs/thread.
v4: - use 512 threads on Fermi, 1024 on Kepler+
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.
This allows to use 64 GPRs/thread.
v4: - use 512 threads on Fermi, 1024 on Kepler+
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v3: - use a new case statement in r600_pipe_common.c
- fix compilation of softpipe...
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
helped some ue4 demos and divinity OS shaders
total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
total gprs used in shared programs : 379273 -> 379273 (0.00%)
total local used in shared programs : 9505 -> 9505 (0.00%)
total bytes used in shared programs : 25837792 -> 25837192 (-0.00%)
local gpr inst bytes
helped 0 0 33 33
hurt 0 0 0 0
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
One of NIR's invariants is that control flow lists always start and end
with blocks. There's no good reason why we should return a cf_node from
these functions since we know that it's always a block. Making it a block
lets us remove a bunch of code.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
No longer required.
Signed-off-by: Steven Toth <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise it won't be picked in the tarball and the build will fail.
Fixes: 0035f7f1365 ("svga: add guest statistic gathering interface")
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Similar to the other 'tests', enable assertions in xvmc_bench.
This silences the GCC warnings about unused-variable(s), makes the
program actually useful, as the XvMC API called. Atm the function
calls are omitted, since they're called within the assert.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Currently, program binaries are only dumped at upload time, but
when the chipset has been forced via NV50_PROG_CHIPSET we might
want to show the generated code, especially with shaderdb.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
There are VM faults without this.
Cc: 12.0 <[email protected]>
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
which is returned for GL_MAX_TEXTURE_BUFFER_SIZE.
It doesn't have any other use at the moment.
Bigger allocations are not rejected.
This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Not returning garbage in .zw seems pretty important.
This fixes:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.*
Cc: 11.2 12.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
(v1 pushed, then reverted)
This fixes 9 randomly failing tests on radeonsi:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*
v2: use input_interpolate[input] (correct) instead of
input_interpolate[index] (incorrect)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
fixes a crash in the case simplify reports an error
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
Check for device reset on flush. It would be nicer if the kernel just
reported this as an error on the submit ioctl (and similarly for fences),
but this will do for now.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: slab_alloc_st -> slab_alloc
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97894
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: Use AVX2 gather for non aligned loads too.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Make sure that with num_lods > 1 and min_filter != mag_filter we
still enter the splitting path. So this case would still use 4-wide aos
path (as a side note, the 4-wide aos sampling path could actually be
improved quite a bit if we have avx2, by just doing the filtering with
256bit vectors).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
v2: pblendb -> pblendvb
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Specified by subclause 7.3.4
v2: get the loop optimized
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For reference picture sets, there are cases that rps will not always
be used. Once detect the unused flag from encoded bitstream, we should
not add this rps to any list, otherwise pass the incorrect reference
and skip the correct rps.
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It will fix the corruption for frame, that only has one stort term ref
picture set, we set NULL rps for this case previously, causing taking
incorrect reference. Instead we should take that only short term set
as reference
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The video size from format container is not always compatible with
the size from codec bitstream, the HW decoder should take the size
information from bitstream, otherwise the corruption appears with clip
that has different size info between bitstream and format container
So we are passing width(height)_in_samples from sequence parameter
set to video decoder.
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
For clip with frame delta poc over 16
Signed-off-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
This is enabled automatically if shader printing is enabled, or separately
by R600_DEBUG=checkir. Catch mal-formed IR before it crashes in a later
pass.
Reviewed-by: Marek Olšák <[email protected]>
|