Added hooks for screen creation and swap. Keep llvmpipe as the default
software renderer.
v2: split from bigger patch
v3: reword commit message
Reviewed-by: Emil Velikov <[email protected]>

CID 1394028
Signed-off-by: Mun Gwan-gyeong <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>

For compute shaders, we free the selector after the shader has been
compiled, so we need to save this bit somewhere else. Also, make sure that
this type of bug cannot re-appear, by NULL-ing the selector pointer after
we're done with it.
This bug has been there since the feature was added, but was only exposed
in piglit arb_compute_variable_group_size-local-size by commit
9bfee7047b70cb0aa026ca9536465762f96cb2b1 (which is totally unrelated).
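A minimal C sketch of the fix pattern. The structures and functions (`struct shader_selector`, `struct compute_shader`, `finish_compute_compile`, `uses_variable_block_size`) are hypothetical stand-ins, not the real radeonsi code. The bit is copied out before the selector is freed, and the pointer is then cleared so that a later use crashes immediately instead of silently reading freed memory.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical selector: holds info that is only needed at compile time. */
struct shader_selector {
    bool uses_variable_block_size;
};

/* Hypothetical compute shader state that outlives the selector. */
struct compute_shader {
    struct shader_selector *sel;
    bool variable_block_size; /* copy of the bit, still valid after sel is freed */
};

/* After compilation (elided), save the bit, free the selector, and NULL
 * the pointer so any stale access faults instead of reading freed memory. */
static void finish_compute_compile(struct compute_shader *cs)
{
    cs->variable_block_size = cs->sel->uses_variable_block_size;
    free(cs->sel);
    cs->sel = NULL;
}

/* Launch-time query reads the saved copy, never the selector. */
static bool compute_uses_variable_block_size(const struct compute_shader *cs)
{
    return cs->variable_block_size;
}
```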
Cc: 13.0 <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>

Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>

Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

Since the state tracker now enables MSAA in the hardware for the case
nr_samples == 1 as well, we need to set sample locations correctly for
this case.
The Polaris override is still needed for the non-MSAA case (when
nr_samples == 0).
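A sketch of the selection logic described above, assuming the 1-sample MSAA case should use the regular single-sample positions. The names (`pick_sample_locs`, the `SAMPLE_LOCS_*` values, `is_polaris`) are hypothetical and stand in for the driver's actual sample-location tables.

```c
#include <stdbool.h>

/* Hypothetical labels for which sample-location table gets programmed. */
enum sample_locs {
    SAMPLE_LOCS_1X,           /* regular single-sample positions */
    SAMPLE_LOCS_POLARIS_NOAA, /* chip-specific override, non-MSAA only */
    SAMPLE_LOCS_MSAA,         /* table for the given sample count (elided) */
};

static enum sample_locs pick_sample_locs(unsigned nr_samples, bool is_polaris)
{
    if (nr_samples == 0) {
        /* Non-MSAA: the Polaris override is still needed here. */
        return is_polaris ? SAMPLE_LOCS_POLARIS_NOAA : SAMPLE_LOCS_1X;
    }
    if (nr_samples == 1) {
        /* MSAA is enabled in hardware even for one sample, so proper
         * 1x locations are set and the Polaris override does not apply. */
        return SAMPLE_LOCS_1X;
    }
    return SAMPLE_LOCS_MSAA;
}
```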
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>

This fixes GL45-CTS.sample_variables.mask.*.samples_1.*.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>

lp_build_any_true_range is just what we need, though it only produces
optimal code with SSE4.1 (ptest + set). Even without SSE4.1, the code is
still better on 64-bit x86 (1 unpack, 2 movq + or + set); on 32-bit x86 it
should be roughly the same as before.
While here, also make it a "real" 8-bit boolean. This saves one instruction
but, more importantly, makes it behave like ordinary booleans.
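As a rough scalar model of the semantics (not gallivm's actual function, which emits LLVM IR), an any-true reduction over the first `len` lanes of a mask vector ORs the lanes together and then normalizes the result to a 0/1 byte. `any_true_range` is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Scalar model of an "any lane true" reduction over the first `len`
 * lanes of a mask vector, where each lane is either 0 or ~0.
 * Returns an 8-bit boolean (0 or 1). */
static uint8_t any_true_range(const int32_t *mask, unsigned len)
{
    int32_t acc = 0;
    for (unsigned i = 0; i < len; i++)
        acc |= mask[i];          /* OR-reduce the lanes */
    return (uint8_t)(acc != 0);  /* "set" step: normalize to 0/1 */
}
```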
Reviewed-by: Jose Fonseca <[email protected]>

Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, although that's only really
problematic for the stride/index mul; the rest has mostly been moved
outside the shader loop (the mul could actually be optimized away too),
where things are still scalar.
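One way to do that overflow math by hand, sketched as scalar C for a single lane. `fetch_offset` and its parameters are hypothetical; a 64-bit widening multiply is used here to detect when the 32-bit stride/index product would overflow, which may differ from what the generated vector code actually does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds-checked fetch offset for one lane:
 * offset = buffer_offset + index * stride, with every step checked.
 * Returns false when the fetch must not read from the real buffer. */
static bool fetch_offset(uint32_t index, uint32_t stride,
                         uint32_t buffer_offset, uint32_t buffer_size,
                         uint32_t fetch_size, uint32_t *out)
{
    uint64_t wide = (uint64_t)index * stride;  /* exact product */
    if (wide > UINT32_MAX)
        return false;                          /* 32-bit mul would overflow */
    uint32_t off = (uint32_t)wide;
    if (off > UINT32_MAX - buffer_offset)
        return false;                          /* add would overflow */
    off += buffer_offset;
    if (fetch_size > buffer_size || off > buffer_size - fetch_size)
        return false;                          /* fetch would run past the end */
    *out = off;
    return true;
}
```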
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).
Instanced fetch, however, stays roughly the same as before, except that
the same element is no longer fetched multiple times. (I've seen the main
shader loop shrink by ~3x from this, because LLVM did not recognize that
these were all the same fetch: some of the fetches could have been replaced
with zeros when the vector size exceeds the remaining fetch count, although
the values of such fetches don't matter at all.)
Also, for elts gathering, use vectorized code as well.
The generated shaders are smaller and faster to compile. (I'm not entirely
sure about execution speed, but unless there are only single vertices to
handle I would generally expect it to be faster, and using SoA fetch opens
up more opportunities for future improvements.)
v3: skip the fake index buffer, not needed due to the jit code never seeing
the real index buffer in the first place.
Fix a bug with mask expansion (needs SExt, not ZExt).
Also, be really careful to keep the behavior the same, even in cases
where it looks wrong, and add comments explaining why the code does the
seemingly wrong stuff. Fortunately it's not actually more complex in the end.
Also change function order slightly just to make the diff more readable.
No piglit change. Passes some internal testing with another api too...
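The v3 mask-expansion fix (SExt rather than ZExt) comes down to how a 1-bit "true" is widened. A minimal C illustration, assuming the widened mask is used for bitwise selection, as lane masks usually are; the helper names are hypothetical.

```c
#include <stdint.h>

/* Sign extension replicates the single bit, so true becomes all ones. */
static int32_t mask_sext(unsigned bit)
{
    return -(int32_t)(bit & 1u);   /* 1 -> 0xFFFFFFFF, 0 -> 0 */
}

/* Zero extension only sets bit 0, so true becomes 1. */
static uint32_t mask_zext(unsigned bit)
{
    return bit & 1u;               /* 1 -> 0x00000001, 0 -> 0 */
}

/* Bitwise select: takes bits of a where mask is set, bits of b elsewhere. */
static uint32_t select_bits(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}
```

With the zero-extended mask, `select_bits` keeps only bit 0 of `a`, which is the kind of silent corruption the SExt fix avoids.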
Reviewed-by: Jose Fonseca <[email protected]>

Multithreaded fragment shaders let us hide texturing latency by a
hyperthreading-style switch to another fragment shader. This gets us up
to 20% framerate improvements on glmark2 tests.

The kernel changes for exposing the param have now been merged, so we can
expose it here.

The value is 0 since we didn't set it; we wanted to see the param.

CID 1394322
Signed-off-by: Mun Gwan-gyeong <[email protected]>

Fixes fbo-mrt-alphatest
v2: styling fixes
Reviewed-by: Bruce Cherniak <[email protected]>

Acked-by: Edward O'Callaghan <[email protected]>

I'm also sending out a piglit test, gl-2.0/vertexattribpointer-size-3,
which exposes this corner case.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>

Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>

Even though glBlitFramebuffer cannot be used for SINT <-> UINT blits, we
still need to handle this type of blit here because it can happen as part
of texture uploads / downloads, e.g. uploading a GL_RGBA8I texture from
GL_UNSIGNED_INT data.
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels.
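Reinterpreting the bits is not enough for such conversions: 200u stored into an 8-bit signed channel would read back as -56. A sketch assuming the conversion clamps to the destination's representable range (the helper names are hypothetical, and `dst_max` would be 127 for an 8-bit signed or 255 for an 8-bit unsigned channel):

```c
#include <stdint.h>

/* Unsigned source -> signed destination: clamp to dst_max (>= 0). */
static int32_t uint_to_sint_clamped(uint32_t v, int32_t dst_max)
{
    return v > (uint32_t)dst_max ? dst_max : (int32_t)v;
}

/* Signed source -> unsigned destination: negatives become 0,
 * large values clamp to dst_max. */
static uint32_t sint_to_uint_clamped(int32_t v, uint32_t dst_max)
{
    if (v < 0)
        return 0;
    return (uint32_t)v > dst_max ? dst_max : (uint32_t)v;
}
```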
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>

Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>

There is no functionality in swr to clamp either vertex or frag colors.
This could be added in swr_shader, at which point these could be
re-enabled.
Fixes arb_color_buffer_float-render
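For reference, the operation swr_shader would need to emit when clamping is requested (e.g. via GL_CLAMP_VERTEX_COLOR or GL_CLAMP_FRAGMENT_COLOR) is a per-component clamp to [0, 1]. A hypothetical scalar helper, not swr code; mapping NaN to 0 is a choice made for this sketch.

```c
/* Clamp each RGBA component to [0, 1]. The negated comparison also
 * sends NaN to 0 (a choice of this sketch). */
static void clamp_color4(float c[4])
{
    for (int i = 0; i < 4; i++) {
        if (!(c[i] > 0.0f))
            c[i] = 0.0f;
        else if (c[i] > 1.0f)
            c[i] = 1.0f;
    }
}
```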
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>

Fixes gl-3.2-basevertex-vertexid
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>

Fixes glsl-arb-fragment-coord-conventions.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>

Rendering could still be ongoing (or have yet to start) when the shader
is deleted. There's no refcounting on the shader text, so insert a
pipeline stall unconditionally when this happens.
[Note, we should instead introduce a way to attach work to
fences, so that the freeing can be done in the current fence.]
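The fence-based alternative mentioned in the note could look roughly like this. Everything here (`struct fence`, `fence_defer_free`, `fence_signal`) is a hypothetical design sketch, not an existing swr or Gallium API: frees are queued on the current fence and run when it signals, instead of stalling the pipeline.

```c
#include <stdbool.h>
#include <stdlib.h>

struct deferred_free {
    void *ptr;
    struct deferred_free *next;
};

/* Hypothetical fence: signaled once all work submitted before it is done. */
struct fence {
    bool signaled;
    struct deferred_free *pending;   /* frees waiting on this fence */
};

/* Queue a free on the fence. Frees immediately if the fence already
 * signaled; returns false if the caller must fall back to stall-and-free. */
static bool fence_defer_free(struct fence *f, void *ptr)
{
    if (f->signaled) {
        free(ptr);
        return true;
    }
    struct deferred_free *d = malloc(sizeof *d);
    if (!d)
        return false;
    d->ptr = ptr;
    d->next = f->pending;
    f->pending = d;
    return true;
}

/* Called when the fence signals: release everything attached to it.
 * Returns the number of deferred frees performed. */
static unsigned fence_signal(struct fence *f)
{
    unsigned freed = 0;
    f->signaled = true;
    while (f->pending) {
        struct deferred_free *d = f->pending;
        f->pending = d->next;
        free(d->ptr);
        free(d);
        freed++;
    }
    return freed;
}
```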
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

The EXT_texture_integer test says that blending and alphatest should
all be disabled. st/mesa takes care of alphatest already.
Fixes the ext_texture_integer-fbo-blending piglit test.
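A minimal sketch of forcing blending off for integer render targets. `struct rt_blend` and `fixup_integer_blend` are hypothetical; the per-target `is_integer` flag is assumed to come from the render target's format (in Mesa, `util_format_is_pure_integer()` is one way to compute it).

```c
#include <stdbool.h>

/* Hypothetical per-render-target blend state. */
struct rt_blend {
    bool blend_enable;
};

/* Blending does not apply to pure-integer color buffers, so turn it off
 * for those targets regardless of what the application requested. */
static void fixup_integer_blend(struct rt_blend *rt, const bool *is_integer,
                                unsigned num_rts)
{
    for (unsigned i = 0; i < num_rts; i++)
        if (is_integer[i])
            rt[i].blend_enable = false;
}
```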
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

The support in swr requires shaders to output the components as UINTs.
This is not how GL or Gallium work, and since this is not a
required-renderable format, just leave it out.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

Fixes the texsubimage piglit and lets the copyteximage one get further.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

In a BGR10X2 or BGR5X1 situation, there's no need to try to quantize the
X channel - the default will have the proper quantization required.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

Noticed by inspection.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

This is the format used for the primary surface of a
PIPE_FORMAT_Z32_FLOAT_S8X24_UINT resource.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>

Piglit regressions (radeonsi or LLVM bugs, they pass on softpipe):
- glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr
- glsl-1.10/execution/variable-indexing/vs-output-array-vec4-index-wr
- glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-col-row-wr
- glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-row-wr
Totals:
SGPRS: 1132185 -> 1168801 (3.23 %)
VGPRS: 907856 -> 906204 (-0.18 %)
Spilled SGPRs: 2011 -> 2425 (20.59 %)
Spilled VGPRs: 368 -> 96 (-73.91 %)
Scratch VGPRs: 1344 -> 1060 (-21.13 %) dwords per thread
Code Size: 35916164 -> 35705372 (-0.59 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 194010 -> 194921 (0.47 %)
Wait states: 0 -> 0 (0.00 %)
Before:
VGPR SPILLING APPS    Shaders  SpillVGPR  ScratchVGPR
alien_isolation          2938         38           40
bioshock-infinite        1769        245          732
dirt-showdown             548         85           72
f1-2015                   776          0          320
ue4_lightroom_inter..      74          0          180
After:
VGPR SPILLING APPS    Shaders  SpillVGPR  ScratchVGPR
alien_isolation          2938         38           40
bioshock-infinite        1769          0          480
dirt-showdown             548         58           40
f1-2015                   776          0          320
ue4_lightroom_inter..      74          0          180
Bioshock and DiRT benefit.
If I set IF_THRESHOLD=4, tesseract starts spilling VGPRs
Reviewed-by: Nicolai Hähnle <[email protected]>

Reviewed-by: Nicolai Hähnle <[email protected]>

Reviewed-by: Brian Paul <[email protected]>

Reviewed-by: Nicolai Hähnle <[email protected]>

Reviewed-by: Nicolai Hähnle <[email protected]>

Reviewed-by: Nicolai Hähnle <[email protected]>

Tested-by: Dieter Nützel <[email protected]>

Reviewed-by: Bruce Cherniak <[email protected]>

Move to pass by value since most events are very small in size.
We can look at pass by reference but will need to create multiple
versions to handle temp objects.
Reviewed-by: Bruce Cherniak <[email protected]>

Reviewed-by: Bruce Cherniak <[email protected]>

Reviewed-by: Bruce Cherniak <[email protected]>

Added events for tracking early/late depth and stencil events,
TE patch info, GS prim info, and FrontEnd/BackEnd DrawEnd events.
Reviewed-by: Bruce Cherniak <[email protected]>

- Do proper culling of wireframe triangles (including non-culling of
degenerates)
- Fix degenerate culling of CCW front-facing triangles in wireframe and
conservative rast
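A hypothetical culling decision illustrating the degenerate rule: zero-area triangles produce no pixels in solid fill and can be dropped, but in wireframe their edges still draw, so they are kept. The function names, the `cull_mode` enum, and the sign convention (positive signed area means counter-clockwise with y up) are assumptions of this sketch, not the swr implementation.

```c
#include <stdbool.h>

/* Twice the signed area of (v0, v1, v2); zero means degenerate. */
static float signed_area2(const float v0[2], const float v1[2], const float v2[2])
{
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) -
           (v2[0] - v0[0]) * (v1[1] - v0[1]);
}

enum cull_mode { CULL_NONE, CULL_BACK, CULL_FRONT };

/* Returns true if the triangle should be discarded.
 * front_ccw selects which winding counts as front-facing. */
static bool should_cull(const float v0[2], const float v1[2], const float v2[2],
                        enum cull_mode mode, bool front_ccw, bool wireframe)
{
    float a = signed_area2(v0, v1, v2);
    if (a == 0.0f)
        return !wireframe;   /* degenerates survive only in wireframe */
    bool is_front = front_ccw ? (a > 0.0f) : (a < 0.0f);
    if (mode == CULL_BACK)
        return !is_front;
    if (mode == CULL_FRONT)
        return is_front;
    return false;
}
```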
Reviewed-by: Bruce Cherniak <[email protected]>

Alpha from render target 0 should always be used for alpha test for all
render targets, according to GL and DX9 specs. Previously we were using
alpha from the current render target.
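A sketch of the corrected rule, with hypothetical names (`alpha_test_passes`, `enum alpha_func`, and a per-target `rt_alpha` array): the comparison always reads render target 0's alpha, and the single result then applies to every render target.

```c
#include <stdbool.h>

enum alpha_func { ALPHA_LESS, ALPHA_GEQUAL, ALPHA_ALWAYS };

/* rt_alpha[i] is the shader's output alpha for render target i.
 * Only rt_alpha[0] takes part in the test, whichever target is written. */
static bool alpha_test_passes(const float *rt_alpha, enum alpha_func func,
                              float ref)
{
    float alpha = rt_alpha[0];
    switch (func) {
    case ALPHA_LESS:   return alpha < ref;
    case ALPHA_GEQUAL: return alpha >= ref;
    default:           return true;
    }
}
```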
Reviewed-by: Bruce Cherniak <[email protected]>

Reviewed-by: Bruce Cherniak <[email protected]>

Don't generate files when no events have been generated outside
the header events.
Reviewed-by: Bruce Cherniak <[email protected]>

Buffer events ourselves and write the contents to file when the buffer is
full or the context is being destroyed. Previously, we relied on
ofstream to buffer for us.
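The buffering scheme can be sketched in C with stdio in place of the C++ ofstream the original code uses. `struct event_writer`, its functions, and `EVENT_BUF_SIZE` are hypothetical: events are copied into a fixed in-memory buffer that is written out only when it fills up or when the context is destroyed.

```c
#include <stdio.h>
#include <string.h>

#define EVENT_BUF_SIZE 4096   /* hypothetical buffer size */

struct event_writer {
    FILE *file;
    size_t used;
    unsigned char buf[EVENT_BUF_SIZE];
};

static void event_flush(struct event_writer *w)
{
    if (w->used) {
        fwrite(w->buf, 1, w->used, w->file);
        w->used = 0;
    }
}

static void event_write(struct event_writer *w, const void *ev, size_t size)
{
    if (size > EVENT_BUF_SIZE) {          /* too large to buffer: write through */
        event_flush(w);
        fwrite(ev, 1, size, w->file);
        return;
    }
    if (w->used + size > EVENT_BUF_SIZE)  /* would not fit: flush first */
        event_flush(w);
    memcpy(w->buf + w->used, ev, size);
    w->used += size;
}

/* Context destruction: write out whatever is still buffered. */
static void event_writer_destroy(struct event_writer *w)
{
    event_flush(w);
    fclose(w->file);
}
```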
Reviewed-by: Bruce Cherniak <[email protected]>

Reviewed-by: Bruce Cherniak <[email protected]>