| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
For some reason llvm5 is picky about accepting a void * type in the
case of building an argument list.
Since we don't care about the type (we ignore the argument for now),
pick another pointer type
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Early Rasterization is an optimization for small triangles.
Scientific workloads often contain very small triangles that has non-zero
area and cannot be trivially rejected as falling between pixel centers,
but does not cover any pixel center. Those triangles can be initially
rasterized as early as in binner and rejected if they cover no pixels The
optimization can be disabled in compilation using KNOB_ENABLE_EARLY_RAST
option in knobs.h
The Early Rast is disabled by default.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Flip the switch(es) to enable simd16 vertex shaders:
USE_SIMD16_SHADERS and USE_SIMD16_VS
Both have to be enabled at the same time. Currently, just setting
USE_SIMD16_SHADERS does not work correctly.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Supporting simd16 vertex shaders involves packing the output of the
fetch shader appropriately, especially the vertexID buffers that have to
be formatted in one simd16 register, needed by the VS.
As part of this support, we needed to remove the 2nd JitManager, since it
was not accounting for vector width correctly.
USE_SIMD16_SHADERS is also split into two defines. The additional
one (USE_SIMD16_VS) controls the width of the vertex shader (VS), while
the original one (USE_SIMD16_SHADERS) controls overall front end width.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Properly validate DLL matches OBJ for jitted function
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Create shader_lib during build, link with shaders at DLL generation time
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new define (USE_SIMD16_VS), to denote calling a 16-wide vertex shader.
This is needed because the mesa driver can do 16-wide shaders, but rasty
cannot yet, so we need to distinguish.
Create a new VertexID entry (VertexID16) for the USE_SIMD16_VS case, since
we need to format the vertex id in a way that is digestible by the 16-wide VS
Disabled for now. To be enabled in a future checkin when driver work
is complete.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Add name argument to x86 autogenerated macros.
Add useful variable names for DCL_inputVec implementation.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
in shader and fetch dump files
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
- Move debug .ll files to JIT_CACHE_DIR
- Don't link against jitter SRGBLut table, add global data to shader that needs it.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Added support for Fetch / Sample / LD functions
Added DLL link to JitCache implementation
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Adds ability to step into jitted llvm IR in Visual Studio.
- Updated llvm type generation script to also generate corresponding debug types.
- New module pass inserts debug metadata into the IR for each function
Disabled by default.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
+ ZeroMemory() macro definition for non win32-compilation in common/os.h
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Part 2 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen. Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.
This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.
Based on an initial implementation by Tim Rowley.
v2: Refactor repetitive preprocessor checks to reduce code duplication
v3: Formatting changes per Bruce C. Also delay screen creation until end
to avoid leaks when failure conditions are hit.
Signed-off-by: Chuck Atkins <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
CC: Tim Rowley <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Part 1 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen. Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.
This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.
Based on an initial implementation by Tim Rowley.
v2: Fix comment placement pointed out by Bruce C.
Signed-off-by: Chuck Atkins <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
CC: Tim Rowley <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
According to the ARB_multisample num_samples is a non-negative integer.
Consequently define it as such, fail in glx/choose_visual if a negative
number is given.
v2: split patch into gallium and mesa part
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Christian König <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lp_build_interleave2_half was not doing the right thing for avx512-style
16-wide loads.
This path is hit in the swr driver with a 16-wide vertex shader. It is
called from lp_build_transpose_aos, when doing texel fetches and the
fetched data needs to be transposed to one component per output register.
Special-case the post-load swizzle operations for avx512 16x32 (16-wide
32-bit values) so that we move the xyzw components correctly to the outputs.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
Currently a couple of gallium targets race with xmlpool_options.h being
generated, don't do that.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Only one piglit test fails,
sso-vs-gs-fs-array-interleave
There are 3 tests using ssbo without checking sizes failing also
but those are test bugs.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
if no destination:
a) convert _RET instructions to non _RET variants if no dst
b) set src0 to undefined if it's a READ, this should get DCE then.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The normal ssa renumbering isn't sufficient for LDS queue access,
this uses two stacks, one for the lds queue, and one for the
lds r/w ordering.
The LDS oq values are incremented in their use in a linear
fashion.
The LDS rw values are incremented in their definitions and used
in the next lds operation to ensure reordering doesn't occur.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
So LDS ops have to be SLOT_X,
and LDS OQ reads have read port restrictions so we try
and force those into only having one per slot and avoiding
bank swizzles.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tries to avoid an lds queue read getting scheduled separately
from an lds ret read, the non-sb code uses the same style of hammer,
this isn't foolproof.
We can do better, but it's a bit tricky, as you have to scan ahead
and either schedule more lds oq moves and more lds reads and that
could lead to you running out of space anyways.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for tracking the lds oq read/writes
so can avoid scheduling other things in between.
This patch just adds the tracking and assert to show
problems.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP
in the same basic block/clause. This makes sure once we've issues
and MOV we don't add another block until we balance it with an
LDS read.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This adds lds to the geom emit handling
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Don't try and fold LDS using expressions.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
We need to convert these to the hw special registers.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This handles parsing the LDS ops and queue accessess.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This fixes bad interactions with the LDS special values.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Although these are op3s they don't have a dst reg.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For LDS read/write ordering we use the LDS_RW value, reads
will wait on previous writes.
For LDS read/read from LDS queue ordering we use the LDS_OQ
values, we define two for now, though initially we'll just
support OQA.
Also add the check for the lds oq values
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
It's rare to have a final alu clause on normal shaders (exports)
but tess shaders write to LDS as their output, so we see some
alu clauses, and the CF_END get put in the wrong place.
This makes sure to update last_cf correctly.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This adds support for GDS ops to sb backend.
This seems to work for atomics and tess factor writes.
Acked-By: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This stops them being optimised out.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some tess shaders were doing MOVA_INT _, c0.x on cayman, and then
hitting an assert in sb_bc_finalize.cpp:translate_kcache.
This makes sure the toplevel kcache tracker gets updated,
and the clause gets fixed up.
Reviewed-by: Roland Scheidegger <[email protected]>
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Just saves a pointless a = a + 0;
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This field is ignored for tf writes so should be 0.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|