| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.
This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much time as the bigger one of the two.
If an app creates more shaders, at most 4 threads will be used to compile
them.
Debug output disables this for shader stats to be printed in the correct
order.
(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)
Reviewed-by: Nicolai Hähnle <[email protected]>
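The create-now, wait-at-draw flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: all `sketch_*` names are hypothetical, the "compiler" is a placeholder multiplication, and one POSIX thread is spawned per shader instead of a bounded util_queue with at most 4 workers. Older toolchains may need `-pthread` to build it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shader selector; not the radeonsi struct. */
struct sketch_shader {
    pthread_t thread;
    int source;   /* stand-in for the shader source */
    int binary;   /* stand-in for the compiler output */
    bool joined;  /* true once the compile thread has been waited on */
};

/* Worker: the "compilation" is a placeholder computation. */
static void *sketch_compile(void *data)
{
    struct sketch_shader *s = data;

    s->binary = s->source * 2;
    return NULL;
}

/* Starts compilation and returns immediately (0 on success). */
static int sketch_create_shader(struct sketch_shader *s, int source)
{
    assert(s != NULL);
    s->source = source;
    s->binary = 0;
    s->joined = false;
    return pthread_create(&s->thread, NULL, sketch_compile, s);
}

/* The first use waits for completion; later uses return at once. */
static int sketch_draw(struct sketch_shader *s)
{
    if (!s->joined) {
        pthread_join(s->thread, NULL);
        s->joined = true;
    }
    return s->binary;
}
```

If a VS and a PS are both created before the first draw, the two placeholder compiles overlap, which is the case where the wait is bounded by the slower of the two.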
|
|
|
|
|
|
|
|
|
|
| |
to allow multiple shaders to be compiled simultaneously.
Also, shader-db can again use all 4 cores.
v2: Remove the pipe_mutex_unlock call in the error path.
Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
|
|
|
|
|
|
|
| |
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: - squashed the patches
- use INT_MAX
- clamp max_const_buffer_size
- check the DRM version in radeon
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Vedran Miletić <[email protected]>
|
|
|
|
|
|
|
| |
Getting LLVM IRs of hanging shaders has never been easier.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
and remove some obsolete comments
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This will be needed after some LLVM changes that haven't landed yet.
v2: - use LLVMIsConstant to fix an LLVM assertion failure.
LLVMSetMetadata doesn't work with constants.
- don't set float metadata as string
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It's not true.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
use v_interp_mov for those
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This reduces the number of v_mov's in the prolog.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It's always zero.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
Fixes: 54c4d525da7c7fc1e103d7a3e6db015abb132d5d ("r600g: Enable FMA on chips that support it")
Signed-off-by: Jan Vesely <[email protected]>
Tested-by: James Harvey <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Signed-off-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Devices with smaller GMEM size need more tiles. On db410c at 2048x1152,
glmark2 shadow needed ~330 tiles for fullscreen. Let's bump it up to
512. (Maybe with MRT you could end up needing more, but at that point
things are probably going to be painfully slow.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For .vert/.frag, now multiple can be specified on the cmdline for
purposes of linking, and the last one specified is the one that is
fed into the ir3 backend (and dumped along the way if --verbose is
specified)
Without this, varyings in frag shaders would appear as undefined.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Rather than doing a separate submit at context create, move these cmds
to before first tile, as is done on a3xx/a4xx. Otherwise state can
be overwritten by other contexts.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
This will be useful in a following patch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
We can push the unwrap of pipe_resource down.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
gcc6 does not like the trick where we point to one entry before the
array start and then start a while with a pre-increment.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
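For illustration, the idiom in question and one way to avoid it might look like the sketch below. It is not the nouveau code: `count_until` is a hypothetical helper, and the problematic form is shown only in a comment because forming a pointer one element before the start of an array is undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Problematic idiom (comment only):
 *
 *     const int *p = values - 1;
 *     while (*++p != terminator) { ... }
 *
 * The loop below visits the same elements but starts at the first
 * element and advances the pointer at the end of each iteration.
 */
static size_t count_until(const int *values, int terminator)
{
    const int *p = values;
    size_t n = 0;

    assert(values != NULL);
    while (*p != terminator) {
        n++;
        p++;
    }
    return n;
}
```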
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are all new false positives with gcc6.
In nouveau_compiler.c: gcc6 no longer assumes that passing a pointer
to a variable into a function initialises that variable.
In nv50_ir_from_tgsi.cpp, op and mode are not set if there are 0
enabled dst channels. This never happens, but gcc cannot know this.
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
Add support for SV_WORK_DIM for nvc0 and nve4.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This brings it in line with the other macros like NVC0_CB_AUX_UBO_INFO
and NVC0_CB_AUX_TEX_INFO.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back in
the dark ages; this just hooks it all up.
Signed-off-by: Ilia Mirkin <[email protected]>
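The "scale up, then add the sample offset" step can be sketched as below. This is an assumption-laden illustration: `ms_image_coord` and `struct ms_coord` are hypothetical names, `ms_x`/`ms_y` are taken to be the log2 of the per-pixel sample grid, and the offset layout (samples numbered row by row within that grid) is invented for the example. Real hardware uses a per-mode sample position table.

```c
#include <assert.h>

struct ms_coord { int x, y; };

/*
 * Map (pixel x, pixel y, sample index) to a coordinate in the
 * scaled-up surface. Assumes a 2^ms_x by 2^ms_y sample grid per pixel
 * with samples numbered row by row (hypothetical layout).
 */
static struct ms_coord ms_image_coord(int x, int y, int sample,
                                      int ms_x, int ms_y)
{
    struct ms_coord c;
    int dx = sample & ((1 << ms_x) - 1); /* column inside the pixel */
    int dy = sample >> ms_x;             /* row inside the pixel */

    assert(sample >= 0 && sample < (1 << (ms_x + ms_y)));
    c.x = (x << ms_x) + dx;
    c.y = (y << ms_y) + dy;
    return c;
}
```

With `ms_x = ms_y = 1` (four samples per pixel), pixel (3, 2) sample 3 lands at (7, 5) in the scaled surface.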
|
|
|
|
|
|
|
|
|
| |
The output of draw requires a null viewport transform, which the regular
code is ill-equipped to do. Reinstate the original settings in the render
path, and add setting of the viewport clip polygon based on fb
width/height (as that is all taken care of by draw).
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This fixes a ton of "*clip*" dEQP GLES2 tests, as well as
triangle-guardband-viewport in piglit.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Encapsulate the test for which flags are needed to get a compiler to
support certain features. Along with this, give various options to try
for AVX and AVX2 support. Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different. This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.
This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.
v2: Pass preprocessor-check argument as true-state instead of
false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__. Additional defines such as
__FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
w.r.t. their availability.
v4: Fix C++11 flags being added globally and add more logic to
swr_require_cxx_feature_flags
Cc: <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
Tested-by: Tim Rowley <[email protected]>
Signed-off-by: Chuck Atkins <[email protected]>
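As a rough illustration of what a test on `__AVX2__` alone amounts to, the sketch below reports whether the translation unit was built with AVX2 enabled. `built_with_avx2` is a hypothetical helper, not part of the build system; the actual configure check is not reproduced here.

```c
#include <stdbool.h>

/*
 * Returns true if the compiler defined __AVX2__ for this translation
 * unit (for example with -mavx2 on GCC or -march=core-avx2 on the
 * Intel compiler), false otherwise.
 */
static bool built_with_avx2(void)
{
#ifdef __AVX2__
    return true;
#else
    return false;
#endif
}
```

Because a compiler that silently ignores an unsupported flag will not define the macro, checking the define rather than the compiler's exit status catches that case.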
|
|
|
|
|
|
|
|
|
|
| |
So that we do copies host-side rather than in the guest with map/memcpy.
Tested with piglit arb_copy_buffer-subdata-sync test and new
arb_copy_buffer-intra-buffer-copy test.
Reviewed-by: Charmaine Lee <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
| |
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With host-side buffer copies (via SVGA3D_vgpu10_BufferCopy()) we have
to make sure any pending map-write operations are completed before reading
if the buffer is dirty. Otherwise the ReadbackSubResource operation could
get stale data from the host buffer.
This allows the piglit arb_copy_buffer-subdata-sync test to pass when
we start using the SVGA3D_vgpu10_BufferCopy command.
v2: check the sbuf->dirty flag in the outer conditional, per Charmaine.
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
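The ordering problem can be modeled with a toy buffer that has a guest copy, a host copy, and a dirty flag. Everything here is hypothetical (`sketch_buffer` and the `sketch_*` functions are not svga driver code); it only shows why pending guest writes must reach the host before a readback overwrites the guest copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_SIZE 16

struct sketch_buffer {
    unsigned char guest[SKETCH_SIZE]; /* guest-side mapping */
    unsigned char host[SKETCH_SIZE];  /* host-side contents */
    bool dirty;                       /* guest has writes not yet on host */
};

/* Guest write through a mapping: only the guest copy changes. */
static void sketch_write(struct sketch_buffer *b, size_t off,
                         unsigned char v)
{
    assert(off < SKETCH_SIZE);
    b->guest[off] = v;
    b->dirty = true;
}

/* Push pending guest writes to the host. */
static void sketch_upload(struct sketch_buffer *b)
{
    memcpy(b->host, b->guest, SKETCH_SIZE);
    b->dirty = false;
}

/* Readback for a read mapping: flush first if dirty, then fetch. */
static void sketch_readback(struct sketch_buffer *b)
{
    if (b->dirty)
        sketch_upload(b);
    memcpy(b->guest, b->host, SKETCH_SIZE);
}
```

Without the `dirty` check in `sketch_readback`, the fetch from the host would discard the pending guest write, which is the stale-data case the message describes.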
|
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We previously could do blits with util_resource_copy_region() when doing
'loose' format checking. Also do blits with util_resource_copy_region()
when the blit src/dst formats (not the underlying resources) exactly
match. Needed for GL_ARB_copy_image.
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
| |
v2: remove extra svga_define_texture_level() call, per Charmaine.
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Do texture->texture copies host-side with this command when possible.
Use the previous software fallback otherwise.
Reviewed-by: Brian Paul <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We don't normally support rendering to SNORM surfaces, but with
GL_ARB_copy_image we can copy to them if we treat them as typeless
and use a UNORM surface view.
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
| |
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
We previously handled the case of a RGBX sampler view of a RGBA surface.
Add the reverse case too. For GL_ARB_copy_image.
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
For GL_ARB_copy_image we may be asked to create an RGBA view of
a RGBX surface. Use an RGBX view format for that case.
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to be able to copy between different 32-bit, 3-channel surface
formats for GL_ARB_copy_image but since we don't support R32G32B32_FLOAT
for textures (it's not blendable and wouldn't work for render to texture)
we can't support 32-bit, 3-channel integer formats.
The state tracker will choose 4-channel formats instead.
Fixes the piglit arb_copy_image-format test for several cases.
Note: This change may need to be revisited if/when the texture_view extension
is enabled in the driver.
Reviewed-by: Brian Paul <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|