| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were storing arrays in vectors, which was leading to some really bad
spill code for large arrays. allocas instructions are a better fit for
arrays and LLVM optimizations are more geared toward dealing with
allocas instead of vectors.
For arrays that have 16 or less 32-bit elements, we will continue to use
vectors, because this will force LLVM to store them in registers and
use indirect registers, which is usually faster for small arrays.
In the future we should use allocas for all arrays and teach LLVM
how to store allocas in registers.
This fixes the piglit test:
spec/glsl-1.50/execution/geometry/max-input-component
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
bld->indirect_files is never set, so this function always returns false.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Want to re-use this struct, so un-inline it.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Be more consistent with the other u_inlines util_copy_xyz_state()
helpers and support NULL src.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
If a depth/stencil texture has no mipmaps, we can always get a layout that is
compatible with DB and TC.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga,
though it's basically a function of the memory configuration so could affect
other parts as well.
Fixes piglit "unaligned-blit * stencil downsample" and various
"fbo-depth-array *stencil*" tests.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: s/dirty_level_mask/stencil_dirty_level_mask/ in stencil case
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Also clean up some of the looping.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Note that this has no effect yet. A case where can_sample_z/s can be false
in radeonsi will be added in a later patch.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Account for the fact that max_layer is minified for higher levels.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: keep using r600_texture_reference
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
It is the same for all levels.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Tested with Lightsmark2008, MTT piglit, glretrace, conform.
Reviewed-by: Sinclair Yeh <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
Height should be aligned with 2 macroblocks, thus making safer
for tiled mode
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise things will fail to build, if the builder is using another
version of LLVM.
v2: annotate all the dependencies of builder_gen.h
v3: clean the generated files as needed
v4: comment cleanups (Tim)
Cc: "12.0" <[email protected]>
Tested-by: Tim Rowley <[email protected]>
Tested-by: Chuck Atkins <[email protected]> (v2)
Reported-by: Chuck Atkins <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Considering how hard/annoying it was for many peoples' workflow to
properly generate the macro, it will be demoted to conditionally
available with follow-up commits.
v2: Kill off gracious blank line (Vedran).
Cc: [email protected]
Cc: Vedran Miletić <[email protected]>
Cc: Francisco Jerez <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]> (v1)
Reviewed-by: Vedran Miletić <[email protected]>
|
|
|
|
|
|
|
|
| |
While we are at it, fix a typo inside the comment which describes
what those constants are for.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
In presence of an indirect image access, the base offset should be
zeroed because the stride will be computed twice. This is a pretty
rare situation but it can happen when tex.r > 0.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: "11.2 12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.
This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
| |
ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did. This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
|
|
|
|
|
|
| |
Now that we're about to start generating control flow in our NIR, we want
this in place. It optimizes things frequently in the CS, when the GL VS
has control flow that doesn't affect the vertex position.
|
|
|
|
|
|
|
|
|
|
|
| |
Tiny change on shader-db currently, but it will be important when we start
emitting a lot of SFs from the same variable as part of control flow
support.
total instructions in shared programs: 89463 -> 89430 (-0.04%)
instructions in affected programs: 1522 -> 1489 (-2.17%)
total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
estimated cycles in affected programs: 8568 -> 8523 (-0.53%)
|
|
|
|
|
|
|
|
|
| |
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling. We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.
No change on shader-db.
|
|
|
|
|
|
|
|
| |
I'm going to add an optimization for redundant SF update removal, which
will just remove the SF and leave us (in many cases) with an instruction
with a NULL destination and no side effects. Rather than teaching that
pass whether the whole instruction can be removed, leave that
responsibility to this pass.
|
|
|
|
|
|
|
| |
We need to not DCE them even though they don't have a destination in QIR.
We also shouldn't relocate them in vc4_opt_vpm. Neither of these things
happen, but I'm about to make DCE consider instructions with a NULL
destination.
|
|
|
|
|
|
|
| |
Noticed by code inspection. This hasn't been too big of a deal, because
our cond usages all start out as adder ops, either MOVs or the FTOI for Z
writes. MOVs *can* get converted to mul ops during scheduling, but
apparently we hadn't hit this.
|
|
|
|
|
| |
This isn't used since we switched to using the dst.pack field instead of
custom instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.
This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.
If an app creates more shaders, at most 4 threads will be used to compile
them.
Debug output disables this for shader stats to be printed in the correct
order.
(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
to allow multiple shaders to be compiled simultaneously.
ALso, shader-db can again use all 4 cores.
v2: Remove the pipe_mutex_unlock call in the error path.
Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
|
|
|
|
|
|
|
| |
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: - squashed the patches
- use INT_MAX
- clamp max_const_buffer_size
- check the DRM version in radeon
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Vedran Miletić <[email protected]>
|
|
|
|
|
|
|
| |
Getting LLVM IRs of hanging shaders have never been easier.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
and remove some obsolete comments
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This will be needed after some LLVM changes that haven't landed yet.
v2: - use LLVMIsConstant to fix an LLVM assertion failure.
LLVMSetMetadata doesn't work with constants.
- don't set float metadata as string
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It's not true.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
use v_interp_mov for those
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.
Reviewed-by: Nicolai Hähnle <[email protected]>
|