| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- Only emit write SPI_TMPRING_SIZE once per packet.
- Use context global scratch buffer.
v3:
- Patch shaders using WRITE_DATA packet instead of map/unmap.
- Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
VS_PARTIAL_FLUSH when patching shaders.
v4:
- Code cleanups.
- Remove unnecessary multiplies.
v5:
- Patch shaders in system memory and re-upload to vram.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state(). This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.
On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.
This is a first pass attempt at this, and passes the piglit
tests that test for this.
v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support
Reviewed-by: Glenn Kennard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.
Reviewed-by: Glenn Kennard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes writing beyond the allocated buffer:
==31855== Invalid write of size 1
==31855== at 0x50AB2A9: vsprintf (iovsprintf.c:43)
==31855== by 0x508F6F6: sprintf (sprintf.c:32)
==31855== by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526)
==31855== by 0x5B2B7DE: get_compute_param<char> (device.cpp:37)
==31855== by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201)
==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855== by 0x5B20152: clBuildProgram (program.cpp:182)
==31855== by 0x400F41: main (hello_world.c:109)
==31855== Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd
==31855== at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324)
==31855== by 0x5B2B7C2: allocate (new_allocator.h:104)
==31855== by 0x5B2B7C2: allocate (alloc_traits.h:357)
==31855== by 0x5B2B7C2: _M_allocate (stl_vector.h:170)
==31855== by 0x5B2B7C2: _M_create_storage (stl_vector.h:185)
==31855== by 0x5B2B7C2: _Vector_base (stl_vector.h:136)
==31855== by 0x5B2B7C2: vector (stl_vector.h:278)
==31855== by 0x5B2B7C2: get_compute_param<char> (device.cpp:35)
==31855== by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201)
==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855== by 0x5B20152: clBuildProgram (program.cpp:182)
==31855== by 0x400F41: main (hello_world.c:109)
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
| |
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
This was inadvertently disabled by
761e36b4caab4e8e09a4c2b1409a825902fc7d2c.
|
|
|
|
|
|
|
|
|
| |
Instead of passing a pointer to the scratch buffer via user sgprs, we
now patch the shader with the buffer address using reloc information
from the LLVM generated ELF.
v2:
- Make sure not to break older LLVM.
|
|
|
|
|
|
|
|
|
| |
v2:
- Use strdup for copying reloc names.
- Free reloc memory.
v3:
- Add free_relocs parameter to radeon_shader_binary_free_members()
|
| |
|
| |
|
|
|
|
| |
I wanted to read it, so I wrote parsing.
|
|
|
|
|
| |
Execution will end at the cl->next, because that's what ct0ea/ct1ea get
programmed to.
|
|
|
|
|
| |
Everything from ETC1 to RGBA64 was getting its top bit dropped, but we
didn't use any of those formats.
|
|
|
|
|
| |
Theoretically it should apply after dithering as well, but ditehring for
565 happens in fixed function in the TLB store.
|
|
|
|
|
| |
Since unpack only happens on things read from the A register file, we have
to leave them as something that can be allocated to A (temp or uniform).
|
|
|
|
| |
I want it from another location.
|
|
|
|
|
| |
It would mean different unpacking behavior, since only the A file does
unpack (with PM==0).
|
|
|
|
|
| |
No difference on shader-db, but prevents definite regressions in the
blending changes.
|
|
|
|
|
| |
No difference on shader-db, but will become more important as I introduce
more use of pack flags with the blending changes.
|
|
|
|
|
|
| |
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z. The result was
that you'd get big chunks of your rendering missing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 0543630d0b0d9d9f6eefbc14fbd3385d4de37ba0.
It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.
We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.
Sorry I didn't remember this when reviewing the reverted change.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If, for example, only the x/y/w components of in.xyzw are actually used,
we still need to have a group of four registers and assign all four
components. The hardware can't write in.xy and in.w to discontiguous
registers. To handle this, pad with a dummy NOP instruction, to keep
the neighbor chain contiguous.
This fixes a problem noticed with firefox OMTC.
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
| |
Fixes the remaining ARB_color_buffer_float rendering tests.
|
| |
|
|
|
|
|
|
|
|
| |
No need to recheck the FS compile when the VS source has changed, but
there *is* a need to recheck the VS compile when the compiled VS has
changed (since the live inputs may change).
Fixes es3conform's blend test.
|
|
|
|
|
|
|
| |
The util_pack_color() thing only sets up the low bits of the union, so
only return them, too. Fixes intermittent failure on
fbo-alphatest-formats and es3conform's framebuffer-objects test under
simulation.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turns out this was harmful in code quality:
total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs: 22522 -> 21880 (-2.85%)
This costs us yet another register, which is painful since it means more
programs might fail to compile). However, the alternative was causing us
trouble where we'd save/restore r3 while it contained a MIN-ed direct
texture offset, causing the kernel to fail to validate our shaders (such
as in GLB2.7).
|
|
|
|
|
|
|
|
| |
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.
total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
|
|
|
|
|
|
| |
This will give the compiler the chance to dead-code eliminate unused VPM
reads. This is particularly a big deal in the CS where a bunch of vattrs
are just not going to be used.
|
|
|
|
|
| |
Some ops can't be DCEd, while some of the ops that are just important due
to the args they have can be.
|
|
|
|
| |
This will let us do copy propagation of the VPM reads.
|
|
|
|
|
| |
We pass in a byte offset, not dword. I'm rather scared that this actually
managed to pass piglit, but it does fix gears.
|
|
|
|
|
| |
total instructions in shared programs: 40960 -> 39753 (-2.95%)
instructions in affected programs: 20871 -> 19664 (-5.78%)
|
|
|
|
|
|
|
|
|
| |
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4. But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:
total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs: 4381 -> 4314 (-1.53%)
|
|
|
|
|
| |
Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're
ssa.
|
|
|
|
| |
Any other caller would want it, too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't (or don't know how to) turn this off. But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.
Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline. Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.
NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node. Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers. Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers. This lets us simplify RA by
working in terms of neighbor groups.
NOTE: has the slight problem that it can't optimize out mov's for things
like:
MOV OUT[n], IN[m]
To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them). This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't
need to support an arbitrary mask. So for this case use a simple size
field rather than a bitmask.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Slight bit of refactoring that will be needed for indirect gpr
addressing (TEMP[ADDR[]]).
Signed-off-by: Rob Clark <[email protected]>
|