| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Move a few things around to group stuff that is common to a3xx/a4xx
together. Also, introduce is_ir3() for things that are more specific to
the compiler / shader-ISA than to the gpu generation.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
MESA_LLVM_VERSION_PATCH is undefined.
Reviewed-by: Edward O'Callaghan <eocallaghan at alterapraxis.com>
Tested-by: Benjamin Bellec <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r600 currently has 73 atoms and looping through their dirty flags has
become costly because checking each flag requires a pointer
dereference before the read. To avoid having to do that add additional
bitfield which can be checked really quickly thanks to tzcnt instruction.
id field was added to struct r600_atom but that doesn't affect memory
usage for both 32 and 64 bit CPUs because it was stuffed into padding.
The performance improvement is ~2% for benchmarks that can have FPS in
the thousands but is hardly measurable in "real" programs.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
On evergreen config_state is not used, so don't mark it dirty.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Instead of writing to rctx->atoms directly use a helper to take
advantage of assert checks.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is analogous to r300_mark_atom_dirty() used by r300, and will
be used by later patches. For common radeon code, appropriate helper
is called through a function pointer.
No functional changes.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: just clear the flag before the allocation
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For gmem restore (mem2gmem), we swap blit programs, in order to have a
different frag shader for depth vs color restore. But we weren't
actually clearing the cached fp, so it would not actually change the
frag shader as expected.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For gmem restore (mem2gmem), we swap blit programs, in order to have a
different frag shader for depth vs color restore. But we weren't
actually clearing the cached fp, so it would not actually change the
frag shader as expected.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Fixes build with MinGW x86_64 build with GCC 4.9, due to conflicting
definition _mm_shuffle_epi8 of u_sse.h and system headers.
Trivial.
|
|
|
|
|
|
| |
Cc: 10.6 <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As sugested by Tom a long time ago
and in order to be able to create Piglit tests
v2:
replace NOT_SUPPORTED_BY_CL_1_1 macro with an inline function
remove extra space in clLinkProgram arg
v3:
use __func__
v4:
back to a macro, it make more sense to use it with __func__
[ Francisco Jerez: Rename to CLOVER_NOT_SUPPORTED_UNTIL and pass the
minimum API version required by the entry point so the error
messages don't become stale when support for additional CL versions
is introduced. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Same idea as in libdrm_amdgpu.
A command stream can only be created for a specific context and it's always
submitted to that context.
This will mainly be used by amdgpu and it's required by the GPU reset status
query too.
(radeon only has a basic version of the query and thus doesn't need this)
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
The timeout parameter covers both cases.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
better safe than sorry
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
This display the units in the HUD.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The cumulative value is useful for queries like the number of shader
compilations.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Also use only one store if stride <= 4.
All the fetches from and stores to temporaries can be removed now.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91461
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Acked-by: Michel Dänzer <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
| |
Luckily, there is a kernel query, so use the size from that.
It currently returns 256KB. It can be increased in the kernel.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
Picked from the amdgpu branch.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
When we are measuring the time spent in a draw call, an unexpected flush
can distort the result.
Reviewed-by: Michel Dänzer <[email protected]>
|
| |
|
|
|
|
| |
We can just support them the same way we do load_const's SSA values.
|
|
|
|
|
|
|
|
|
| |
Since we just pulled it out of the destination as 8-bit unorm, we know
it's in [0, 1] already.
shader-db:
total instructions in shared programs: 100040 -> 98208 (-1.83%)
instructions in affected programs: 14084 -> 12252 (-13.01%)
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, SFU values always moved to a temporary, and TLB color reads
and texture reads always lived in r4. Instead, we can have these results
just be normal temporaries, and the register allocator can leave the
values in r4 when they don't interfere with anything else using r4.
shader-db results:
total instructions in shared programs: 100809 -> 100040 (-0.76%)
instructions in affected programs: 42383 -> 41614 (-1.81%)
|
| |
|
|
|
|
|
|
| |
needed for MRT
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Both a3xx and a4xx need the same logic to decide if half-precision can
be used for blit shaders. So move it to core and simplify things a bit
with a helper that considers all render targets.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Collapse dirty/reading bools into status bitmask (and drop writing which
should really be the same as dirty). And use 'used_resources' list for
all tracking, including zsbuf/cbufs, rather than special casing the
color and depth/stencil buffers.
Signed-off-by: Rob Clark <[email protected]>
|