| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commutativity is not allowed with SHLADD, but src2 can accept
loads. To allow the load propagation pass to do its job, add a
special case like for SUCLAMP because src1 is always an immediate.
This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
Warrior.
GF100/GK104:
total instructions in shared programs :2838045 -> 2834712 (-0.12%)
total gprs used in shared programs :396684 -> 396386 (-0.08%)
total local used in shared programs :34416 -> 34416 (0.00%)
local gpr inst bytes
helped 0 326 1105 1105
hurt 0 55 3 3
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Only and only if src1 is a power of 2 we can replace IMAD by SHLADD.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Unfortunately, we can't use the emit helpers for GF100/GK110
because src1 and src2 are swapped.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This instruction is available since SM20 (Fermi) and allow to do
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
envyas now uses a much better representation for those control
codes and it displays the different flags instead of an
unreadable hex number.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Most of the time, even the 512 bytes that we now get is more than sufficient
(pipeline stats queries are the largest at 184 bytes per shot).
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: enable only when compute is available
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: fix a comment (Gustaw Smolarczyk)
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
To ensure that fences are properly initialized.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will support the waiting option in ARB_query_buffer_object using
WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently
write their results with the highest bit set, and we can just use that;
for others, we have to write a fence explicitly.
ZPASS_DONE for occlusion queries writes its results with the high bit
set, but it writes up to 8 pairs of results (one for each DB). We have
to wait for all of these results, so let's just add an explicit fence.
The new function provides summary information to be used by subsequent
patches.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Save compute shader state that will be used for the ARB_query_buffer_object
implementation.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These functions extract the pipe state structure from the current
descriptors, for state saving.
v2: correctly dereference *buf (Bas)
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
For bottom-of-pipe fences inside the gfx command stream.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There are driver-specific context flags for barriers that are not covered
by the Gallium barrier interfaces.
The R600 settings of these flags may not be optimal, but we're not going
to use them yet anyway.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
Fixes lots of piglit tests crashing due to using uninitialized memory.
Fixes: ecd6fce2611e ("mesa/st: support lowering multi-planar YUV")
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Replace old string comparison with a mapping table.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When passed to winsys->buffer_create, this flag will indicate that we require
a buffer that maps 1:1 with a kernel buffer handle.
This is currently set for all textures, since textures can potentially be
exported to other processes. This is not a huge loss, since the main purpose
of this patch series is to deal with applications that allocate many small
buffers.
A hypothetical application with tons of tiny textures might still benefit
from not setting this flag, but that's not a use case I'm worried about
just now.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is really the behavior we want most of the time, but having a
SYNCHRONIZED flag instead of an UNSYNCHRONIZED one has the advantage that
OR'ing different flags together always results in stronger guarantees.
The parent BOs of sub-allocated buffers will be added unsynchronized.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This adds a new envvar called NV50_PROG_CHIPSET which allows to
compile shaders with a different target, especially useful for
shader-db.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Based off of Ilia's original patch, but with output values replicated so
that it matches the TGSI semantics.
Signed-off-by: Glenn Kennard <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
When we create a depth/stencil texture, also check if we can render to
it and set the PIPE_BIND_DEPTH_STENCIL flag. We were previously doing
this for color textures (PIPE_BIND_RENDER_TARGET).
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
| |
This may have been needed years ago during development, but not now.
Prevents some regressions after introducing the next patch.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To handle z/layer fix-ups for blitting and copying. Note that we weren't
doing this properly in svga_blit() before.
Also, remove redundant stex, dtex assignments.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
The util_is_format_compatible() function didn't quite do what we wanted
for vgpu10. This check is more flexible and allows copies between
formats such as R32G32B32A32_FLOAT and R32G32B32A32_INT.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With latest mesa and latest piglit tests srgb<->linear conversion
is not required as per GL4.4 rules
See commit b662c70aeab6a92751514f30719c13a6de253b40.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.
This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise. However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.
For now, just turn on aggressive flattening on vc4. i965 will need some
tuning to avoid regressions. It does looks like this may be useful to
replace freedreno code.
Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.
vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs: 17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs: 3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
v2: Update shader-db output.
Reviewed-by: Ian Romanick <[email protected]> (v1)
|
|
|
|
|
|
| |
Get rid of unneeded local var.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
| |
Never used, AFAIK.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- no PIPE_CAP_INT64 yet
- emit DIV/MOD without the divide-by-zero workaround
Reviewed-by: Marek Olšák <[email protected]> (v1)
Reviewed-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
Switch all RDTSC_START/STOP macros to use AR_BEGIN/END macros.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
| |
Same thing as nvc0_stage_set_sampler_views_range().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This function was quite similar to nvc0_stage_set_sampler_views()
and I don't see any reasons to not remove it.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This helps shaders in UE4 demos, especially with Elemental
(+1% perf). This optimization reduces spilling usage in one
shader which explains the little gain.
GF100/GK104:
total instructions in shared programs :2838551 -> 2838045 (-0.02%)
total gprs used in shared programs :396706 -> 396684 (-0.01%)
total local used in shared programs :34432 -> 34416 (-0.05%)
local gpr inst bytes
helped 1 19 112 112
hurt 0 0 0 0
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This should emit src0 instead of src1.
Found by inspection.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
A couple of forward-declarations were causing warnings in clang:
'value' defined as a class here but previously declared as a struct
[-Wmismatched-tags]
Signed-off-by: Martina Kollarova <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch relaxes the restriction of compressed formats for texture
upload buffer. For now, 3D texture with compressed format
is still not supported in the texture upload buffer path.
As Brian noted, ETQW does many texture updates with glCompressedTexSubImage.
This patch greatly improves the performance of the ETQW trace.
Tested with ETQW, MTT piglit, glretrace, conform, viewperf
v2: Per Brian's suggestion, removed the subregion boundary check.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reduces the number of times we flush in some situations (the
arbocclude demo is one trivial example).
Tested with Piglit, ETQW, Sauerbraten, arbocclude.
Reviewed-by: Charmaine Lee <[email protected]>
|