| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
DONTBLOCK is correctly handled in r600_bo_map.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Broken by the dependency on ralloc introduced by
fe622bac0c1b5b9f2a9fcf9f35b51232a06bea42
|
|
|
|
|
| |
Tested-by: Christopher Egert <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
The new allocator uses ra and does swizzle packing.
Also, a data structure (struct rc_variable) and associated functions have
been added for generating UD and DU chains.
|
|
|
|
|
|
|
|
| |
This function can be used to avoid creating single register classes for
input/payload registers. This makes optimistic coloring less likely
to fail.
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
| |
|
|
|
|
|
| |
For pair instructions we need a reference to both the arg
and source.
|
|
|
|
|
|
|
|
| |
The instruction scheduler will sometimes leave orphaned sources when
converting instructions from RGB to Alpha. If one of these orphaned
sources has an index greater than the maximum temporary register index,
then the compiler will incorrectly report "Too many hardware temporaries
used". The dead sources pass cleans up these orphaned sources.
|
|
|
|
|
|
| |
Tested with softpipe and llvmpipe.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
GL_FIXED should not be accepted in the other gl*Pointer calls in OpenGL.
There is a new piglit for this: arb_es2_compatibility-fixed-type.
Reviewed-by: Brian Paul <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
We were accidentally leaving blending enabled for LogicOp GL_COPY,
which ARB_color_buffer_float/GL_RGBA32F-render (and friends) caught.
Additionally, the GL spec says that no LogicOp should be done to
floating-point targets, and the GPU gets really angry even if you say
to LogicOp GL_COPY to float.
|
|
|
|
|
|
|
|
|
|
| |
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
It was moved to state streaming a while back and this was left over.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that all the dynamic state is streamed through the top of the
batchbuffer, we can cut out many of our relocations to that state by
using the base address.
Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
The samplers are about to become streamed for gen6 performance, which
would cause this unit to blow out the state cache.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This is in a way a revert of f5bb775fd1f333d8e579d07a5cac1ded2bd54a2f.
The tiny win that had will be overwhelmed by the win of using the gen6
dynamic state base address.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Improves 3DMMES taiji demo performance by 10.1% +/- 0.9% (n=15).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This helps clarify profiling results.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The payload regs can go all the way up to register 60+, so just give
them 8 bits to be addressed by instead of 3-4 (which made source_w_reg
of 8 end up 0). There's no reason to aggressively pack these fields,
as they are just used as compiler information, where being easier to
access is probably more important than shaving a byte or two off of
the structure.
Fixes piglit fragcoord_w.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36649
|