| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
This gives a 2.7x improvement in x11perf -rect100, since we only end up
load/storing the x11perf window, not the whole screen.
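
A rough sketch of why limiting load/store to the drawn area helps, assuming a tiled renderer that tracks a draw bounding box. The struct, the function names and the 64x64 tile size are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range of tile coordinates touched by a rectangle. */
struct tile_range {
    uint32_t min_x, min_y, max_x, max_y;
};

/* Map a pixel rectangle [x0, x1) x [y0, y1) to the tiles it covers.
 * Names and layout are illustrative only. */
static struct tile_range
tiles_for_rect(uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1,
               uint32_t tile_w, uint32_t tile_h)
{
    assert(x1 > x0 && y1 > y0 && tile_w > 0 && tile_h > 0);

    struct tile_range r;
    r.min_x = x0 / tile_w;
    r.min_y = y0 / tile_h;
    r.max_x = (x1 - 1) / tile_w;
    r.max_y = (y1 - 1) / tile_h;
    return r;
}

/* Number of tiles that would need a load/store for this range. */
static uint32_t
tile_count(struct tile_range r)
{
    return (r.max_x - r.min_x + 1) * (r.max_y - r.min_y + 1);
}
```

With these assumed numbers, a 100x100 rectangle at (100, 100) touches 9 tiles, while a full 1920x1080 screen touches 510.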
|
|
|
|
|
|
| |
This will be more important in the next commit, when there's more state to
reset to nonzero values, and I want an early exit from the submit
function.
|
|
|
|
|
| |
The callers all follow it with a flush of the context, and the flush of
the context gives us more information about how things are being flushed.
|
|
|
|
|
| |
* Drop no longer needed mesa headers
* Haiku LLVM pipe working with LLVM 3.5.0 on x86_64
|
| |
|
|
|
|
|
|
|
| |
As of 229bf4475ff0a5dbeb9bc95250f7a40a983c2e28 we started getting SIGBUS
from unaligned accesses on the hardware, for reasons I haven't figured
out. However, we should be avoiding unaligned accesses anyway, and our CL
setup certainly would have produced them.
|
|
|
|
|
| |
They should all be set to real values by the time they're read, and
ideally if you used valgrind you'd see uninitialized value uses.
|
|
|
|
| |
It doesn't matter, since it just got truncated to 16 inside, anyway.
|
|
|
|
|
|
|
|
|
|
|
| |
E.g. this could happen on older kernels which don't support the
RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in
si_write_harvested_raster_configs() doesn't deal with this correctly and
would probably mangle the value badly.
Cc: "10.4 10.3" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
| |
The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.
Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).
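
A minimal sketch of hoisting the size check out of the per-value emit calls, assuming a growable byte buffer. `struct cl`, `cl_ensure_space()`, `cl_u8()`, `cl_u32()` and `emit_packet()` are hypothetical names used for illustration, not the driver's actual helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable command list. */
struct cl {
    uint8_t *base;
    size_t used, size;
};

/* The only place that checks for room and grows the buffer. */
static void
cl_ensure_space(struct cl *cl, size_t n)
{
    if (cl->used + n <= cl->size)
        return;

    size_t new_size = cl->size ? cl->size : 64;
    while (new_size < cl->used + n)
        new_size *= 2;

    uint8_t *p = realloc(cl->base, new_size);
    assert(p);
    cl->base = p;
    cl->size = new_size;
}

/* Emit helpers: no growth logic, only a debug-build assertion. */
static inline void
cl_u8(struct cl *cl, uint8_t v)
{
    assert(cl->used + 1 <= cl->size);
    cl->base[cl->used++] = v;
}

static inline void
cl_u32(struct cl *cl, uint32_t v)
{
    assert(cl->used + 4 <= cl->size);
    /* memcpy avoids an unaligned 32-bit store. */
    memcpy(cl->base + cl->used, &v, 4);
    cl->used += 4;
}

/* One size check for the whole packet instead of one per field. */
static void
emit_packet(struct cl *cl, uint8_t opcode, uint32_t a, uint32_t b)
{
    cl_ensure_space(cl, 1 + 4 + 4);
    cl_u8(cl, opcode);
    cl_u32(cl, a);
    cl_u32(cl, b);
}
```

The caller reserves the packet size once, so the emit helpers compile down to plain stores in release builds.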
|
|
|
|
|
| |
Improves norast performance of a microbenchmark by 11.1865% +/- 2.37673%
(n=20).
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Some compile time RA debug
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
* This is the cleaned up work of the Haiku GCI student
Adrián Arroyo Calle [email protected]
* Several patches were consolidated to prevent
unnecessary touching of non-related code
|
|
|
|
|
|
|
|
|
| |
This fixes incorrect rendering in Unreal Engine demos.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83510
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Signed-off-by: David Heidelberg <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Same as ARL, just has extra rounding.
Useful for st/nine.
Tested-by: Pavel Ondračka <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: David Heidelberg <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
trans_kill() only handles the single opcode. Drop the remnant of a time
when both KILL and KILL_IF were handled by the same fxn.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Standalone compiler doesn't have screen or context. We need to come up
with a better way to control the target arch (ie. something that we can
control from cmdline w/ standalone compiler) but for now this hack keeps
it from segfault'ing.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
total instructions in shared programs: 41168 -> 40976 (-0.47%)
instructions in affected programs: 18156 -> 17964 (-1.06%)
|
|
|
|
|
| |
This will let me coalesce the VPM writes into the instructions generating
the values.
|
|
|
|
|
|
| |
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.
total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)
In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
|
|
|
|
| |
I want this from other passes.
|
| |
|
|
|
|
| |
$(RM) includes -f.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Since our kernel BOs require CMA allocation, and the use of them requires
new mmaps, it's pretty expensive and we should avoid it if possible.
Copying my original design for Intel, make a userspace cache that reuses
BOs that haven't been shared to other processes but frees BOs that have
sat in the cache for over a second.
Improves glxgears framerate on RPi by around 30%.
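
A minimal sketch of such a cache, assuming a single list ordered by release time and an exact size match on reuse. The struct layout and function names are hypothetical, and `calloc()`/`free()` stand in for the real (expensive) kernel allocation and unmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer object; a real one would carry a kernel handle. */
struct bo {
    uint32_t size;
    bool shared;       /* shared with another process: never cached */
    double free_time;  /* seconds, when it entered the cache */
    struct bo *next;   /* cache list, oldest first */
};

struct bo_cache {
    struct bo *head, *tail;
    int kernel_allocs; /* counts the simulated expensive allocations */
};

/* Reuse a cached BO of the same size, else do a fresh allocation. */
static struct bo *
bo_alloc(struct bo_cache *c, uint32_t size)
{
    struct bo *prev = NULL;
    for (struct bo *b = c->head; b; prev = b, b = b->next) {
        if (b->size != size)
            continue;
        if (prev)
            prev->next = b->next;
        else
            c->head = b->next;
        if (c->tail == b)
            c->tail = prev;
        b->next = NULL;
        return b;
    }

    struct bo *b = calloc(1, sizeof(*b));
    assert(b);
    b->size = size;
    c->kernel_allocs++;
    return b;
}

/* Free everything that has sat in the cache for over a second. */
static void
bo_cache_expire(struct bo_cache *c, double now)
{
    while (c->head && now - c->head->free_time > 1.0) {
        struct bo *old = c->head;
        c->head = old->next;
        if (!c->head)
            c->tail = NULL;
        free(old);
    }
}

/* Return a BO to the cache, unless it was shared. */
static void
bo_release(struct bo_cache *c, struct bo *b, double now)
{
    if (b->shared) {
        free(b);
        return;
    }
    bo_cache_expire(c, now);
    b->free_time = now;
    b->next = NULL;
    if (c->tail)
        c->tail->next = b;
    else
        c->head = b;
    c->tail = b;
}
```

Keeping the list in release order means expiry only ever has to look at the head.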
|
|
|
|
|
|
| |
This gets DRI3 working on modesetting with glamor. It's not enabled under
simulation, because it looks like handing our dumb-allocated buffers off
to the server doesn't actually work for the server's rendering.
|
| |
|
|
|
|
|
|
|
|
|
| |
This reverts db3dfcfe90a3d27e6020e0d3642f8ab0330e57be.
The commit was correct but we've got some precision problems later in
llvmpipe (or possibly in draw clip) due to the vertices coming in in a
different order, causing some internal test failures. So revert for now.
(Will only affect drivers which actually support constant-interpolated
attributes and not just flatshading.)
|
|
|
|
|
| |
total instructions in shared programs: 43053 -> 40795 (-5.24%)
instructions in affected programs: 37996 -> 35738 (-5.94%)
|
| |
|
|
|
|
| |
We're deciding about the WS bit, not PM.
|
|
|
|
|
|
|
| |
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-By: Jose Fonseca <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
This is the same basic logic from the original Broadcom driver.
|
|
|
|
|
|
| |
Commit ade8b26bf missed adding this cap to nvc0.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This fixes 4 vertexid related piglit tests with llvmpipe due to switching
behavior of vertexid to the one gl expects.
(Won't fix non-llvm draw path since we don't get the basevertex currently.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Plus a new PIPE_CAP_VERTEXID_NOBASE query. The idea is that drivers not
supporting vertex ids with base vertex offset applied (so, only support
d3d10-style vertex ids) will get such a d3d10-style vertex id instead -
with the caveat they'll also need to handle the basevertex system value
too (this follows what core mesa already does).
Additionally, this is also useful for other state trackers (for instance
llvmpipe / draw right now implement the d3d10 behavior on purpose, but
with different semantics it can just do both).
Doesn't do anything yet.
And fix up the docs wrt similar values.
v2: incorporate feedback from Brian and others, better names, better docs.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
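
A small sketch of the relationship described above, assuming the GL-style vertex id is the d3d10-style id plus the basevertex system value. The function name is hypothetical, and the assertion encodes an assumption that the sum never goes negative.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild a GL-style vertex id from the two values a driver without
 * native base-offset vertex ids would receive. Illustrative only. */
static inline uint32_t
gl_vertex_id(uint32_t id_nobase, int32_t base_vertex)
{
    /* Assumed precondition: the biased id does not go below zero. */
    assert((int64_t)id_nobase + base_vertex >= 0);
    return id_nobase + (uint32_t)base_vertex;
}
```

With a base vertex of 0 both styles of id agree, which is why the difference only shows up in draws that apply an offset.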
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r600, rv610 and rv630 all have a bug in their GPR indexing
and how the hw inserts access to PV.
If the base index for the src is the same as the dst gpr
in a previous group, then it will use PV instead of using
the indexed gpr correctly.
The workaround is to insert a NOP when you detect this.
v2: add second part of fix detecting DST rel writes followed
by same src base index reads.
v3: forget adding stuff to structs, just iterate over the
previous node group again, makes it more obvious.
v3.1: drop local_nop.
Fixes ~200 piglit regressions on rv635 since SB was introduced.
Reviewed-By: Glenn Kennard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
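
A simplified sketch of the detection step, assuming each instruction group is a flat array of ALU ops. The structs and `needs_nop()` are hypothetical and much simpler than the SB node tree; this is one reading of the v1 and v2 conditions, not the actual pass.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 5
#define MAX_SRCS  3

/* Hypothetical, flattened view of one ALU op. */
struct alu_op {
    int  dst_gpr;
    bool dst_rel;             /* destination uses relative addressing */
    int  nsrc;
    int  src_gpr[MAX_SRCS];
    bool src_rel[MAX_SRCS];   /* source uses relative addressing */
};

struct alu_group {
    int count;
    struct alu_op op[MAX_SLOTS];
};

/* True when a NOP should go between prev and cur: a relatively
 * indexed source whose base index matches a GPR written by the
 * previous group, or a relatively indexed write in the previous
 * group followed by a read of the same base index. */
static bool
needs_nop(const struct alu_group *prev, const struct alu_group *cur)
{
    assert(prev && cur);

    for (int i = 0; i < cur->count; i++) {
        const struct alu_op *c = &cur->op[i];
        for (int s = 0; s < c->nsrc; s++) {
            for (int p = 0; p < prev->count; p++) {
                const struct alu_op *w = &prev->op[p];
                if (c->src_gpr[s] != w->dst_gpr)
                    continue;
                if (c->src_rel[s] || w->dst_rel)
                    return true;
            }
        }
    }
    return false;
}
```

Iterating over the previous group again, rather than caching state in the structs, matches the approach the v3 note describes.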
|
|
|
|
| |
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This reverts commit 7b0067d23a6f64cf83c42e7f11b2cd4100c569fe.
Vadim's patch fixes this a lot better.
|
|
|
|
|
| |
32-bit unsigned would require some adjustments to handle values >=
0x80000000.
|
| |
|