| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
| |
It's nice to see so much code that did pretty much nothing go away.
Reviewed-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Any driver can implement this simple and efficient optimization.
Team Fortress 2 hits it always. The DISCARD_RANGE codepath is not even used
with TF2 anymore, so we avoid a ton of useless buffer copies.
Tested-by: Andreas Boll <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
| |
Tested-by: Andreas Boll <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This doesn't fix any issue we know of, but there indeed is a week spot
in draw_vbo where streamout can fail. After streamout is enabled,
the need_cs_space call can flush the context, which causes the streamout
to be disabled right after it was enabled and bad things happen.
One way to fix it is to atomize the beginning part, so that no context flush
can happen between streamout enabling and the first drawing.
Tested-by: Andreas Boll <[email protected]>
|
|
|
|
|
|
|
|
|
| |
PS_PARTIAL flushes seems to be required in certain
cases to prevent hangs, especially on r6xx.
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Signed-off-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
| |
v2: Add virtual address to dma src/dst offset for cayman
Signed-off-by: Jerome Glisse <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
Need to add the virtual address.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
R6xx doesn't work - the issue seems to be with flushing (sometimes
the destination buffer contains garbage). There are no hangs, so we're good.
R7xx doesn't seem to have any alignment restriction despite our initial
thinking. Everything just works.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
| |
Upcoming async dma support rely on winsys knowing about GPU families.
Signed-off-by: Jerome Glisse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
because that's what it does.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bring r600g allmost inline with closed source driver when
it comes to flushing and synchronization pattern.
v2-v4: history lost somewhere in outer space
v5: Fix compute size of flushing, use define for flags, update
worst case cs size requirement for flush, treat rs780 and
newer as r7xx when it comes to streamout.
v6: Fix num dw computation for framebuffer state, remove dead
code, use define instead of hardcoded value.
v7: Remove dead code
Signed-off-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of having a 4-byte buffer for each streamout target, we suballocate
each dword from a 4K buffer.
This further reduces the overall number of relocations.
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
| |
This fixes rare graphical corruption.
NOTE: This is a candidate for the stable branches.
|
|
|
|
|
|
|
|
|
| |
v2: Group vgt register together to avoid lockup
v3: Split multi primitive register and index bias register
v4: Bump R600_NUM_ATOMS
Signed-off-by: Marek Olšák <[email protected]>
Signed-off-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
| |
by reusing the CS initialization in r600_context_flush.
Reviewed-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the patch called "simplify and fix flushing and synchronization"
by Jerome Glisse.
Rebased, removed unneded code, simplified more and cleaned up.
Also, SH_ACTION_ENA is not set when changing shaders (hw doesn't seem
to need it). It's only used to flush constant buffers.
Reviewed-by: Jerome Glisse <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Use atom for sampler state. Does not provide new functionality
or fix any bug. Just a step toward full atom base r600g.
v2: Split seamless on r6xx/r7xx into it's own atom. Make sure it's
emited after sampler and with a pipeline flush before otherwise
it does not take effect.
Signed-off-by: Jerome Glisse <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
This function is used when dispatching compute shader in order to avoid
mixing compute and 3D registers in the context's dirty list. This
allows the compute code to resuse 3D functions like evergreen_cb, which
return a struct r600_pipe_state and still have control over when and how
the register writes are emitted.
|
|
|
|
|
|
|
| |
This allows the shader type bit to be set in the pm4 header when
emitting registers for compute shaders.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This couldn't be split because it would break bisecting.
Summary:
* r300g,r600g: stop using u_vbuf
* r300g,r600g: also report that the FIXED vertex type is unsupported
* u_vbuf: refactor for use in the state tracker
* cso: wire up u_vbuf with cso_context
* st/mesa: conditionally install u_vbuf
|
| |
|
|
|
|
|
|
|
|
|
| |
This shaves 2k off the final dri.so, and removes lots of pointless
NULL, 0 passing.
most like pointless - but it looked nicer to me.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces a little of CPU overhead.
The idea is to translate pipe vertex buffers directly into the CS
and not using any intermediate representations.
Framerate in Torcs:
before: 32.2
after: 34.6
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Finally, union r600_query_result can be removed.
|
|
|
|
|
|
| |
Note: this is a candidate for the stable branches.
Signed-off-by: Alex Deucher <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
And rename or inline functions where appropriate.
There is no reason to keep this stuff in r600_hw_context.c.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We always mapped the query buffer in begin_query, causing stalls
if the buffer was busy.
This commit reworks it such that the query buffer is only mapped
in get_query_result as it's supposed to be.
The query buffer is no longer treated as a ring buffer. Instead, the results
are just appended and when the buffer is full, we create a new one. One query
can have more than one query buffer, though that's a very rare case.
Begin_query releases all query buffers.
Reviewed-by: Jerome Glisse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jerome Glisse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a state which is derived from other states and is actually the first
state which doesn't correspond to any gallium state.
There are two state flags:
bool occlusion_query_enabled
bool flush_depthstencil_enabled
Additional flags can be added later if needed, e.g. bool hiz_enabled.
The emit function will have to figure out the register values by itself.
It basically just emits the registers when the state changes.
This commit also adds a few helper functions for writing registers directly
into a command stream.
|
|
|
|
|
| |
The code was almost the same for r600 and eg. What can't be consolidated is
in the *_prepare functions.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This also significantly improves the RV670 flush by using the CB1 flush
*always* and also DEST_BASE_0_ENA, which appears to magically fix some tests.
I am not entirely sure, but it's possible that RV670 flushing is fixed
completely.
v2: fix cayman by flushing texture cache instead of vertex cache
Thanks to Dave Airlie for testing Cayman.
|
| |
|
| |
|
|
|
|
| |
The split made no sense.
|
| |
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
Just getting rid of things which use the register mask.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are 3 changes:
1) stride is specified for each buffer, not just one, so that drivers don't
have to derive it from the outputs
2) new per-output property dst_offset, which specifies the offset
into the buffer in dwords where the output should be stored,
so that drivers don't have to compute the offsets manually;
this will also be useful for gl_SkipComponents
from ARB_transform_feedback3
3) register_mask is removed, instead, there is start_component
and num_components; register_mask with non-consecutive 1s
doesn't make much sense (some hardware cannot do packing of components)
Christoph Bumiller: fixed nvc0.
v2: resolve merge conflicts in Draw and clean it up
|