| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Performance++.
|
|
|
|
|
|
|
|
|
|
| |
https://bugs.freedesktop.org/show_bug.cgi?id=32912
The fix is to call update_derived_state before user buffer uploads.
I've also moved some code around.
Unfortunately, there are still some ZMASK-related bugs which cause
misrendering, i.e. flushing doesn't always work and glean/fbo fails.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
|
|
|
|
|
|
|
| |
Invalid after be1af4394e060677b7db6bbb8e3301e38a3363da (user buffer
creation with width0 == ~0).
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This only uploads the [min_index, max_index] range instead of [0, userbuf size],
which greatly speeds up user buffer uploads.
This is also a prerequisite for atomizing vertex arrays in st/mesa.
|
|
|
|
| |
https://bugs.freedesktop.org/show_bug.cgi?id=32634
|
|
|
|
|
|
| |
It's no longer needed because the upload buffer remains mapped while the CS
is being filled (openarena, ut2004 and others that this code was for do not
use VBOs by default).
|
|
|
|
| |
because the upload buffers are reused for subsequent draw operations.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Added a parameter to specify a minimum offset that should be returned.
r300g needs this to better implement user buffer uploads. This weird
requirement comes from the fact that the Radeon DRM doesn't support negative
offsets.
- Added a parameter to notify a driver that the upload flush occured.
A driver may skip buffer validation if there was no flush, resulting
in a better performance.
- Added a new upload function that returns a pointer to the upload buffer
directly, so that the buffer can be filled e.g. by the translate module.
|
| |
|
| |
|
|
|
|
|
|
| |
We shouldn't hit this bug in theory.
NOTE: This is a candidate for the 7.10 branch.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Based on a patch from Drill <[email protected]>.
NOTE: This is a candidate for the 7.10 branch.
|
|
|
|
|
|
| |
still needs RADEON_HYPERZ=y env var.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Really no idea why I didn't see this before, but these values were opposite
the register spec.
this seems to fix rv530 HiZ on my laptop, will reenable in next commit.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Thanks to Marek Olšák for making my initial attempt actually work.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
If 'start' is odd, render the first triangle with indices embedded
in the command stream, which adds 3 to 'start' and makes it even.
Then continue with the fast path.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though a bound texture stays bound when calling set_fragment_sampler_views,
it must be assigned a new cache region depending on the occupancy of other
texture units.
This fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=28800
Thanks to Álmos <[email protected]> for finding the bug in the code.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The RS690 memory controller prefers things to be on a different
boundary than the discrete GPUs, we had an attempt to fix this,
but it still failed, this consolidates the stride calculation
into one place and removes the really special case check.
This fixes gnome-shell and 16 piglit tests on my rs690 system.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The hardware apparently does support a zero stride, so let's use it.
This fixes missing objects in ETQW, but might also fix a ton of other
similar-looking bugs.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
|
| |
|
|
|
|
| |
This prevents needless buffer validation (CS space checking).
|
|
|
|
|
|
|
|
|
| |
It's not always possible to preprocess the content of 3D_LOAD_VBPNTR
in a command buffer, because the offset to all vertex buffers (which
the packet depends on) is derived from the "start" parameter of draw_arrays
and the "indexBias" parameter of draw_elements, but we can at least lazily
make a command buffer for the case when offset == 0, which should occur
most of the time.
|
|
|
|
| |
This also removes DBG_STATS (the stats can be obtained with valgrind instead).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Finished up by Marek Olšák.
We can set the constant space to use a different area per-call to the shader,
we can avoid flushing the PVS as often as we do by spreading out the constants
across the whole constant space.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
It appears to be a constant buffer index (in case there are more constant
buffers explicitly used by a shader), i.e. something that Gallium currently
does not use. We treated it incorrectly as the offset to a constant buffer.
|
| |
|
|
|
|
| |
.. instead of calling r500_index_bias_supported(..) every draw call.
|
| |
|
| |
|
|
|
|
|
| |
r600g might need something like that as well. This speeds up constant buffer
upload a bit.
|
|
|
|
|
|
|
| |
Small perf improvement in ipers.
radeon_drm_get_cs_handle is exactly what this commit tries to avoid
in every write_reloc.
|
| |
|
| |
|
|
|
|
| |
The other drivers need to be updated to do this, too.
|
| |
|
|
|
|
| |
Broken since 4c7001462607e6e99e474d6271dd481d3f8f201c.
|