| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
|
|
|
|
|
|
| |
There really shouldn't be any difference between the two for us.
Fixes a bug where Z16 renderbuffers would be untiled on gen6, likely
leading to hangs.
|
| |
|
|
|
|
| |
(cherry picked from commit 67aeab0b77fb6be864088e69ea74a010b6543fa1)
|
|
|
|
| |
(cherry picked from commit 36009724fdd652ab29aa928ba78891afd650e768)
|
| |
|
|
|
|
| |
Fixes an abort in fbo-generatemipmap-formats.
|
|
|
|
|
|
|
| |
By relying on just intel_span_supports_format, some formats that
aren't supported pre-gen4 were not reporting FBO incomplete. And we
also complained in stderr when it happened on i915 because draw_region
gets called before framebuffer completeness validation.
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_fbo no longer disallows mismatched width/height on attachments
(shouldn't be any problem), mixed format color attachments (we only
support 1), and L/A/LA/I color attachments (we already reject them on
965 too). It requires Gen'ed names (driver doesn't care), and adds
FramebufferTextureLayer (we don't do texture arrays). So it looks
like we're already in the position we need to be for this extension.
Bug #27468, #32381.
|
|
|
|
|
| |
Occurred because the code assumed that buf->domain would remain
equal to old_domain.
|
|
|
|
| |
There are actually applications that profit immensely from this.
|
| |
|
| |
|
|
|
|
|
| |
... instead of the next fence to be emitted. This way we have a
chance to reclaim the storage earlier.
|
|
|
|
|
|
|
| |
Invalid after be1af4394e060677b7db6bbb8e3301e38a3363da (user buffer
creation with width0 == ~0).
Signed-off-by: Marek Olšák <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In general, we have to negate in immediate values we pass in because
the src1 negate field in the register description is in the bits3 slot
that the 32-bit value is loaded into, so it's ignored by the hardware.
However, the src0 negate field is in bits1, so after we'd negated the
immediate value loaded in, it would also get negated through the
register description. This broke this VP instruction in the position
calculation in civ4:
MAD TEMP[1], TEMP[1], CONST[256].zzzz, CONST[256].-y-y-y-y;
Bug #30156
|
| |
|
|
|
|
| |
r600_upload_const_buffer().
|
| |
|
| |
|
|
|
|
| |
r600_bc_add_alu_type().
|
| |
|
|
|
|
|
|
|
| |
This only uploads the [min_index, max_index] range instead of [0, userbuf size],
which greatly speeds up user buffer uploads.
This is also a prerequisite for atomizing vertex arrays in st/mesa.
|
|
|
|
|
|
|
|
|
|
| |
Fix the error in uniform row calculating, it may alloc one line
more which may cause out of range on memory usage, sometimes program
aborted when free the memory.
NOTE: This is a candidate for 7.9 and 7.10 branches.
Signed-off-by: Brian Paul <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This provides an upload facility for the constant buffers since Marek's
constants in user buffers changes.
gears at least work on my evergreen now.
Signed-off-by: Dave Airlie <[email protected]>
|
| |
|
|
|
|
| |
This adds support for Barts, Turks, and Caicos asics.
|
|
|
|
|
|
|
|
| |
This should make it easier to cross-reference the code and hardware
documentation, as well as clear up any confusion on whether constants
like CMD_3D_WM_STATE mean WM_STATE (pre-gen6) or 3DSTATE_WM (gen6+).
This does not rename any pre-gen6 defines.
|
|
|
|
|
|
|
| |
The draw module set new state that didn't require swtnl which caused need_swtnl to
be unset. This caused the call from to svga_update_state(svga, SVGA_STATE_SWTNL_DRAW)
from the vbuf backend to overwrite the vdecls we setup there to be overwritten with
the real buffers vdecls.
|
|
|
|
| |
For the previous commit.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the 'STDGL invariant(all)' pragma added in GLSL 1.20 was
simply ignored by the compiler. This adds support for setting all
variable invariant.
In GLSL 1.10 and GLSL ES 1.00 the pragma is ignored, per the specs,
but a warning is generated.
Fixes piglit test glsl-invariant-pragma and bugzilla #31925.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLSL 1.10 and 1.20 allow any sort of sampler array indexing.
Restrictions were added in GLSL 1.30. Commit f0f2ec4d added support
for the 1.30 restrictions, but it broke some valid 1.10/1.20 shaders.
This changes the error to a warning in GLSL 1.10, GLSL 1.20, and GLSL
ES 1.00.
There are some spurious whitespace changes in this commit. I changed
the layout (and wording) of the error message so that all three cases
would be similar. The 1.10/1.20 and 1.30 text is the same. The only
difference is that one is an error, and the other is a warning. The
GLSL ES 1.00 wording is similar but not quite the same.
Fixes piglit test
spec/glsl-1.10/compiler/constant-expressions/sampler-array-index-02.frag
and bugzilla #32374.
|
|
|
|
| |
https://bugs.freedesktop.org/show_bug.cgi?id=32634
|
|
|
|
|
|
| |
It's no longer needed because the upload buffer remains mapped while the CS
is being filled (openarena, ut2004 and others that this code was for do not
use VBOs by default).
|
|
|
|
| |
because the upload buffers are reused for subsequent draw operations.
|
| |
|
|
|
|
| |
It's called in vbo_exec_invalidate_state too.
|
|
|
|
| |
What were these for?
|
|
|
|
|
| |
I also specified the array sizes in the header so that one can use
the Elements macro on it.
|
|
|
|
| |
So that a state tracker can unreference them after set_vertex_buffers.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The overhead of resource_create, transfer_inline_write, and resource_destroy
to upload constant data is very visible with some apps in sysprof, and
as such should be eliminated.
My approach uses a user buffer to pass a pointer to a driver. This gives
the driver the freedom it needs to take the fast path, which may differ
for each driver.
This commit addresses the same issue as Jakob's one that suballocates out
of a big constant buffer, but it also eliminates the copy to the buffer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Added a parameter to specify a minimum offset that should be returned.
r300g needs this to better implement user buffer uploads. This weird
requirement comes from the fact that the Radeon DRM doesn't support negative
offsets.
- Added a parameter to notify a driver that the upload flush occured.
A driver may skip buffer validation if there was no flush, resulting
in a better performance.
- Added a new upload function that returns a pointer to the upload buffer
directly, so that the buffer can be filled e.g. by the translate module.
|
|
|
|
|
|
|
| |
The map/unmap overhead can be significant even though there is no waiting on busy
buffers. There is simply a huge number of uploads.
This is a performance optimization for Torcs, a car racing game.
|
|
|
|
|
|
|
|
| |
See http://bugs.freedesktop.org/show_bug.cgi?id=32859
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Brian Paul <[email protected]>
|
| |
|
|
|
|
|
| |
Include imports.h directly instead of indirectly through context.h.
version.c does use any symbols that are added by context.h.
|