| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Improves performance of a hacked-up scissor-many (to reuse a small set
of scissors instead of blowing out the cache, and then to run 100x
more iterations so it actually took some time) by 3.6% +/- 1.2% (n=10)
|
|
|
|
| |
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
With relaxed relocation checking in the kernel, we can specify a
negative delta (i.e. pointing outside of the target bo) in order to fake
a range in a large buffer. We only then need to upload the elements used
and adjust the buffer offset such that they correspond with the indices
used in the DrawArrays.
(Depends on libdrm 0209428b3918c4336018da9293cdcbf7f8fedfb6)
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
... and take advantage of start_vertex_bias to trim to [min_index,
max_index] where possible (i.e. when we need to upload all arrays).
Fixes half_float_vertex(misc.fillmode.wireframe)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34595
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
| |
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions we can reuse those
defintions and avoid the overhead of relocations and invalidations.
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
| |
As we now pack the indices into a common upload buffer, we can reuse a
single CMD_INDEX_BUFFER packet and translate each invocation with a
start vertex offset.
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
As we use state relocations and we know that all the state belongs to
the same bo, we can drop the multiple references to the same bo.
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
Reuse the new common upload buffer for uploading temporary indices and
rebuilt vertex arrays.
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
| |
These were incorrectly defined to the same value - likely due to a cut
and paste error. Found by inspection.
|
|
|
|
| |
We pull the draw regions right out of the renderbuffers these days.
|
|
|
|
|
|
|
|
|
|
| |
It was only used for gen6 fragment programs (not GLSL shaders) at this
point, and it was clearly unsuited to the task -- missing opcodes,
corrupted texturing, and assertion failures hit various applications
of all sorts. It was easier to patch up the non-glsl for remaining
gen6 changes than to make brw_wm_glsl.c complete.
Bug #30530
|
|
|
|
|
|
|
|
| |
This fixes some insanity that would otherwise be required for GLSL
1.30 bit ops or gen6 integer uniform operations in general, at the
cost of upload-time pain. Given that we only have that pain because
mesa's mangling our integer uniforms to be floats, this something that
should be fixed outside of the shader codegen.
|
|
|
|
| |
Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
|
| |
|
| |
|
|
|
|
| |
Fixes glsl-fs-texturecube-2-*
|
| |
|
| |
|
|
|
|
| |
Fixes glsl-algebraic-add-add-1.
|
|
|
|
|
|
|
|
| |
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 31.791 32.287 1.11% 6/6
after:
[ 0] gl firefox-talos-gfx 31.198 31.675 0.96% 6/6
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
|
|
| |
This was bothering me when redoing the binding tables.
|
|
|
|
|
|
|
|
| |
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenver
CC VP state changes is a .07% overhead due to putting a driver hoook
in glEnable().
|
|
|
|
|
|
|
| |
In exchange we end up with an extra memcpy, but that seems better than
calloc/free. Each buffer is 4k maximum, and on the i965-streaming
branch this allocation was showing up as the top entry in
brw_validate_state profiling for cairo-gl.
|
|
|
|
|
| |
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|
|
|
| |
We should be able to do 16, but are limited by Mesa's static buffer
allocations.
|
|
|
|
|
| |
GL doesn't actually let you begin an OQ while one is active, so the
extra work was pointless.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Saves an instruction in PINTERP, LINTERP, and PIXEL_W from
brw_wm_glsl.c For non-GLSL it isn't used yet because the deltas have
to be laid out differently.
|
| |
|
|
|
|
| |
even with vs disabled, still doesn't work.
|
| |
|
|
|
|
|
|
| |
For the tiny bis of data we generally upload through the CURBEs, the
overhead of the kernel's pagetable trickery is actually rather high.
This improves cairo-gl gnome-terminal-vim performance by 3.8%.
|
| |
|
|
|
|
|
|
|
| |
The pull constants require sending out to an overworked shared unit
and waiting for a response, while push constants are nicely loaded in
for us at thread dispatch time. By putting things we access in every
VS invocation there, ETQW performance improved by 2.5% +/- 1.6% (n=6).
|
|
|
|
|
|
| |
Everything has been constant-sized until now, but constant buffer
handling changes will make us want some additional variable sized
array.
|
|
|
|
|
|
|
|
|
| |
As part of the DRI driver interface rewrite I merged __DRIscreenPrivate
and __DRIscreen, and likewise for __DRIdrawablePrivate and
__DRIcontextPrivate. I left typedefs in place though, to avoid renaming
all the *Private use internal to the driver. That was probably a
mistake, and it turns out a one-line find+sed combo can do the mass
rename. Better late than never.
|
|
|
|
| |
Saves another 600 bytes or so of code.
|
|
|
|
| |
Saves ~2KB of code.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a GLbitfield64 type and several macros to operate on 64-bit
fields. The OutputsWritten field of gl_program is changed to use that
type. This results in a fair amount of fallout in drivers that use
programs.
No changes are strictly necessary at this point as all bits used are
below the 32-bit boundary. Fairly soon several bits will be added for
clip distances written by a vertex shader. This will cause several
bits used for varyings to be pushed above the 32-bit boundary. This
will affect any drivers that support GLSL.
At this point, only the i965 driver has been modified to support this
eventuality.
I did this as a "squash" merge. There were several places through the
outputswritten64 branch where things were broken. I foresee this
causing difficulties later for bisecting. The history is still
available in the branch.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm.h
|
| |
|
| |
|
|
|
|
|
| |
This keeps the individual state files from having to export their
structures for brw_state_cache initialization.
|