| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions we can reuse those
defintions and avoid the overhead of relocations and invalidations.
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
Reuse the new common upload buffer for uploading temporary indices and
rebuilt vertex arrays.
Signed-off-by: Chris Wilson <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
|
|
|
|
| |
This reverts commit 73dab75b4165f7d2214a68d4ba8e3cb7aab9b4ac.
|
| |
|
|
|
|
|
|
|
|
|
| |
There was a check to only do the rebase if we didn't have everything
in VBOs, but nexuiz apparently hands us a mix of VBOs and arrays,
resulting in blocking on the GPU to do a rebase.
Improves nexuiz 800x600, high-settings performance on my Ironlake 41%
(+/- 1.3%), from 14.0fps to 19.7fps.
|
|
|
|
| |
Until we fixed GS hang issue.
|
|
|
|
|
| |
It's more likely that we wrap badly in state setup than in the little
primitive packet.
|
|
|
|
|
| |
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|
|
|
|
|
|
|
| |
This is similar to the GL_QUAD_STRIP -> TRIANGLE_STRIP optimization --
the GS usage to split the quads into tris is a huge bottleneck, so a
quick check improves glean blendFunc time massively (width * height of
the window of single-pixel GL_QUADS, many many times). This may also
end up helping with cairo performance, which sometimes ends up drawing
a single quad.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses a stamp mechanisms to mark the DRI drawable as invalid.
Instead of immediately updating the buffers we just bump the drawable
stamp and call out to DRI2GetBuffers "later".
"Later" used to be at LOCK_HARDWARE time, and this patch brings back
callouts at the points where we used to call LOCK_HARDWARE. A new function,
intel_prepare_render(), is called where we used to call LOCK_HARDWARE,
and if the buffers are invalid, we call out to DRI2GetBuffers there.
This lets us invalidate buffers only when notified instead of on
every glViewport() call. If the loader calls the DRI invalidate
entrypoint, we disable viewport triggered buffer invalidation.
Additionally, we can clean up the old viewport mechanism a bit,
since we can just invalidate the buffers and not worry about
reentrancy and whatnot.
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
src/mesa/drivers/dri/intel/intel_screen.c
src/mesa/drivers/dri/intel/intel_swapbuffers.c
src/mesa/drivers/dri/r300/r300_emit.c
src/mesa/drivers/dri/r300/r300_ioctl.c
src/mesa/drivers/dri/r300/r300_tex.c
src/mesa/drivers/dri/r300/r300_texstate.c
|
| | |
|
| | |
|
| | |
|
| | |
|
|/ |
|
|
|
|
|
|
| |
This should do all the things that MI_FLUSH did, but it can be pipelined
so that further rendering isn't blocked on the flush completion unless
necessary.
|
|\ |
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
Makefile
configs/default
progs/glsl/Makefile
src/gallium/auxiliary/util/u_simple_shaders.c
src/gallium/state_trackers/glx/xlib/xm_api.c
src/mesa/drivers/dri/i965/brw_draw_upload.c
src/mesa/drivers/dri/i965/brw_vs_emit.c
src/mesa/drivers/dri/intel/intel_context.h
src/mesa/drivers/dri/intel/intel_pixel.c
src/mesa/drivers/dri/intel/intel_pixel_read.c
src/mesa/main/texenvprogram.c
src/mesa/main/version.h
|
| | | |
|
| | |
| | |
| | |
| | |
| | | |
On the 965, we just drop the value into the primitive packet. On non-945,
we rely on the sw tnl code handling it.
|
|/ / |
|
| |
| |
| |
| | |
No performance difference proven at 95% confidence with my GLSL demo (n=10).
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This saves mapping the index buffer to get a bounds on the indices that
drivers just drop on the floor in the VBO case (cache win), saves a bonus
walk of the indices in the CheckArrayBounds case, and other miscellaneous
validation. On intel it's a particularly a large win (50-100% in my app)
because even though we let the indices stay in both CPU and GPU caches, we
still end up waiting for the GPU to be done with the buffer before reading
from it.
Drivers that want the min/max_index fields must now check index_bounds_valid
and use vbo_get_minmax_index before using them.
|
| |
| |
| |
| | |
This state flag has been unused since the ffvertex_prog move to core.
|
|/ |
|
|
|
|
|
| |
This can improve debugging with INTEL_DEBUG=batch,sync by giving smaller
batchbuffers.
|
|
|
|
|
| |
I keep wanting to hack this knob in as a one-time thing, so it seemed useful
to have all the time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i965 hardware cannot do GL_CLAMP behavior on textures; an earlier
commit forced a software fallback if strict conformance was required
(i.e. the INTEL_STRICT_CONFORMANCE environment variable was set) and
2D textures were used, but it was somewhat flawed - it could trigger
the software fallback even if 2D textures weren't enabled, as long
as one texture unit was enabled.
This fixes that, and adds software fallback for GL_CLAMP behavior with
1D and 3D textures.
It also adds support for a particular setting of the INTEL_STRICT_CONFORMANCE
environment variable, which forces software fallbacks to be taken *all*
the time. This is helpful with debugging. The value is:
export INTEL_STRICT_CONFORMANCE=2
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i965 doesn't natively support GL_CLAMP; it treats it like
GL_CLAMP_TO_EDGE, which fails conformance tests.
This fix adds a clause to the check_fallbacks() test to check
whether GL_CLAMP is in use on any enabled 2D texture. If so,
and if strict conformance is required (via INTEL_STRICT_CONFORMANCE),
a software fallback is mandated.
In addition, validate textures *before* checking for fallbacks,
rather than after; otherwise, the texture state is never validated
and can't be trusted. (In particular, if texturing is enabled and
the sampler would access any level beyond level 0 of a texture, the
sampler will segfault, because texture validation sets the firstLevel
and lastLevel fields of a texture object so that the valid levels
will be mapped and accessed correctly. If texture validation doesn't
occur, only level 0 is accessed correctly, and that only because
firstLevel and lastLevel happen to be set to 0.)
|
|
|
|
|
|
|
|
|
|
| |
When doing line stipple, the stipple count resets on each line segment,
unless the primitive is a GL_LINE_LOOP or a GL_LINE_STRIP.
The existing code correctly identifies the need for a software fallback
to handle conformant line stipple on GL_LINE_LOOP primitives, but
neglects to make the same assessment on GL_LINE_STRIP primitives.
This fixes it so they match.
|
| |
|
| |
|
|
|
|
|
|
| |
This got flipped around in 7855b2aef6bd9e9c2d73260b5cd166159b2525c6.
Bug #18907. Thanks to idr for pointing me at a nicer testcase than blender.
|
|
|
|
|
|
|
| |
Later primitives, even if they caused a full state validate, wouldn't check
that there was enough space in the batchbuffer, occasionally triggering the
sanity check. We also skipped the aperture space check, even if it would
mean bringing in new programs and associated state.
|
|
|
|
|
| |
This was a regression in 59b2c2adbbece27ccf54e58b598ea29cb3a5aa85 that broke
blender, among other apps.
|
|
|
|
|
|
|
|
| |
Previously, since my check_aperture API change, we would check each piece of
state against the batchbuffer individually, but not all the state against the
batchbuffer at once. In addition to not being terribly useful in assuring
success, it probably also increased CPU load by calling check_aperture many
times per primitive.
|
|
|
|
|
|
|
| |
This avoids issues with dereferencing stale cliprects around intel_draw_buffer
time. Additionally, take advantage of cliprects staying constant for FBOs and
DRI2, and emit cliprects in the batchbuffer instead of having to flush batch
each time they change.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This isn't required for GEM (at least, yet), but the check_aperture code
for non-GEM results in batch getting flushed during emit. brw_state_upload
restarts state emits, but a bunch of the state emit functions were assuming
that they would be called exactly once, after prepare and before new_batch.
Bug #17179.
|
|
|
|
| |
Makefile.template
|
| |
|
|
|
|
| |
This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
|
|
|
|
|
|
|
|
| |
This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm_surface_state.c
|