I initially produced the patch using this bash command:
    for file in {intel,i915,i965}/*.{c,cpp,h}; do
        [ ! -h "$file" ] &&
            sed -i 's/GLboolean/bool/g' "$file" &&
            sed -i 's/GL_TRUE/true/g' "$file" &&
            sed -i 's/GL_FALSE/false/g' "$file"
    done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
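A rough illustration of that last manual step (all types and names below
are simplified stand-ins, not the actual Mesa declarations):

    #include <stdbool.h>

    /* Stand-ins for the GL typedefs (GLboolean is an unsigned char). */
    typedef unsigned char GLboolean;
    #define GL_TRUE  1
    #define GL_FALSE 0

    struct gl_context;

    /* Core Mesa's function-pointer table declares GLboolean returns... */
    struct dd_function_table {
       GLboolean (*SomeQuery)(struct gl_context *ctx);
    };

    /* ...so hooks assigned into the table keep GLboolean, avoiding
     * "incompatible pointer" warnings... */
    static GLboolean
    driver_some_query(struct gl_context *ctx)
    {
       (void) ctx;
       return GL_TRUE;
    }

    /* ...while purely internal helpers switch to plain C99 bool. */
    static bool
    driver_internal_check(int gen)
    {
       return gen >= 6;
    }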
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chad Versace <[email protected]>
Acked-by: Paul Berry <[email protected]>
For example, GL_TRIANGLES is converted to _3DPRIM_TRILIST.
The conversion is necessary because HiZ and MSAA resolve operations emit
a 3DPRIM_RECTLIST, which cannot be conveyed by GLenum.
As a consequence, brw_gs_prog_key.primitive is also converted.
v2
----
- [anholt] Split brw_set_prim into brw/gen6 variants in previous commit,
since not much code is really shared between the two.
- [anholt] Replace switch statements with table lookups, since this is
  a hot path (a sketch of the lookup follows below).
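For illustration, the lookup pattern is roughly the following (the GL_*
values are the standard GLenum primitive numbers; the _3DPRIM_* codes
here are illustrative, not necessarily the exact hardware values):

    #include <stdint.h>

    #define GL_POINTS    0x0000
    #define GL_LINES     0x0001
    #define GL_TRIANGLES 0x0004
    #define GL_QUADS     0x0007

    #define _3DPRIM_POINTLIST 0x01
    #define _3DPRIM_LINELIST  0x02
    #define _3DPRIM_TRILIST   0x04
    #define _3DPRIM_QUADLIST  0x07

    static const uint32_t prim_to_hw_prim[] = {
       [GL_POINTS]    = _3DPRIM_POINTLIST,
       [GL_LINES]     = _3DPRIM_LINELIST,
       [GL_TRIANGLES] = _3DPRIM_TRILIST,
       [GL_QUADS]     = _3DPRIM_QUADLIST,
    };

    /* One array load on the hot draw path instead of a switch. */
    static uint32_t
    hw_prim_for(uint32_t glprim)
    {
       return prim_to_hw_prim[glprim];
    }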
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
The "slight optimization to avoid the GS program" in brw_set_prim() is not
used by Gen 6, since Gen 6 doesn't use a GS program. Also, Gen 6 doesn't use
reduced primitives.
Also, document that intel_context.reduced_primitive is only used for Gen < 6
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
This was sucking up 1% of the CPU on 3DMMES.
Reviewed-by: Ian Romanick <[email protected]>
We're about to call this function in a bunch of state emits, so let's
not spam the hardware with flushes too hard.
We actually could do this in hardware in the fragment shader using
gl_PointCoord and the point's size.
We implement line stipples, just not *quite* correctly. We have a
piglit testcase to use when we want to fix it, if we do. Until then,
don't lie to our test suites.
We do have hardware antialised lines. If we care, we should actually
fix them to be conformant (or as close as possible) instead of using
this knob to fool testcases using swrast.
For some interesting reading on the state of GL_*_SMOOTH across
several drivers, see:
http://homepage.mac.com/arekkusu/bugs/invariance/HWAA.html
From my reading of the GL 2.1 spec, no antialiasing is strictly
conformant for polygon smoothing. Yes, it's absurd, but then,
hardware doesn't support this so maybe it's not so absurd.
This was just a duplicate of the no_rast=true driconf option, which is
relatively standard across drivers.
This removes the stupid strict-conformance fallback code I broke when
adding ARB_sampler_objects.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36572
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <[email protected]>
Improves 3DMMES taiji demo performance by 10.1% +/- 0.9% (n=15).
Reviewed-by: Kenneth Graunke <[email protected]>
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <[email protected]>
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
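The fallback logic behind that helper looks roughly like this (structs
heavily simplified for the sketch):

    struct gl_sampler_object { unsigned MinFilter; };
    struct gl_texture_object { struct gl_sampler_object Sampler; };

    struct gl_texture_unit {
       struct gl_texture_object *CurrentTex;   /* bound texture */
       struct gl_sampler_object *Sampler;      /* glBindSampler, or NULL */
    };

    struct gl_context { struct gl_texture_unit Unit[32]; };

    /* Prefer the bound sampler object; fall back to the texture's own
     * built-in sampling state when no sampler is bound to the unit. */
    static struct gl_sampler_object *
    _mesa_get_samplerobj(struct gl_context *ctx, unsigned unit)
    {
       if (ctx->Unit[unit].Sampler)
          return ctx->Unit[unit].Sampler;
       return &ctx->Unit[unit].CurrentTex->Sampler;
    }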
Reviewed-by: Brian Paul <[email protected]>
Since we lack hardware support for it, this is a simple matter of
checking _mesa_check_conditional_render at the entrypoints, and
suppressing it for the metaops where it doesn't apply.
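Sketch of the entrypoint check (the helper name is core Mesa's; the
surrounding function is hypothetical):

    typedef unsigned char GLboolean;
    struct gl_context;

    extern GLboolean _mesa_check_conditional_render(struct gl_context *ctx);

    static void
    intel_draw_sketch(struct gl_context *ctx)
    {
       if (!_mesa_check_conditional_render(ctx))
          return;   /* silently discard, per NV_conditional_render */

       /* ... normal drawing path ... */
    }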
Reviewed-by: Brian Paul <[email protected]>
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
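A toy version of the style (the real macros live in the driver and do
space checking and relocations; these are illustrative only):

    #include <stdint.h>

    static uint32_t batch[256];
    static unsigned batch_used;

    #define BEGIN_BATCH(n)  do { /* reserve space for (n) dwords */ } while (0)
    #define OUT_BATCH(dw)   (batch[batch_used++] = (dw))
    #define ADVANCE_BATCH() do { /* verify emitted length */ } while (0)

    /* Each dword of the hardware packet is emitted explicitly, so the
     * wire format is visible at the call site. */
    static void
    emit_some_state(uint32_t opcode, uint32_t dw1, uint32_t dw2)
    {
       BEGIN_BATCH(3);
       OUT_BATCH(opcode | (3 - 2));   /* DWord 0: command + length */
       OUT_BATCH(dw1);                /* DWord 1 */
       OUT_BATCH(dw2);                /* DWord 2 */
       ADVANCE_BATCH();
    }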
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
gl_texture_object contains an instance of this type for the regular
texture object sampling state. glGenSamplers() generates new instances
of gl_sampler_object which can override that state with glBindSampler().
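From the API side, usage looks like this (assumes a GL 3.3 loader; the
entry points are standard ARB_sampler_objects):

    #include <GL/glcorearb.h>

    static void
    use_sampler_override(void)
    {
       GLuint s;
       glGenSamplers(1, &s);
       glSamplerParameteri(s, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
       glSamplerParameteri(s, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
       glBindSampler(0, s);   /* override texture unit 0's sampling state */
       /* glBindSampler(0, 0) restores the texture's own state. */
    }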
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions, we can reuse those
definitions and avoid the overhead of relocations and invalidations.
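The shape of the check is roughly this (field names invented for the
sketch):

    #include <stdbool.h>
    #include <stdint.h>

    struct vb_state {
       const void *bo;     /* backing buffer object */
       uint32_t offset;    /* byte offset of vertex 0 */
       uint32_t stride;    /* bytes per vertex */
    };

    /* The new arrays can reuse the current VERTEX_BUFFER definitions if
     * they sit in the same buffer with the same stride, starting a whole
     * number of vertices further in. */
    static bool
    can_reuse_vb(const struct vb_state *cur, const struct vb_state *next,
                 uint32_t *start_vertex)
    {
       if (next->bo != cur->bo || next->stride != cur->stride ||
           cur->stride == 0)
          return false;
       if (next->offset < cur->offset ||
           (next->offset - cur->offset) % cur->stride != 0)
          return false;
       *start_vertex = (next->offset - cur->offset) / cur->stride;
       return true;
    }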
Signed-off-by: Chris Wilson <[email protected]>
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <[email protected]>
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of an uncached read through the GTT.
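Sketch of the submission path with libdrm's pwrite wrapper (error
handling trimmed):

    #include <stdint.h>
    #include <intel_bufmgr.h>   /* libdrm_intel */

    static int
    submit_batch(drm_intel_bo *batch_bo, const uint32_t *cpu_batch,
                 int used_bytes)
    {
       /* pwrite the accumulated dwords instead of mmapping the BO and
        * memcpying into it ourselves. */
       int ret = drm_intel_bo_subdata(batch_bo, 0, used_bytes, cpu_batch);
       if (ret)
          return ret;
       /* ...then execute, e.g. drm_intel_bo_exec(batch_bo, used_bytes,
        * NULL, 0, 0), as before. */
       return 0;
    }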
Signed-off-by: Chris Wilson <[email protected]>
Reuse the new common upload buffer for uploading temporary indices and
rebuilt vertex arrays.
Signed-off-by: Chris Wilson <[email protected]>
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
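The hint itself, roughly as defined for GCC (the macro's actual home in
the tree may differ):

    #ifdef __GNUC__
    #define unlikely(x)  __builtin_expect(!!(x), 0)
    #else
    #define unlikely(x)  (x)
    #endif

    extern int intel_debug;        /* stand-in for the INTEL_DEBUG bits */
    #define DEBUG_BATCH  0x8000    /* illustrative flag value */

    static void
    maybe_dump_batch(void)
    {
       /* The compiler lays the debug branch out off the hot path. */
       if (unlikely(intel_debug & DEBUG_BATCH)) {
          /* expensive decode/printout here */
       }
    }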
This reverts commit 73dab75b4165f7d2214a68d4ba8e3cb7aab9b4ac.
There was a check to only do the rebase if we didn't have everything
in VBOs, but nexuiz apparently hands us a mix of VBOs and arrays,
resulting in blocking on the GPU to do a rebase.
Improves nexuiz 800x600, high-settings performance on my Ironlake 41%
(+/- 1.3%), from 14.0fps to 19.7fps.
Until we fix the GS hang issue.
It's more likely that we wrap badly in state setup than in the little
primitive packet.
The slightly less mechanical change of converting the emit_reloc calls
will follow.
This is similar to the GL_QUAD_STRIP -> TRIANGLE_STRIP optimization --
using the GS to split quads into tris is a huge bottleneck, so a quick
check massively improves glean blendFunc time (it draws width * height
single-pixel GL_QUADS covering the window, many times over). This may
also end up helping with cairo performance, which sometimes ends up
drawing a single quad.
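For reference, a single quad (v0 v1 v2 v3) maps onto two triangles
sharing the v0-v2 diagonal:

    #include <stdint.h>

    /* GL_QUADS -> GL_TRIANGLES, no GS involved. */
    static const uint16_t quad_as_tris[6] = {
       0, 1, 2,   /* first triangle  */
       0, 2, 3,   /* second triangle */
    };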
This uses a stamp mechanism to mark the DRI drawable as invalid.
Instead of immediately updating the buffers we just bump the drawable
stamp and call out to DRI2GetBuffers "later".
"Later" used to be at LOCK_HARDWARE time, and this patch brings back
callouts at those points: a new function, intel_prepare_render(), is
called where we used to call LOCK_HARDWARE, and if the buffers are
invalid, we call out to DRI2GetBuffers there.
This lets us invalidate buffers only when notified instead of on
every glViewport() call. If the loader calls the DRI invalidate
entrypoint, we disable viewport triggered buffer invalidation.
Additionally, we can clean up the old viewport mechanism a bit,
since we can just invalidate the buffers and not worry about
reentrancy and whatnot.
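Sketch of the stamp check (fields simplified; the real bookkeeping lives
in the DRI drawable and the intel context):

    struct intel_drawable {
       int stamp;        /* bumped by the DRI2 invalidate event */
       int last_stamp;   /* value when we last fetched buffers */
    };

    /* Called where LOCK_HARDWARE used to be called. */
    static void
    intel_prepare_render(struct intel_drawable *draw)
    {
       if (draw->stamp != draw->last_stamp) {
          /* call out to DRI2GetBuffers and update the renderbuffers */
          draw->last_stamp = draw->stamp;
       }
    }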
Conflicts:
	src/mesa/drivers/dri/intel/intel_screen.c
	src/mesa/drivers/dri/intel/intel_swapbuffers.c
	src/mesa/drivers/dri/r300/r300_emit.c
	src/mesa/drivers/dri/r300/r300_ioctl.c
	src/mesa/drivers/dri/r300/r300_tex.c
	src/mesa/drivers/dri/r300/r300_texstate.c
This should do all the things that MI_FLUSH did, but it can be pipelined
so that further rendering isn't blocked on the flush completion unless
necessary.
Conflicts:
	Makefile
	configs/default
	progs/glsl/Makefile
	src/gallium/auxiliary/util/u_simple_shaders.c
	src/gallium/state_trackers/glx/xlib/xm_api.c
	src/mesa/drivers/dri/i965/brw_draw_upload.c
	src/mesa/drivers/dri/i965/brw_vs_emit.c
	src/mesa/drivers/dri/intel/intel_context.h
	src/mesa/drivers/dri/intel/intel_pixel.c
	src/mesa/drivers/dri/intel/intel_pixel_read.c
	src/mesa/main/texenvprogram.c
	src/mesa/main/version.h
On the 965, we just drop the value into the primitive packet. On non-965,
we rely on the sw tnl code handling it.
No performance difference proven at 95% confidence with my GLSL demo (n=10).
This saves mapping the index buffer to get bounds on the indices that
drivers just drop on the floor in the VBO case (cache win), saves a bonus
walk of the indices in the CheckArrayBounds case, and avoids other
miscellaneous validation. On intel it's a particularly large win (50-100%
in my app)
because even though we let the indices stay in both CPU and GPU caches, we
still end up waiting for the GPU to be done with the buffer before reading
from it.
Drivers that want the min/max_index fields must now check index_bounds_valid
and use vbo_get_minmax_index before using them.
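Sketch of the new driver-side contract (prototype abbreviated):

    typedef unsigned int GLuint;
    struct gl_context;
    struct _mesa_prim;
    struct _mesa_index_buffer;

    extern void vbo_get_minmax_index(struct gl_context *ctx,
                                     const struct _mesa_prim *prim,
                                     const struct _mesa_index_buffer *ib,
                                     GLuint *min_index, GLuint *max_index);

    static void
    get_index_bounds(struct gl_context *ctx, const struct _mesa_prim *prim,
                     const struct _mesa_index_buffer *ib,
                     int index_bounds_valid, GLuint *min_out, GLuint *max_out)
    {
       if (!index_bounds_valid)
          vbo_get_minmax_index(ctx, prim, ib, min_out, max_out);
       /* else: the bounds arrived with the draw call and are already set */
    }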
This state flag has been unused since the ffvertex_prog move to core.