| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts most of commit 0f2da773070c06b6d20ad264d3abb19c4dfd9761.
(I chose to leave the additions to brw_defines.h.)
My previous Ironlake implementation was somewhat broken: counter data
was global, rather than per-context. This meant that performance
monitors captured data from your compositor, 2D driver, and other 3D
programs.
Originally, I believed that Sandybridge and later had an easy way to
avoid this problem (setting per-context flags in OACONTROL), while
Ironlake did not. So I'd intended to leave it as a known limitation of
performance monitoring support on Ironlake. However, this turned out
not to be true.
Unfortunately, our hardware only has one set of aggregating performance
counters shared between all 3D programs, and their values are not saved
or restored by hardware contexts. Also, at least on Sandybridge and
Ivybridge, the counters lose their values if the GPU goes to sleep.
To work around both of these problems, we have to snapshot the
performance counters at the beginning and end of each batch, similar to
how we handle query objects on platforms that don't support hardware
contexts.
For occlusion queries, this batch bookending approach is fairly simple:
only one occlusion query can be active at a time, and the result is a
single integer. Performance monitors are more complex: an arbitrary
number of monitors can be active at a time, each monitoring some subset
of our ~30 observability counters. Individual monitors can be started
and stopped at any point during the batch. Tracking where each monitor
started/ended relative to batch flushes ends up being a pain. And you
can run out of space in the buffer.
Properly supporting this required some serious rearchitecting of the
code. Rather than writing patches to try and morph a broken system into
a working one (which operates quite differently), I decided it would be
simplest to revert the old code and start fresh. Parts will look
familiar, but other parts are new.
I also decided it would be best to include Sandybridge and Ivybridge
support from the start, since the newer platforms have added complexity
that I wanted to make sure worked. They're also what most people care
about these days.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
| |
Return false, not GL_FALSE. Add missing return value.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71359
|
|
|
|
|
|
|
|
|
|
| |
Improves performance of RoboHornet's 2D Canvas toDataURL benchmark
[http://www.robohornet.org/#e=canvastodataurl] by approximately 5x
on Baytrail on ChromiumOS.
Elapsed time drops by -81.4861% +/- 1.22619% (n=3 s=14.9105, confidence=95%).
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
| |
Uses SSE 4.1's MOVNTDQA instruction (streaming load) to read from
uncached memory without polluting the cache.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch make changes to correctly set up the Dispatch GRF Start
Register in case of 'SIMD16 only' FS dispatch.
This fixes an issue of incorrect rendering on dolphin emulator with
GL_SAMPLE_SHADING enabled.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This theoretically works on earlier hardware as well, but the extension
requires at least GL3.0.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
V2: fix interaction with VertexAttribFormat, since that landed after
this was originally written
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings over the batch-wrap-prevention and aperture space checking
code from the normal brw_draw.c path, so that we don't need to flush the
batch every time.
There's a risk here if the intel_emit_post_sync_nonzero_flush() call isn't
high enough up in the state emit sequences -- before, we implicitly had
one at the batch flush before any state was emitted, so Mesa's workaround
emits didn't really matter. Since the SNB fixes by Ken, I didn't see any
regressions after 3 piglit runs.
Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32)
Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90).
Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88)
Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126)
No statistically significant performance difference on unigine tropics
(n=10)
No statistically significant performance difference on openarena (n=755)
The two apps that are hurt happen to include stalls on busy buffer
objects, so I think this is an effect of missing out on an opportune
flush.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
I want a conditional that says generally "we have x86 assembly" in the
next patch.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Check if the new buffer object has the same name as the current
buffer object before looking it up.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
update_array() and update_array_format() are changed to update the new
attrib and binding states, and the client arrays become derived state.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...and rename it to _mesa_bind_buffer_gen().
This is so the function can be called from _mesa_BindVertexBuffer().
This patch also adds a caller parameter so we can report the right
entry point in error messages.
Based on a patch by Eric Anholt.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
This will become derived state as part of the ARB_vertex_attrib_binding
support.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Split out the code for updating the array format into a new function
called update_array_format(). This function will be called by both
update_array() and the new glVertexAttrib*Format() entry points in
ARB_vertex_attrib_binding.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used by the ARB_vertex_attrib_binding implementation.
This reverts commit db38e9a0e179441f59274f6f2a751912c29872e2.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, has_surface_tile_offset is equivalent to gen == 4 && !is_g4x.
We already use it for related checks in brw_wm_surface_state.c, so it
makes sense to use it here too. It's simpler and more future-proof.
Broadwell also lacks surface tile offsets. With this patch, I won't
need to update any generation checking; I can simply not set the flag.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Patch from Ubuntu package
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Andreas Boll <[email protected]>
|
|
|
|
|
|
|
| |
Hardware docs say we can only use SIMD8 dispatch in this condition.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
| |
|
| |
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
Since it's helpful to know why the shader did not compile.
Also, call fflush() for Windows.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
V2: Add comment explaining what emit_alpha_test() is for;
fix spurious temp and bogus whitespace.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
The same setup is required here as when the user-provided shader
explicitly uses KIL or discard.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
V2: Better explanation of the rationale for doing this.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We have to do this in the shader instead, since these gens lack an
independent RT0 alpha value in their render target write messages.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that brw_update_texture_buffer_surface() uses the virtual
emit_buffer_surface_state() function, it works for Gen7+ too.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that brw_create_constant_surface uses a virtual function internally,
it doesn't need to be virtual itself. We can delete the Gen7+ variant
and simplify things.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow us to combine the Gen4-6 and Gen7 variants of these
functions.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This entails adding "mocs" and "rw" parameters to the Gen4-5 version.
I made it actually pay attention to the rw flag (even though it is
always false), but mocs is always ignored.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
| |
fix: intel_screen.c:1320:4: warning: initialization from
incompatible pointer type [enabled by default]
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to
dynamically assign our binding table indices, we didn't really track our
binding table count per shader, so we never filled in these fields.
Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
You can't return stack-initialized values and expect anything good to
happen.
Reviewed-by: Chad Versace <[email protected]
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This will enable removing the dd_function_table::Scissor hook in the
near future.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
This will enable removing the dd_function_table::DepthRange hook in the
near future.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The x, y, width, and height parameters aren't used by radeon_viewport,
so don't pass them. This should make future changes to the
dd_function_table::Viewport interface a little easier.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Cc: Courtney Goeltzenleuchter <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The i830 and the i915 driver have the same dd_function_table::Viewport
function... it just has two names and lives in two places. Using a
single implementation allows cleaning up the saved_viewport nonsense
too.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Cc: Courtney Goeltzenleuchter <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The i965 driver never installed a dd_function_table::Viewport function,
so this wrapper never actually did anything.
No piglit regressions on IVB on DRI2.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Cc: Courtney Goeltzenleuchter <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|