| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a bare-bones backend for the INTEL_performance_query extension
that exposes pipeline statistics.
Although this could be considered redundant given that the same
statistics are already available via query objects, they are a simple
starting point for this extension and it's expected to be convenient for
tools wanting to have a single go to api to introspect what performance
counters are available, along with names, descriptions and semantic/data
types.
This code is derived from Kenneth Graunke's work, temporarily removed
while the frontend and backend interface were reworked.
Signed-off-by: Robert Bragg <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of using the same backend interface as AMD_performance_monitor
this defines a dedicated INTEL_performance_query interface that is
modelled more on the ARB_query_buffer_object interface (considering the
similarity of the extensions) with the addition of vfuncs for
initializing and enumerating query and counter info.
Compared to the previous backend, some notable differences are:
- The backend is free to represent counters using whatever data
structures are optimal/convenient since queries and counters are
enumerated via an iterator api instead of declaring them using
structures directly shared with the frontend.
This is also done to help us support the full range of data and
semantic types available with INTEL_performance_query which is awkward
while using a structure shared with the AMD_performance_monitor
backend since neither extension's types are a subset of the other.
- The backend must support waiting for a query instead of the frontend
simply using glFinish().
- Objects go through 'Active' and 'Ready' states consistent with the
query object backend (hopefully making them more familiar). There is
no 'Ended' state (which used to show that a query has ended at least
once for a given object). There is a new 'Used' state, set when a
query is first begun which implies that we are expecting to get
results back for the object at some point. There's no equivalent to
the 'EverBound' state since the spec doesn't require there to be a
limbo state between generating IDs and associating them with an object
on query Begin.
The INTEL_performance_query and AMD_performance_monitor extensions are
now completely orthogonal within Mesa main (though a driver could
optionally choose to implement both extensions within a unified backend
if that were convenient for the sake of sharing state/code).
v2: (Samuel Pitoiset)
- init PerfQuery.NumQueries in frontend
- s/return_string/output_clipped_string/
- s/backed/backend/ typo
- remove redundant *bytesWritten = 0
v3:
- Add InitPerfQueryInfo for lazy probing of available queries
v4:
- Clean up some internal usage of GL typedefs (Ken)
Signed-off-by: Robert Bragg <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To allow the backend interfaces for AMD_performance_monitor and
INTEL_performance_query to evolve independently based on the more
specific requirements of each extension this starts by separating
the frontends of these extensions.
Even though there wasn't much tying these frontends together, this
separation intentionally copies what few helpers/utilities that were
shared between the two extensions, avoiding any re-factoring specific to
INTEL_performance_query so that the evolution will be easier to follow
later.
Signed-off-by: Robert Bragg <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It looks like it was partly copied from the median filter fragment shader
and unnecessesarily saved a lot of temporary values.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vdpau state tracker allows multiple threads access to the same gallium
context simultaneously. We can fix this either by locking the same mutex
each time the context is used or by using a different gallium context for
each mutex domain. Here we do the latter, although I'm not sure that's really
the best option.
Signed-off-by: Thomas Hellstrom <[email protected]>
Acked-by: Sinclair Yeh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
Makes the code significantly more readable.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When looking at the full range matrices, it becomes obvious that the difference
between the standard matrices and the full range matrices is that the full
range matrices are multiplied by 1.164. Together with offsetting the y value
with -16/255, this will scale and offset RGB with the desired quantities.
However, the standard SMPTE 240M matrix seems to differ a bit since the
U and V coefficients are only multiplied with 1.138 to get the full range
matrix. This would actually alter the color somewhat so I figure that's an
error. The full range matrix is consistent with Nvidia's VDPAU implementation.
We can also incorporate the ybias in the brightness simplifying the
calculation somewhat.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The brightness matrix doesn't actually match the procamp matrix and
what's calculated in vl_csc_get_matrix.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It will cause multiple simultaneous maps of the same vertex buffer and
flushed-while-mapped warnings.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
Needed for at least the svga driver.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
The svga driver relies on the existence of these sampler views.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
Windows doesn't have dlfcn.h. Protect the code in question
with #if ENABLE_SHADER_CACHE test. And fix indentation.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Fixes: dbb0eaccc radv: handle subpass cache flushes
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension adds new query types which can be used to detect overflow
of transform feedback buffers. The new query types are also accepted by
conditional rendering commands.
v3:
- s/gen7+/gen6+/ in the relnotes (Jordan Justen)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable the use of a transform feedback overflow query with
glBeginConditionalRender. The render commands will only execute if the
query is true (i.e. if there was an overflow).
Use ARB_conditional_render_inverted to change this behavior.
v4:
- reuse MI_MATH calcs from hsw_queryob (Kenneth)
- fallback to software conditional rendering when MI_MATH is not
available (Kenneth)
v5:
- check query->Target (Kenneth)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable getting the results of a transform feedback overflow query with a
buffer object.
v4:
- hsw_overflow_result_to_gpr0 a public function, so it can be used
by conditional render. (Kenneth)
- fix typo grp0/gpr0 (Kenneth)
- rename load_gen_written_data_to_regs to
load_overflow_data_to_cs_gprs (Kenneth)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When querying for transform feedback overflow on one or all of the
streams, store information about number of generated and written
primitives. Then check whether generated == written.
v2:
- use only SO_PRIM_STORAGE_NEEDED, do not fallback to
CL_INVOCATION_COUNT. (Kenneth)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Also update checks on conditional rendering.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Add some basic types and storage for the queries of this extension.
v2:
- update date of extension (Kenneth)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
WARNING: sphinx.ext.pngmath has been deprecated. Please use
sphinx.ext.imgmath instead.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
src/gallium/docs/source/tgsi.rst:3488: WARNING: Title underline too short.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Without these, mathjax considers these as the continuation of the
previous line.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
src/gallium/docs/source/context.rst:95: ERROR: Unexpected indentation.
Sub lists need to be surrounded by a blank line.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
is used
The make check test is also updated to make sure these dirs are created.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only feature over and above ES 3.0 is DrawTransformFeedback().
We already have to do the whole SOL_NUM_PRIMS_WRITTEN counter dance in
order to compute the SVBI value for ResumeTransformFeedback(), at which
point our existing GetTransformFeedbackVertexCount() implementation will
do the trick (though with a stall to CPU map the buffer).
Someday, we could probably implement DrawTransformFeedback() more
efficiently, using the "Load Internal Vertex Count" feature of
3DSTATE_SVB_INDEX and the 3DPRIMITIVE indirect vertex count bit.
Rumor has it this allows people to use WebGL 2.0 on Sandybridge.
Note that we don't need pipelined register writes like Gen7+ because
we use the 3DSTATE_SVB_INDEX command rather than MI_LOAD_REGISTER_MEM.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99842
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes Piglit's ARB_transform_feedback2/change-objects-while-paused
GLES 3.0 test. When resuming the transform feedback object, we need to
reset the SVBI counters so we continue writing at the correct point in
the buffer.
Instead of SO_WRITE_OFFSET counters (with a DWord offset), we have the
Streamed Vertex Buffer Index (SVBI) counters, which contain a count of
vertices emitted.
Unfortunately, there's no straightforward way to store the current SVBI
counter values to a buffer. They're not available in a register. You
can use a bit in the 3DSTATE_SVB_INDEX packet to copy them to another
internal counter which 3DPRIMITIVE can use...but there's no good way to
extract that either.
So, once again, we use SO_NUM_PRIMS_WRITTEN to calculate the vertex
numbers. Thankfully, we can reuse most of the existing Gen7+ code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
I'm going to need this in a new Resume hook shortly.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Sandybridge and earlier only have a single counter.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
This way on Sandybridge we'll only do 1 stream worth of math, since
we only have one SO_NUM_PRIMS_WRITTEN counter.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I plan to use these functions on Sandybridge soon. I changed the prefix
on a couple of functions to "brw" instead of "gen7" as in theory they
should be usable all the way back to G45.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
These driver hooks are not used when MI_MATH and MI_LOAD_REGISTER_REG
are supported, which Gen8+ can always do. So this code is dead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
R600_DEBUG=mono has had no effect since:
commit 1fabb297177069e95ec1bb7053acb32f8ec3e092
Author: Marek Olšák <[email protected]>
Date: Tue Feb 14 22:08:32 2017 +0100
radeonsi: have separate LS and ES main shader parts in the shader selector
Also, this assertion was failing:
si_state_shaders.c:1307: si_shader_select_with_key: Assertion
`!shader->is_optimized' failed.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recommended by Matt Arsenault.
46757 shaders in 28742 tests
Totals:
SGPRS: 2068851 -> 2066907 (-0.09 %)
VGPRS: 1604056 -> 1602676 (-0.09 %)
Spilled SGPRs: 1402 -> 1382 (-1.43 %)
Spilled VGPRs: 113 -> 113 (0.00 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 3224 -> 3188 (-1.12 %) dwords per thread
Code Size: 58815520 -> 58716788 (-0.17 %) bytes
LDS: 1162 -> 1162 (0.00 %) blocks
Max Waves: 354616 -> 354905 (0.08 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 786452 -> 784508 (-0.25 %)
VGPRS: 530000 -> 528620 (-0.26 %)
Spilled SGPRs: 958 -> 938 (-2.09 %)
Spilled VGPRs: 85 -> 85 (0.00 %)
Private memory VGPRs: 636 -> 636 (0.00 %)
Scratch size: 1880 -> 1844 (-1.91 %) dwords per thread
Code Size: 26349936 -> 26251204 (-0.37 %) bytes
LDS: 304 -> 304 (0.00 %) blocks
Max Waves: 108962 -> 109251 (0.27 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
v2: define lp_float_mode
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We were unconditionally storing these outputs, sometimes even one component
at a time, but apps never read them in TES.
Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can
easily disable them on demand.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This removes a lot of useless LDS stores.
A few games read TESSINNER/OUTER, but not any other outputs. Most games
don't read any outputs.
The only app doing LDS output reads is UE4 Lightsroom Interior.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
not all of them will be used immediately
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This allows the helper to check for llc instead of having to do it
manually at all the call sites.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
All this cache line address calculation stuff is tricky. Let's not
duplicate it more places than we have to.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
It's a bit shorter and easier to work with. Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Found by inspection. However, I expect it fixes real bugs when using
blorp from Vulkan on little-core platforms.
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
This fixes a some rendering corruption in The Talos Principle
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
|
|
| |
Found by inspection
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|