| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
And now that they're static tables, we don't need to ralloc a copy in
non-shared memory.
Saves ~210k in the built intel drivers.
Bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1048434
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5829>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5829>
|
|
|
|
|
|
|
| |
Signed-off-by: Marcin Ślusarz <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5399>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
perf_cfg is enough - it already contains almost all necessary
information and is constructed in a more optimal way (O(n) vs O(n^2)
- it uses hash table to build the unique counter list).
"Almost all", because it doesn't contain OA raw counters, but
we should have not exposed them anyway. Quoting Mark Janes:
"I see no reason to include the OA raw counters in the list that
are provided to the user. They are unusable.
The MDAPI library can be used to configure raw counters in a way
that provides esoteric metrics, but that library is written against
INTEL_performance_query."
Signed-off-by: Marcin Ślusarz <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5399>
|
|
|
|
|
|
|
|
| |
$ vkcube
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5461>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We initially used this debug option to mean "don't bother registering
the OA configuration into the kernel".
This change makes this option suppress any interaction with the
i915/perf interface. This is useful when debugging self modifying
batches with performance queries while running on the intel_mi_runner.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current code relies on the order of the function
gen_perf_query_result_accumulate() to match the descriptions written
by gen_perf.py. Let's just reuse the offset specified in the python
script.
v2: Use accumlator offsets more (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
|
|
| |
The produced array tells use what metric to enable for a given pass.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
|
|
|
| |
We want to compute the number of passes required to gather performance
data about a set of counters.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a future extension we want to be able to list the counters. Our
existing sets counters might contain the same counters multiple times.
This is a side effect of the fixed OA counters in the HW. We track
thoses with a mask so that we know when a counter is available from
multiple metrics.
v2: Use BITFIELD64_BIT() (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
|
|
|
| |
On Vulkan most of those are already covered by standard queries so
add the ability to skip them.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because of functional requirements for Gen11, when perf is enabled we
only power half the EU array.
This change forces it to enable everything.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4021>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the powergating configuration of the EU array. The default is
everything powered.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4021>
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
|
|
|
|
|
|
|
|
|
|
|
| |
Where they belong.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This stuff is somewhat specific to the GL extension & drivers. On
Vulkan we won't use this, it also made a rather large file.
v2: Fix Android build (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few hash_table users roll their own integer hash functions which
call _mesa_hash_data to perform the hashing which ultimately calls
into XXH32 with a dynamic key length. When using small keys with a
constant size the hash rate can be greatly improved by inlining
XXH32 and providing it a constant key length, see:
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
Additionally, this patch removes calls to _mesa_key_hash_string and
makes them instead call _mesa_has_string directly, matching the new
integer hash functions.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Alan Coopersmith <[email protected]>
[Eric: factor out the is_dir_or_link() check and fix a bug in v1]
Signed-off-by: Eric Engestrom <[email protected]>
v3: include directory path when lstat'ing files
v4: fix inverted check in enumerate_sysfs_metrics()
Reviewed-by: Eric Engestrom <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2258>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2258>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also forgotten in the initial implementation.
v2: Report begin timestamp scaled by the timestamp frequency (Windows
behavior)
v3: Rename split to disjoint to match GL terminology (Tapani)
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3112>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3112>
|
|
|
|
|
|
|
|
|
|
|
| |
This was forgotten in the initial implementation.
v2: ensure the value is written for both GL & Vulkan queries
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Acked-by: Tapani Pälli <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3112>
|
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our current implementation of performance queries is fairly harsh
because it completely flushes and invalidates the 3d pipeline caches
at the beginning and end of each query. An argument can be made that
this is how performance should be measured but it probably doesn't
reflect what the application is actually doing and the actual cost of
draw calls.
A more appropriate approach is to just stall the pipeline at
scoreboard, so that we measure the effect of a draw call without
having the pipeline in a completely pristine state for every draw
call.
v2: Use end of pipe PIPE_CONTROL instruction for Iris (Ken)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was initially intended to fix issues with the query timings going
occassionally high.
It turns out there was a bug in the attribution of OA reports to our
context when parsing the OA data. This led to reports flagged with
other context IDs to be included in our queries results.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This expression was unused by the macro, probably why it didn't
register in the compilation.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a more accurate description of what happens in processing the
OA reports.
Previously we only had a somewhat difficult to parse state machine
tracking the context ID.
What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :
* whether the r0 is tagged with the context ID relevant to us
* if r0 is not tagged with our context ID and r1 is: does r0 have a
invalid context id? If not then we're in a case where i915 has
resubmitted the same context for execution through the execlist
submission port
v2: Update comment (Ken)
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.
We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
We use this as a communication mechanism between MDAPI & Anv.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
Will conflict with the genxml RPSTAT register.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to query the content of register configurations from the
kernel. Let's pull this out of the query.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The Vulkan performance query extension is a bit lower level than the
GL one. Expose some of the functions to do the result accumulation
directly in the Anv driver.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
| |
Misspelling was seen with INTEL_DEBUG=perfmon.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Encapsulate the details of this structure within the perf implemenation.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Encapsulate the details of this data structure.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
INTEL_DEBUG=perfmon will iterate over the perf queries, printing
information about the state of each query. Some of this information
will be private to intel/perf, and needs to a dump routine that can be
called from i965.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Now that all references from i965 have been moved to perf, we can make
internal methods private again.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
All references to this data structure have been moved inside the perf
subsystem.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
By encapsulating this implementation within perf, we can eventually
make struct gen_perf_ctx private.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This refactor moves several helper functions for get_query_data as
well:
- accumulate_oa_reports
- read_gt_frequency
- get_pipeline_stats_data
- get_oa_counter_data
Functions which are no longer referenced in brw_performance_query.c
have been removed.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following methods have duplicate implementation of read_oa_samples_until in
brw_performance_query.c:
- read_oa_samples_for_query
- read_oa_samples_until
They ar still referenced by other methods in the file and will be
removed on the subsequent commit.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|