| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Instead of "genX_bits.h" use "genxml/genX_bits.h"
as already done in other similar cases
Besides being more correct, it also fixes building error in Android.
Fixes: f0d2923 ("i965/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.
v2: Set Mask field to 0xffff for workaround (Ken).
|
|
|
|
|
|
|
|
|
| |
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct. This struct provides a binary driver
readable form of the same statistics we dump out to stderr when we
INTEL_DEBUG is set with a shader stage.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The default pixel hashing mode settings used for slice and subslice
load balancing are far from optimal under certain conditions (see the
comments below for the gory details). The top-of-the-line GT4 parts
suffer from a particularly severe performance problem currently due to
a subslice load balancing issue. Fixing this seems to improve
graphics performance across the board for most of the benchmarks in my
test set, up to ~20% in some cases, e.g. from SKL GT4:
unigine/valley: 3.44% ±0.11%
gfxbench/gl_manhattan31: 3.99% ±0.13%
gputest/pixmark_piano: 7.95% ±0.33%
synmark/OglTexFilterAniso: 15.22% ±0.07%
synmark/OglTexMem128: 22.26% ±0.06%
Lower-end platforms are also affected by some subslice load imbalance
to a lesser degree, especially during CCS resolve and fast clear
operations, which are handled specially here due to rasterization
ocurring in reduced CCS coordinates, which changes the semantics of
the pixel hashing mode settings.
No regressions seen during my tests on some SKL, KBL and BXT
configurations. Additional benchmark reports welcome on any Gen9
platforms (that includes anything with Skylake, Broxton, Kabylake,
Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your
renderer string).
P.S.: A similar problem is likely to be present on other non-Gen9
platforms, especially for CCS resolve and fast clear operations.
Will follow-up with additional patches fixing the hashing mode
for those once I have enough performance data to justify it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of asking spirv_to_nir to lower the workgroup (shared memory)
to offsets, keep them as derefs longer, then lower it later on.
Because Workgroup memory doesn't have explicit offsets, we need to set
those using nir_lower_vars_to_explicit_types before calling the I/O
lowering pass.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Fixes: 8ae6667992ccca41d088 ("intel/perf: move query_object into perf")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
v5: add patch
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This avoids a warning about implicitly casting away the constness of the
pointer.
Signed-off-by: Erik Faye-Lund <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is an object-level preemption workaround which requires this.
However, even without object-level preemption, we seem to have issues
with geometry flickering when 3D and compute are combined in the same
batch and this appears to fix it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110395
Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Encapsulate the details of this data structure.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
INTEL_DEBUG=perfmon will iterate over the perf queries, printing
information about the state of each query. Some of this information
will be private to intel/perf, and needs to a dump routine that can be
called from i965.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
By encapsulating this implementation within perf, we can eventually
make struct gen_perf_ctx private.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This refactor moves several helper functions for get_query_data as
well:
- accumulate_oa_reports
- read_gt_frequency
- get_pipeline_stats_data
- get_oa_counter_data
Functions which are no longer referenced in brw_performance_query.c
have been removed.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following methods have duplicate implementation of read_oa_samples_until in
brw_performance_query.c:
- read_oa_samples_for_query
- read_oa_samples_until
They ar still referenced by other methods in the file and will be
removed on the subsequent commit.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Iris and i965 variants of this method need to be called by perf
routines.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Iris and i965 variants of this method need to be called by perf
routines.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Iris and i965 variants of this method need to be called by perf
routines.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
To move more operations into intel/perf, several state items are
needed. Save references to that state in the perf_ctxt, rather than
passing them in for every operation.
This commit includes an initializer for gen_perf_context, to set those
references and also encapsulate the initialization of the sample
buffer state.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
These operations are needed to refactor subsequent methods into perf
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This method is needed to move subsequent methods into perf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Most accesses to perf state were made through repeated dereferences of
brw_context members. Prefering temporary variables of perf_ctx and
perf_cfg has the following advantages:
- more concise implementation
- easier refactor when moving subsequent methods to perf
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Query objects can now be encapsulated within the perf subsystem.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This method is needed to move subsequent methods into perf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "context" that is necessary to submit and process perf commands to
the hardware was previously present in the brw_context.perfquery
struct. This commit moves it into perf and provides a more
understandable name.
The intention is for this struct to be private, when all methods that
access it are migrated into perf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
oa_sample_buf holds the data provided by the kernel that will be
collated into performance metrics. Since this functionality will be
implemented in perf, the struct needs to be defined there.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Iris and i965 both need to enumerate the available metrics, so these
routines must be located in perf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The perf subsystem needs several macro definitions that were
duplicated in Iris and i965 headers. Place these macros within perf,
if the perf implementation contains the only references to the values.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
In preparation for calling both Iris and i965 implementions from perf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
In preparation for calling both Iris and i965 implementions from perf.
Reviewed-by: Kenneth Graunke <[email protected]>
|