| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
|
|
|
|
|
|
| |
for threaded gallium
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
|
|
|
| |
There are even more counters in the CP_STAT register but I think
these ones are enough for now.
v2: only read (and expose) CP_STAT on VI+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
| |
For simplicity, GPU-sdma-busy will return 0 on previous gens.
v2: only read SRBM_STATUS2 on Evergreen+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
| |
This new query returns the current visible usage of VRAM accessed
by the CPU. It will return 0 on radeon because it's unimplemented.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Useful when debugging applications which map a ton of buffers
and also because we used to run into Linux's limit on the number
of simultaneous mmap() calls.
v2: - update the commit message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's also possible to monitor them via performance counters but
the hardware can only use two counters simultaneously. It seems
easier to re-use the existing code which reads from MMIO instead
of writing a multi-pass approach.
v2: - add new lines after ':'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
| |
It should be close to the GPU load, but it can be much lower if something
is stalling shader execution (e.g. CP DMA).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
| |
CP DMA calls are synchronous with regard to shaders, but can be made
asynchronous if needed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
| |
This is an 8-month old patch.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
| |
v2: fix a comment (Gustaw Smolarczyk)
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
| |
To ensure that fences are properly initialized.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
| |
Save compute shader state that will be used for the ARB_query_buffer_object
implementation.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
| |
num-cs-flushes will mean compute shader flushes
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
| |
If the kernel driver doesn't support it, it returns 0.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
|
|
|
|
|
|
| |
mainly for monitoring visible VRAM congestion
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
| |
also print the average count per frame
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94984
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
| |
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
| |
not used anymore
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
|
| |
This is yet another change motivated by appeasing AMD GPUPerfStudio's
hardcoding of performance counter group numbers.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
| |
This group was used by older versions of AMD GPUPerfStudio (via
AMD_performance_monitor) to identify the GPU family, and GPUPerfStudio
still complains when it isn't available.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
| |
This shaves a bit more time off the startup of programs that don't
actually use performance counters.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Expose most of the performance counter groups that are exposed by Catalyst.
Ideally, the driver will work with GPUPerfStudio at some point, but we are not
quite there yet. In any case, this is the reason for grouping multiple
instances of hardware blocks in the way it is implemented.
The counters can also be shown using the Gallium HUD. If one is interested to
see how work is distributed across multiple shader engines, one can set the
environment variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance
counter groups.
Part of the implementation is in radeon because an implementation for
older hardware would largely follow along the same lines, but exposing
a different set of blocks which are programmed slightly differently.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The point of prepare_buffer is to ensure that the query buffer contains valid
initial data for conditional rendering: as long as the buffer is initialized
correctly, the GPU is able to tell whether query results have been written
already (and wait or fall back to unconditional rendering if desired).
This means prepare_buffer needs to be called again when a buffer is reused.
Conversely, for queries that cannot be used for conditional rendering
(notably pipeline statistics), we can re-use buffers immediately, and they
do not need to be initialized.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
|
|
|
|
|
|
| |
This will be important for perfcounter queries.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
|
|
|
|
|
|
| |
We will need the clear_result override for the batch query implementation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
| |
The idea here is that driver queries implemented outside of common code
will use the same query buffer handling with different logic for starting
and stopping the corresponding counters.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
|
|
|
|
|
|
|
|
|
| |
Move r600_query and r600_query_hw into the header because we will want to
reuse the buffer handling and suspend/resume logic outside of the common
radeon code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
|
|
|
|
|
|
|
|
| |
The goal here is to be able to move the implementation details of hardware-
specific queries (in particular, performance counters) out of the common code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
|
|
More query-related structures will have to be moved into their own
header file to support hardware-specific performance counters.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|