| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This allows to monitor these performance metrics through
GL_AMD_performance_monitor.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
This implements more performance metrics than the previous support,
but some other metrics still need to be figured out.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Those bits were related to old performance metrics support.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
These performance metrics will be re-introduced in an upcoming
patch that will follow the same design as Fermi.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
inst_issued is performance metric not a hardware event on Kepler (SM30).
It will be re-introduced in an upcoming patch.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
SM30 is the compute capability version for GK104/GK106/GK107.
This also introduces a new signal group selection called UNK0F.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the compute support, we might stick buffers as surfaces. This fixes
an assertion when executing src/gallium/tests/trivial/compute.
To avoid using these "restricted" surfaces as render targets, these
assertions have been moved. Note that it's already handled for the
framebuffer thing on nvc0.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In case that the buffer has no bind at all, assume it can be a regular
buffer. This can happen on buffers created through the ARB_dsa
interfaces.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.0 11.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some drivers (in particular radeon[si], but also freedreno judging from
a quick grep) may want to expose performance counters that cannot be
individually enabled or disabled.
Allow such drivers to mark driver-specific queries as requiring a new
type of batch query object that is used to start and stop a list of queries
simultaneously.
v3: adjust recently added nv50 queries
v2: documentation for create_batch_query
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was only used to implement an unnecessarily restrictive interpretation
of the spec of AMD_performance_monitor. The spec says
A performance monitor consists of a number of hardware and software
counters that can be sampled by the GPU and reported back to the
application.
I guess one could take this as a requirement that counters _must_ be sampled
by the GPU, but then why are they called _software_ counters? Besides,
there's not much reason _not_ to expose all counters that are available,
and this simplifies the code.
v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset)
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
Only the regular "clear" call is supposed to respect the render
condition. The rest should ignore it.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.
This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.
The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
| |
This fixes crashes with some piglit OpenCL tests.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This allows to monitor those performance metrics through
GL_AMD_performance_monitor.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Avoid deferring building shaders until draw time, should hopefully
reduce any stuttering, as well as enable shader-db style analysis.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately flatshading is an all-or-nothing proposition on nvc0,
while GL 3.0 calls for the ability to selectively specify explicit
interpolation parameters on gl_Color/gl_SecondaryColor which would
override the flatshading setting. This allows us to fix up the
interpolation settings after shader generation based on rasterizer
settings.
While we're at it, we can add support for dynamically forcing all
(non-flat) shader inputs to be interpolated per-sample, which allows
st/mesa to not generate variants for these.
Fixes the remaining failing glsl-1.30/execution/interpolation piglits.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Julien Isorce <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
For ARB_copy_image.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.
Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This avoids a serious r600g bug leading to a GPU hang.
The chances this bug will get fixed are pretty low now.
I deeply regret listening to others and not pushing this patch, leaving
other users with a GPU-crashing driver. Yes, it should be fixed
in the compiler and it's ugly, but users couldn't care less about that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720
Cc: 11.0 10.6 <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
I'll let drivers figure out how to do it.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like binding a constant buffer on compute overwrites the 3D
state. To avoid that, we already re-bind all the 3D constant buffers
after launching a compute grid but this is not enough.
Binding the constant buffer of input parameters for the compute state at
initialization corrupts the 3D constant buffers, and it's just useless
to bind it because this is not needed until we really launch a grid.
This fixes some piglit regressions related to interpolation tests
introduced in "nvc0: enable compute support by default on Fermi".
Fixes: 00d6186 (nvc0: enable compute support by default on Fermi)
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
As explained in the CUDA toolkit documentation, "a metric is a
characteristic of an application that is calculated from one or more
event values."
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
MP counters on GF100/GF110 (compute capability 2.0) are buggy
because there is a context-switch problem that we need to fix.
Results might be wrong sometimes, be careful!
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
GF100 and GF110 chipsets are compute capability 2.0, while the other
Fermi chipsets are compute capability 2.1. That's why, some MP counters
are different between these chipsets and we need to handle variants.
Signed-off-by: Samuel Pitoiet <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This will help for handling HW SM queries variants on Fermi.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Compute support was not enabled by default because weird effects
on 3D state happened, but I can't reproduce them anymore.
This also enables MP performance counters by default on Fermi.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because we can't expose the number of hardware counters needed for each
different query, we don't want to allow more than one active query
simultaneously to avoid failure when the maximum number of counters
is reached. Note that these groups of GPU counters are currently only
used by AMD_performance_monitor.
Like for Kepler, this limits the maximum number of active queries
to 1 on Fermi.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a card has more than one GPC, the grid used by the compute
kernel which reads MP performance counters seems to be too small.
The consequence is that the kernel is not launched on all TPCs.
Increasing the grid size using the number of GPCs now launches
enough blocks and we can read MP performance counters of all TPCs.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
number of TPCs and the number of ROP units. Note that when the DRM
version is too old the default number of GPCs is fixed to 4.
This will be used to launch the compute kernel which is used to read MP
performance counters over all GPCs.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory access have to be aligned to 128-bits. Note that this
doesn't happen when the card only has TPC.
This patch fixes the following dmesg fail:
gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000f
[UNALIGNED_MEM_ACCESS]
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For strange reasons, the signal id depends on the slot selected on Fermi
but not on Kepler. Fortunately, the signal ids are just offseted by the
slot id!
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Queries which use more than one MP counters was misconfigured and
computing the final result was also wrong because sources need to
be configured on different hardware counters instead.
According to the blob, computing the result is now as follows:
FOR i..n
val += ctr[i] * pow(2, i)
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
On Fermi, we have one domain of 8 MP counters while we have
two domains of 4 MP counters on Kepler.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Sequence fields are located at MP[i] + 0x20 in the buffer object.
This is used to check if result is available for MP[i].
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Writing 0x408000 to 0x419e00 (like on Kepler) has no effect on Fermi
because we only have one domain of 8 counters. Instead, we have to
write 0x80000000.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|