| Commit message | Author | Age | Files | Lines |
| |
Add new interface to create and encode
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
| |
The queries_suspended_for_flush flag is redundant because suspended queries
are not removed from their respective linked list.
Reviewed-by: Marek Olšák <[email protected]>
| |
Some drivers (in particular radeon[si], but also freedreno judging from
a quick grep) may want to expose performance counters that cannot be
individually enabled or disabled.
Allow such drivers to mark driver-specific queries as requiring a new
type of batch query object that is used to start and stop a list of queries
simultaneously.
v3: adjust recently added nv50 queries
v2: documentation for create_batch_query
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
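A minimal sketch of the batch query idea described above, in C. This is not the gallium interface: the names batch_query_sketch, batch_query_init, batch_query_begin, batch_query_end and the MAX_BATCH limit are hypothetical and exist only for illustration. The point is that the counters in a batch cannot be toggled one by one; they are started and stopped together.

```c
#include <stddef.h>

#define MAX_BATCH 8 /* hypothetical limit, for illustration only */

/* One driver-specific counter that cannot be enabled on its own. */
struct counter_sketch {
    unsigned type;
    int active;
};

/* A batch query owns a fixed list of counters. */
struct batch_query_sketch {
    unsigned num;
    struct counter_sketch counters[MAX_BATCH];
};

/* Build the batch from a list of query types; returns -1 if it is too long. */
static int batch_query_init(struct batch_query_sketch *b, unsigned num,
                            const unsigned *types)
{
    if (num > MAX_BATCH)
        return -1;
    b->num = num;
    for (unsigned i = 0; i < num; i++) {
        b->counters[i].type = types[i];
        b->counters[i].active = 0;
    }
    return 0;
}

/* Start every counter in the batch at once. */
static void batch_query_begin(struct batch_query_sketch *b)
{
    for (unsigned i = 0; i < b->num; i++)
        b->counters[i].active = 1;
}

/* Stop every counter in the batch at once. */
static void batch_query_end(struct batch_query_sketch *b)
{
    for (unsigned i = 0; i < b->num; i++)
        b->counters[i].active = 0;
}

/* Number of counters currently running. */
static unsigned batch_query_active_count(const struct batch_query_sketch *b)
{
    unsigned n = 0;
    for (unsigned i = 0; i < b->num; i++)
        n += b->counters[i].active ? 1 : 0;
    return n;
}
```

There is deliberately no per-counter begin/end in this sketch; a caller that wants a different set of counters would build a new batch.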
| |
This was only used to implement an unnecessarily restrictive interpretation
of the spec of AMD_performance_monitor. The spec says
A performance monitor consists of a number of hardware and software
counters that can be sampled by the GPU and reported back to the
application.
I guess one could take this as a requirement that counters _must_ be sampled
by the GPU, but then why are they called _software_ counters? Besides,
there's not much reason _not_ to expose all counters that are available,
and this simplifies the code.
v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset)
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
| |
Signed-off-by: Ilia Mirkin <[email protected]>
| |
This is the NIR analog to GLSL IR ir_samples_identical.
v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by
Ken and Jason.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
| |
This fixes teximage-colors, fbo-generatemipmap-formats, and probably
others (in relation to the RGB5 formats, others still fail).
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
| |
Signed-off-by: Ilia Mirkin <[email protected]>
| |
Signed-off-by: Ilia Mirkin <[email protected]>
| |
The lower layers assume that we support this, and it's been core since
GL 1.4. This fixes a slew of piglit tests, especially around
tex-miplevel-selection.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
| |
The hardware can actually generate the vertex ID when vertices come from
a client-side buffer, as when glDrawElements is used.
This doesn't fix (or break) any piglit tests, but it improves on Ilia's
previous attempt (c830d19 "nv50: avoid using inline vertex
data submit when gl_VertexID is used").
The only disadvantage is that it only works on G84+, but we don't really
care about that weird and old NV50 chipset.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
| |
The a4xx bits corresponding to 'freedreno/a3xx: add fake RGTC support
(required for GL3)'
TODO some more r/e.. maybe we get lucky and hw supports some of this
directly? For now this will help us enable gl3.
Signed-off-by: Rob Clark <[email protected]>
| |
Signed-off-by: Rob Clark <[email protected]>
| |
Signed-off-by: Rob Clark <[email protected]>
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
| |
The main issue is that the current logic looked into cso->u.tex, which
is the wrong side of the union to look into for texture buffers. While I
was at it, it was easy enough to add the logic to handle offsets
(first_element).
- reduce texture buffer size limit (determined experimentally)
- don't look at first/last levels, instead look at first/last element
- include the first element offset
- set offset alignment to 16 (determined experimentally)
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
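The union mix-up described above can be sketched in C as follows. The struct is a simplified, hypothetical stand-in for a sampler view, not the real gallium definition, and the helper names are assumptions for illustration. For a texture buffer the range lives in the buf side of the union (first/last element), so reading the tex side (levels and layers) would reinterpret unrelated bits.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical sampler view: only the fields needed here. */
struct sampler_view_sketch {
    int is_buffer; /* nonzero when the view is a texture buffer */
    union {
        struct {
            uint16_t first_layer, last_layer;
            uint8_t first_level, last_level;
        } tex; /* valid for ordinary textures */
        struct {
            uint32_t first_element, last_element;
        } buf; /* valid for texture buffers */
    } u;
};

/* Number of elements covered by a buffer view (inclusive range). */
static uint32_t buffer_view_elements(const struct sampler_view_sketch *v)
{
    assert(v->is_buffer);
    assert(v->u.buf.first_element <= v->u.buf.last_element);
    return v->u.buf.last_element - v->u.buf.first_element + 1;
}

/* Byte offset of the first element, given the element size in bytes. */
static uint32_t buffer_view_offset(const struct sampler_view_sketch *v,
                                   uint32_t elem_size)
{
    assert(v->is_buffer);
    return v->u.buf.first_element * elem_size;
}
```

For example, a view over elements 4 through 19 of 16-byte elements covers 16 elements and starts at byte offset 64, which also satisfies the 16-byte offset alignment mentioned above.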
| |
A smarter implementation would make it possible to attach this to emit
state for the BY_REGION versions to avoid breaking the tiling. But this
is a start.
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
| |
Also throw in LATC while we're at it (same exact format). This could be
made more efficient by keeping a shadow compressed texture to use for
returning at map time. However... it's not worth it for now...
presumably compressed textures are not updated often.
Lastly fix up Z32S8 transfers to non-0 layers.
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
| |
The previously RE'd formats were from an ES driver implementing
OES_vertex_type_10_10_10_2 and thus backwards. A future change could add
the 2_10_10_10 support.
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
| |
We'd end up in a state where the shader uses no inputs, yet num_elements is
greater than zero. Triggered by a TF vertex shader which did:
gl_Position = vec4(0.0, 0.0, 0.0, 0.0);
resulting in a binning pass variant with no inputs.
Includes equiv fix in a4xx, even though we don't have binning-pass
enabled yet on a4xx.
Signed-off-by: Rob Clark <[email protected]>
| |
point_size_per_vertex is always TRUE for GLES, causing us to configure
the hw as if gl_PointSize was written, even if it was not. Which makes
for grumpy hw.
Signed-off-by: Rob Clark <[email protected]>
| |
Looks like this was forgotten in the commit which added the AFETCH
logic.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
| |
We always want to prefer the VGPU10 formats over the VGPU9 ones when
we have VGPU10 support.
Original patch by Jose and updated by Brian.
Reviewed-by: Charmaine Lee <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
| |
This is important for the case of sampling from a depth texture. In
that case, we need to sample the texture as if it were a single-channel
color texture. For other/color formats, we can use the format as-is.
Reviewed-by: Charmaine Lee <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
| |
This will be important for perfcounter queries.
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
[Fixed a rebase conflict and re-tested before pushing.]
| |
We will need the clear_result override for the batch query implementation.
Reviewed-by: Marek Olšák <[email protected]>
| |
The idea here is that driver queries implemented outside of common code
will use the same query buffer handling with different logic for starting
and stopping the corresponding counters.
Reviewed-by: Marek Olšák <[email protected]>
[Fixed a rebase conflict and re-tested before pushing.]
| |
Move r600_query and r600_query_hw into the header because we will want to
reuse the buffer handling and suspend/resume logic outside of the common
radeon code.
Reviewed-by: Marek Olšák <[email protected]>
[Fixed a rebase conflict and re-tested before pushing.]
| |
Software queries are all queries that do not require suspend/resume
and explicit handling of result buffers.
Reviewed-by: Marek Olšák <[email protected]>
[Fixed a rebase conflict and re-tested before pushing.]
| |
The goal here is to be able to move the implementation details of hardware-
specific queries (in particular, performance counters) out of the common code.
Reviewed-by: Marek Olšák <[email protected]>
[Fixed a rebase conflict and re-tested before pushing.]
| |
More query-related structures will have to be moved into their own
header file to support hardware-specific performance counters.
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get
the absolute value would get lowered. The lowering doesn't bother to try
to restrict the lifetime of the lowered uniforms, so we'd end up with register
allocation failing because of this on 5 of the tests (more tests still fail in
RA; it looks like we'll need to reduce lowered uniform lifetimes to fix
those).
No changes on shader-db, though fewer extra MOVs are generated on even
glxgears (MOVs pair well enough that it ends up being the same instruction
count).
| |
This does actually happen in the wild (particularly fabs of a uniform), so
we'd like to support it.
| |
| |
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).
Cc: "11.0" <[email protected]>
| |
Requires proper kernel tiling configuration so check the tiling
config registers.
v2: send the right version of the patch
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
| |
The offset is different on CI and newer.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
| |
Otherwise it won't end up in the tarball.
Signed-off-by: Emil Velikov <[email protected]>
| |
Only the regular "clear" call is supposed to respect the render
condition. The rest should ignore it.
Signed-off-by: Ilia Mirkin <[email protected]>
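The rule stated above can be sketched in C as follows. The context struct and the function names (ctx_sketch, sketch_clear, sketch_clear_render_target) are hypothetical and do not mirror any driver's code; the sketch only shows that the regular clear checks the render condition while the other clear paths run unconditionally.

```c
#include <stdbool.h>

/* Hypothetical context state, reduced to what the illustration needs. */
struct ctx_sketch {
    bool cond_active;     /* a render condition is currently set */
    bool cond_passes;     /* outcome of that condition */
    unsigned clears_done; /* counts clears that were actually executed */
};

/* Rendering is allowed when no condition is set or the condition passes. */
static bool render_condition_allows(const struct ctx_sketch *c)
{
    return !c->cond_active || c->cond_passes;
}

/* Regular clear: respects the render condition. */
static void sketch_clear(struct ctx_sketch *c)
{
    if (!render_condition_allows(c))
        return;
    c->clears_done++;
}

/* Explicit surface clear: ignores the render condition. */
static void sketch_clear_render_target(struct ctx_sketch *c)
{
    c->clears_done++;
}
```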
| |
Currently only one metric is exposed but more will be added later.
Signed-off-by: Samuel Pitoiset <[email protected]>
Tested-by: Pierre Moreau <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
| |
These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.
As on nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.
Only G84+ is supported because G80 is an old and weird card.
Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.
Signed-off-by: Samuel Pitoiset <[email protected]>
Tested-by: Pierre Moreau <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
| |
This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.
This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute
I made some improvements to the original code, such as fixing the
simultaneous use of 3D and COMPUTE, improving global buffer binding, and
making the code closer to what nvc0 already does. This compute support has
been tested by Pierre Moreau and myself with some compute kernels. This is a
step towards OpenCL.
On a related note, compute programs seem to overlap fragment programs when
both are in use. To fix this, we need to re-validate fragment programs when
binding compute programs and vice versa.
Note that textures, samplers and surfaces still need to be implemented.
Signed-off-by: Samuel Pitoiset <[email protected]>
Tested-by: Pierre Moreau <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
| |
As on nvc0, we need to free the memory allocated for interpolation
parameters. This fixes a memory leak spotted by valgrind.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
| |
No need to allocate more GPRs than are used in the compute kernel which
reads MP performance counters on Fermi.
Signed-off-by: Samuel Pitoiset <[email protected]>
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
| |
Reviewed-by: Michel Dänzer <[email protected]>