| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To read out MP perf counters we use a compute shader and need to upload
input data like a 64-bits addr used to store the values and a sequence
ID for synchronization. Currently, this input data is uploaded as user
uniforms which means that it's sticked to c0[], but if a compute shader
from a real application is used, monitoring those performance counters
will just overwrite some data and miserably crash.
Instead, sticking the 64-bits addr and the sequence into the driver
constant buffer seems like much better and will allow to monitor
counters with GL 4.3 apps.
Tested on GF119 and GK110, but should not hurt anything on GK104.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Not piping this all the way through yet, but no better place to note
this down. This will can be used with NV_viewport_array2.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For fermi, this likely will require use of linked tsc mode. However on
bindless architectures, we can have as many as we want. As it stands,
the AUX_TEX_INFO has 32 teture handles reserved.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The GALLIUM_HUD does not yet expose a description for each events, but
this might be useful for developers who want to have a long description
of hw perf counters directly in the source code.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
We apparently pass all the relevant CTS tests. There are probably some
shortcomings, but they can be addressed down the line.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware supports primitive restart on patch primitives, and other
hardware does not. Modern GL and ES include a query for this feature;
adding a capability bit will allow us to answer it.
As far as I know, AMD hardware does not support this feature, while
NVIDIA and Intel hardware does. However, most Gallium drivers do not
appear to support tessellation shaders yet. So, I've enabled it for
nvc0 and disabled it everywhere else.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Constbufs are only aliased on Fermi and this will reduce the number of
flushes when we switch between 3d and compute.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This happens with dEQP tests. The code doesn't at all protect against
this condition, so while unhandled, this is an expected situation.
Also avoid using more than the first 16 registers for nv3x vertex
programs.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
On nv30, for example, there is no hardware index buffer support. So all
of those will be created entirely in user memory.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the array doesn't start at 0 we need to account for su->tex.r.
While we are at it, make sure to avoid out of bounds access by masking
the index.
This fixes GL45-CTS.shading_language_420pack.binding_image_array.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reported-by: Dave Airlie <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Apparently the stencil mask applies to clears on nv30/nv40. Reset it to
0xff before doing a stencil clear. This fixes gl-1.0-readpixsanity and
a number of other piglit tests.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
The mesa state tracker has recently started to query this.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes a lot of INVALID_VALUE errors reported by the card when
running dEQP tests.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Left over from the pre-mainline tess support. Adapt to use the new
defines.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This makes sure that rIndirectSrc and other things stay updated.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Initially to make sure the format doesn't mismatch and won't produce
out-of-bounds access, we checked that both formats have exactly the same
number of bytes, but this should not be checked for type stores.
This fixes serious rendering issues in the UE4 demos (tested with
realistic and reflections).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To prevent out-of-bounds access and format mismatch we add a predicate
on sustp, but we have to account for it when the sources are condensed
because a predicate is a source. Using the range 3:6 will only condense
the input data and it's always the case. This also fixes constraints
when an indirect access is used.
This ensures that sources are correctly aligned.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Address registers are always loaded right before use. Don't treat them
as "global", which will cause them to be put into the function's
linkage, and will make the register allocator hold onto that
register until the end of the function.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We were always doing atomics on shared memory location 0 instead of the
originally supplied location. Make sure to pass through the original
symbol and any indirection.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected] # note: expect minor conflict
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
INTERP is defined (by me) to have to have a INPUT source. However the
state tracker does not always obey this. This happens due to varying
packing logic introducing additional mov's which can't always be undone.
Instead of just giving up, we instead try harder to find the original
input. This won't always be possible, for example with indirect
accesses. There's not much we can (easily) do about that though.
This fixes the remaining interpolateAt* failures in dEQP:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at*
some of which were asserting due to INTERP_* being passed a non-input.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes
dEQP-GLES31.functional.draw_indirect.draw_elements_indirect.*.default_attribute
These tests were causing a const vbo to be set up, and were small enough
draws that the logic was trying to go via the push path (which emits
data directly into the cmd stream rather than uploading a user vbo).
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was handled in handleTEX(), however the way the logic works, those
extra arguments aren't added on by then, so it did nothing. Instead we
must duplicate that bit here. GK110 appears to complain about
MISALIGNED_GPR, however it's reasonable to believe that GK104 has the
same requirements.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95403
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Cull distances are just a special case of clip distances as far as the
hardware is concerned. Make sure that the relevant "planes" are enabled,
and flip the clip mode to cull for those.
Signed-off-by: Tobias Klausmann <[email protected]>
[imirkin: add enables on nvc0, add nv50 support]
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tobias Klausmann <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This lets us safely enable or disable the extension as needed
Signed-off-by: Tobias Klausmann <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This reduces the number of loop iterations for invalidating buffers
and images.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This is a pretty rare situation but this can happen though.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SAMPLEMASK semantic should only return the bits set covered by the
current invocation. However we were always retrieving the covmask, which
returns the covered samples of the whole pixel.
When not doing per-sample invocation, this is precisely what we want.
However when doing per-sample invocation, we have to select the
sampleid'th bit and only return that. Furthermore, this means that we
have to have a 1:1 correlation for invocations and samples.
This fixes most
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*
tests. A few failures remain due to disagreements about nr_samples==1
logic as well as what happens with MSAA x2 RTs when the shading fraction
is 0.5.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Compute support seems to be pretty stable now, and according to piglit
it doesn't seem to break 3D state.
As a side effect, this will expose ARB_compute_shader on GK110/GK208.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
We don't need them for compute shaders.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Like other resources, we need to unreference all images.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel is now more strict with the class ids it exposes, so we need
to check the G98 and MCP89 classes as well as the GT215 class. This
effectively caused us to decide there were no decoding capabilities on
newer kernel for VP3 chips.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95251
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.2" <[email protected]>
|
|
|
|
|
|
|
| |
metric-issue_slot_utilization and metric-branch_efficiency are already
computed as percentages.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
This makes more sense for them.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow to use percentages for some metrics because the Gallium
HUD doesn't allow to display floating point numbers and 0 is printed
instead.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is most likely a copy-paste error when I reworked this area few
weeks ago. For SM20, metric-issue_slots is equal to inst_issued because
there is only one pipeline, so the metric is not exposed there.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reported-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
| |
This might be useful to avoid breaking the current compute state when
monitoring MP perf counters because we use a compute kernel to read out
those counters. This has been initially suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <[email protected]>
|