| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, NIR had a single nir_var_uniform mode used for atomic
counters, UBOs, samplers, images, and normal uniforms. This commit
splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform
is still a bit of a catch-all but the nir_var_ubo is specific to UBOs.
While we're at it, we also rename shader_storage to ssbo to follow the
convention.
We need this so that we can distinguish between normal uniforms and UBO
access at the deref level without going all the way back variable and
seeing if it has an interface type.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The functional change here is moving the nir_lower_io_to_scalar_early()
calls inside st_nir_link_shaders() and moving the st_nir_opts() call
after the call to nir_lower_io_arrays_to_elements().
This fixes a bug with the following piglit test due to the current code
not cleaning up dead code after we lower arrays. This was causing an
assert in the new duplicate varyings link time opt introduced in
70be9afccb23.
tests/spec/glsl-1.10/execution/vsfs-unused-array-member.shader_test
Moving the nir_lower_io_to_scalar_early() calls also allows us to tidy
up the code a little and merge some loops.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This will help the new opt introduced in the following patches
allowing us to remove extra duplicate varyings.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Not sure if this ever worked, but the current logic for setting the
min/max index is definitely wrong for indexed draws. While we're at it,
bring in all the usual logic from the non-indirect drawing path.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109086
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Nobody uses this, so let's drop it. This makes the helper callable
from places without a gl_program.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
DrawPixels lowering, for example, adds new varyings that need to be
accounted for in inputs_read. The earlier info gathering at link time
cannot account for this.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
They're now identical, so we can just compile it once.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Now that we always copy color, we can just use the util function.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glDrawPixels passthrough vertex shader copies position and texcoord
vertex attributes to varying outputs. It also optionally copies a third
gl_Color attribute, which sometimes is unnecessary. Until now, we've
compiled separate variants of the shader, one of which does this extra
copy, and the other of which doesn't. We have done this since 2007.
But, the vertex shader runs for a whopping four vertices, and so the
cost of a copying a single input to output is likely inconsequential.
In theory, we could bind one fewer vertex element - but we always bind
all three regardless. So, we don't even get that savings.
This patch unifies the two, so we always copy the optional color,
and save having to compile the variant. It also makes the VS input
interface match up with the vertex element state without any dead
(unused) input attributes.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Dead since 2015 (commit 5142564734bd68f165b02e29e384ebbcf91cce38).
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A long time ago, when this was first implemented, not having a sampler
bound would cause problems on Fermi. I didn't work out the reasons, but
the solution was simple -- just put the samplers back in.
Since then, regular texturing paths appear to have lost their associated
samplers which required a fuller investigation and fix in nouveau. Now
that this is done, this code should no longer need a sampler state for
fetching texels from a buffer texture.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
We have a NIR path, and V3D doesn't have TGSI input for compute (only what
TTN can handle for the various gallium-internal shaders).
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
LinkedTransformFeedback is normally populated, which had nerf'd varying
packing since the check was introduced.
Fixes: dbd52585fa9 st/nir: Disable varying packing when doing transform feedback.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In gallium, we model the attachment sample count as a new nr_samples
field in pipe_surface. A driver can indicate support for the extension
using the new pipe cap, PIPE_CAP_MULTISAMPLED_RENDER_TO_TEXTURE.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For format fallbacks like ETC and ASTC, switching between sRGB and linear
decoding is undefined, or at least is not bit-exact. Same as
EXT_texture_sRGB_decode on GLES.
There are no piglit or dEQP regresssions.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit 198c50f4873758e9f64d89eea262af5dd1644df9.
This needs to be reverted after commit 017199d2d2e4 ("mesa: Revert
INTEL_fragment_shader_ordering support")
|
|
|
|
|
|
|
| |
This should be equalent of what we did before.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Android has cpufeatures library but pinning of threads is not supported
PIPE_OS_LINUX code path causes build error due to sched_getcpu() unavailable
thus we need to avoid setting HAVE_SCHED_GETCPU for Android
Fixes: 48f2160 ("st/mesa: regularly re-pin driver threads to the CCX where the app thread is")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For RGB surfaces (for example) we don't really care that the colormask
is 0x7 instead of 0xf. This should not trigger clear_with_quad()
slowpath.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If we can't clear all the buffers with pctx->clear() (say, for example,
because of ColorMask), push the buffers we *can* clear with pctx->clear()
first. Tilers want to see clears coming before draws to enable fast-
paths, and clearing one of the attachments with a quad-draw first
confuses that logic.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use VAO binding information in feedback rendering. In theory
it should reduce the amount of buffer objects scheduled for rendering.
Feedback rendering is implemented in a crude way anyhow, so I do not
expect much gain here. But for the sake of code reuse we should
use the same code for the same task. And finally if feeback rendering
may get improved the array setup is already well done there.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The change removes the reference that is held on the entries of the
vbuffers[] array. The new code does not do that anymore as following
the code into draw_set_vertex_buffers() the draw context holds an
other reference as long as it is reset down the function again.
So it should be already by that argument save to remove that
additional reference count.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Factor out vertex array setup routines from the array state atom.
The factored functions will be used in feedback rendering in the
next change.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
| |
In st_atom_array, we only need to unmap the upload buffer that
was actually used.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
| |
In st_atom_array, we only need to care for unmapping the upload buffer
if we actually used it.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Factor out struct gl_vertex_format from array attributes.
The data type is supposed to describe the type of a vertex
element. At this current stage the data type is only used
with the VAO, but actually is useful in various other places.
Due to the bitfields being used, special care needs to be
taken for the glGet code paths.
v2: Change unsigned char -> GLubyte.
Use struct assignment for struct gl_vertex_format.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
| |
radeonsi has 3 driver threads (glthread, gallium, winsys), other drivers
may have 2 (glthread, gallium), so it makes sense to pin them to a random
CCX and keep that irrespective of the app thread.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is used when glthread is disabled.
Mesa pretty much chases the app thread on the CPU.
The performance is the same as pinning the app thread.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Meson test has a concepts of suites, which allow tests to be grouped
together. This allows for a subtest of tests to be run only (say only
the tests for nir). A test can be added to more than one suite, but for
the most part I've only added a test to a single suite, though I've
added a compiler group that includes nir, glsl, and glcpp tests.
To use this you'll need to invoke meson test directly, instead of ninja
test (which always runs all targets). it can be invoked as:
`meson test -C builddir --suite $suitename` (meson test has addition
options that are pretty useful).
Tested-By: Gert Wollny <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
This implementation can have massive drawbacks.
Cc: 18.3 <[email protected]>
Reviewed-by: Edmondo Tommasina <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 161464 -> 161368 (-0.06 %)
VGPRS: 86904 -> 86292 (-0.70 %)
Spilled SGPRs: 296 -> 314 (6.08 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 3618596 -> 3573852 (-1.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 26189 -> 26276 (0.33 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the non-SSO case, where multiple shader stages are linked together,
we were recording garbage pipe_stream_output_info structures for all
but the last enabled geometry-processing stage.
Specifically, we were using the gl_transform_feedback_info from
shader_program->last_vert_prog (the stage whose outputs will be
recorded)...but were pairing it with the output varying mappings
from the current shader stage. For example, a program with a VS and
GS, the VS's pipe_shader_state would have a pipe_stream_output_info
based on the GS transform feedback info, but the VS output mapping.
This generally worked out okay because only the pipe_stream_output_info
for the last stage really matters - the others can be ignored. However,
we'd like to avoid confusing the pipe driver. In particular, my new
driver translates the stream out information to hardware packets at
bind_{vs,tes,gs}_state() time...and was hitting asserts about garbage
varyings that didn't exist.
This patch changes st/mesa to record a blank pipe_stream_output_info
with num_outputs = 0 for all stages prior to last_vert_prog. The last
one is captured as normal.
(In the fully-SSO case, nothing should change - each program contains
a single shader stage, so last_vert_prog *is* the current shader.)
Tested with llvmpipe (piglit's gpu profile), and freedreno (a3xx,
gpu profile with -t transform.feedback). Fixes several hundred CTS
tests on my new driver.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
ARB programs won't have one of these, and we don't use it anyway.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
This will let me use it in the ARB program code as well.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This only adds support on the Gallium core level, for the drivers
it is likely that additional changes are needed to support the
new texture format and thereby enabling the extension.
Enables on softpipe and makes pass:
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
v2: - add include for getting GL_SR8_EXT
v4: - since the extension is not required don't bother providing
a fallback (Ilia Mirkin)
- split patch (2/2) to separate Gallium and mesa/st parts
(Roland Scheidegger)
- trim commit message to only contain the history of the patch
relevant to this part
v5: - don't include GLES headers (required enum has been added to glheader.h)
(Ilia Mirkin)
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
The only reader of the weak field in _mesa_prim is pretty
console printing. By that, remove the weak field from _mesa_prim.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
| |
ffs() just returns the bit that is set, we need to know what
stage that bit represents so use u_bit_scan() instead.
Fixes: 2ca5d9548fc4 ("st/glsl_to_nir: gather next_stage in shader_info")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Fixes: edded1237607 ("mesa: rework ParameterList to allow packing")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
TGSI has no I64MAD/U64MAD opcode.
Fixes: 278580729a5 ('st/glsl_to_tgsi: add support for 64-bit integers')
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Only radeonsi uses them, so adjust them to match its needs.
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|