We emitted the instructions that load the bindless handle after the memory
instruction that uses it.
Cc: 17.2 <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
just don't propagate output reads
Reviewed-by: Nicolai Hähnle <[email protected]>
|
Found while trying to optimize an application.
Not observed to help performance on i965, but should at least reduce
the memory usage of such textures a bit.
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Eero Tamminen <[email protected]>
|
Otherwise it will be missing from the tarball.
Fixes: f7daa737d17 ("mesa: Combine libtxc_dxtn sources into texcompress_s3tc_tmp.h")
Signed-off-by: Emil Velikov <[email protected]>
|
Found with MinGW optimized build.
Reviewed-by: Charmaine Lee <[email protected]>
|
Found with MinGW optimized build.
Reviewed-by: Charmaine Lee <[email protected]>
|
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Now never null!
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
When this file is included by Gallium, the fprintf causes it to fail to
compile. This is an unreachable error case, and we shouldn't be calling
fprintf directly.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
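
Editor's note: a hedged illustration of one way to report an impossible case from a shared header without calling fprintf directly, using Mesa's unreachable() macro from util/macros.h. The helper and the format values below are made up for the sketch; this is not the actual change.

```c
#include "util/macros.h"

/* Hypothetical helper living in an #included header: the impossible
 * default case is reported through unreachable() instead of fprintf. */
static inline unsigned
s3tc_block_size_sketch(unsigned block_format)
{
   switch (block_format) {
   case 0:  return 8;    /* e.g. an 8-byte DXT1-style block */
   case 1:  return 16;   /* e.g. a 16-byte DXT3/DXT5-style block */
   default: unreachable("invalid block format");
   }
}
```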
|
This file will be #included, so the functions should be static.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Has been disabled for 12 years.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
Imported from master (commit ef07298391c6dcad843e0b13e985090c1dd76e76)
of https://cgit.freedesktop.org/~mareko/libtxc_dxtn/
Acked-by: Nicolai Hähnle <[email protected]>
Acked-by: Emil Velikov <[email protected]>
|
Use st_egl_image instead. radeonsi doesn't like when we create
a pipe_surface with PIPE_FORMAT_NV12.
This fixes NV12 texturing on radeonsi using kmscube.
Cc: 17.1 17.2 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
We can implement ARB_indirect_parameters for i965 by
taking advantage of the conditional rendering mechanism.
This works by issuing maxdrawcount draw calls and using
conditional rendering to predicate each of them with
"drawcount > gl_DrawID"
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
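
Editor's note: a conceptual sketch of the mechanism described above. The emit callback is a hypothetical stand-in for the driver's batch-building code, not the i965 implementation.

```c
#include <stdint.h>

/* Issue all maxdrawcount potential draws; the GPU predicate
 * "drawcount > gl_DrawID", evaluated against the count stored in the
 * parameter buffer, decides at execution time whether draw i runs. */
typedef void (*emit_predicated_draw_fn)(uint32_t draw_id,
                                        uint32_t indirect_offset);

static void
multi_draw_indirect_count_sketch(uint32_t max_draw_count, uint32_t stride,
                                 emit_predicated_draw_fn emit_predicated_draw)
{
   for (uint32_t draw_id = 0; draw_id < max_draw_count; draw_id++)
      emit_predicated_draw(draw_id, draw_id * stride);
}
```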
|
In order to add our ARB_indirect_parameters implementation we
need to refactor brw_try_draw_prims so that it operates on a
per primitive basis and move the loop into brw_draw_prims.
This commit refactors the brw_try_draw_prims function and
renames it to brw_draw_single_prim.
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
In order to add our ARB_indirect_parameters implementation we
need to refactor brw_try_draw_prims so that it operates on a
per primitive basis and move the loop into brw_draw_prims.
This commit introduces the brw_finish_drawing function where
we move the code that executes once after the loop.
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
In order to add our ARB_indirect_parameters implementation we
need to refactor brw_try_draw_prims so that it operates on a
per primitive basis and move the loop into brw_draw_prims.
This commit introduces the brw_prepare_drawing function where
we move the code that executes once before the loop.
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
This is the last step of fixing
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb_unsigned_int_2_10_10_10_rev
for radeonsi.
Reviewed-by: Marek Olšák <[email protected]>
|
The EXT_texture_type_2_10_10_10_REV (ES only) states the following issue:
"1. Should textures specified with this type be renderable?
UNRESOLVED: No. A separate extension could provide this functionality."
This partially fixes
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.{rgb,rgba}_unsigned_int_2_10_10_10_rev
Reviewed-by: Marek Olšák <[email protected]>
|
ES requires it. This is a partial fix for
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb_unsigned_int_2_10_10_10_rev
Reviewed-by: Marek Olšák <[email protected]>
|
It leads to surprising states with integer inputs and outputs on
vertex processing stages (e.g. geometry stages). Instead, rely on the
driver to choose smooth interpolation by default.
We still allow a varying to match when one stage declares it as smooth
and the other declares it without an interpolation qualifier.
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
We can start reading the URB at the first offset that contains varyings
that the fragment shader actually reads. We still need to make sure that we
read at least one varying to honor hardware requirements.
This helps alleviate a problem introduced with 99df02ca26f61 for
separate shader objects: without separate shader objects we assign
locations sequentially, however, since that commit we have changed the
method for SSO so that the VUE slot assigned depends on the number of
builtin slots plus the location assigned to the varying. This fixed
layout is intended to help SSO programs by avoiding on-the-fly recompiles
when swapping out shaders; however, it also means that if a varying uses
a large location number close to the maximum allowed by the SF/FS units
(31), then the offset introduced by the number of builtin slots can push
the location outside the range and trigger an assertion.
This problem is affecting at least the following CTS tests for
enhanced layouts:
KHR-GL45.enhanced_layouts.varying_array_components
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_components
KHR-GL45.enhanced_layouts.varying_locations
which use SSO and the location layout qualifier to select such
location numbers explicitly.
This change helps these tests because for SSO we always have to include
things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is
very unlikely to read them, so by doing this we free builtin slots from
the fixed VUE layout and keep the tests from crashing in this scenario.
Of course, this is not a proper fix; we'd still run into problems if someone
tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or
gl_CullDistance in the FS, but that would be a much less common bug and we
can probably wait to see if anyone actually runs into that situation in a real
world scenario before deciding that more aggressive changes are
required to support this without reverting 99df02ca26f61.
v2:
- Add a debug message when we skip clip distances (Ilia)
- we also need to account for this when we compute the urb setup
for the fragment shader stage, so add a compiler util to compute
the first slot that we need to read from the URB instead of
replicating the logic in both places.
v3:
- Make the util more generic so it can account for all unused slots
at the beginning of the URB, that will make it more useful (Ken).
- Drop the debug message, it was not what Ilia was asking for.
Suggested-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
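
Editor's note: a minimal sketch of the compiler util mentioned in v2/v3. The struct and names below are simplified stand-ins, not the real brw_vue_map interface.

```c
#include <stdint.h>

struct vue_map_sketch {
   int num_slots;
   int slot_to_varying[64];   /* -1 for padding / unused slots */
};

/* Return the first VUE slot whose varying the fragment shader actually
 * reads; the caller converts this to hardware read-offset units.  If
 * nothing is read, return 0 so that at least one slot is still read. */
static int
first_read_slot_sketch(const struct vue_map_sketch *vue,
                       uint64_t fs_inputs_read)
{
   for (int slot = 0; slot < vue->num_slots; slot++) {
      int varying = vue->slot_to_varying[slot];
      if (varying >= 0 && varying < 64 &&
          (fs_inputs_read & (UINT64_C(1) << varying)))
         return slot;
   }
   return 0;
}
```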
|
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
Overriding the default (no-op) swizzle is clearly counter-productive,
since the whole point is putting the destination register as one of
the source operands so that it remains unmodified when the assignment
condition is false.
Fragment depth and stencil outputs are a special case due to how their
source swizzles are manipulated in translate_src when compiling to
TGSI.
Fixes dEQP-GLES2.functional.shaders.conditionals.if.*_vertex
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
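
Editor's note: a plain-C model of why the destination operand must pass through with an identity swizzle. This models the semantics only; it is not TGSI or the driver code.

```c
#include <stdbool.h>

struct vec4_sketch { float c[4]; };

/* Equivalent of "if (cond) dst.xy = src.xy;": a per-component select in
 * which the destination also feeds in as a source.  Components .z and .w
 * (and all components when cond is false) must come through unmodified,
 * which is exactly what a forced non-identity swizzle on that operand
 * would break. */
static struct vec4_sketch
cond_write_xy_sketch(struct vec4_sketch dst, struct vec4_sketch src, bool cond)
{
   struct vec4_sketch out = dst;
   if (cond) {
      out.c[0] = src.c[0];
      out.c[1] = src.c[1];
   }
   return out;
}
```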
|
Found by address sanitizer.
The loop here tries to be safe, but in doing so, it ends up doing
exactly the wrong thing: the safe foreach is for when the loop
variable (inst) could be deleted and nothing else. However, this
particular loop can delete inst's successor, but not inst itself.
Fixes: 8c6a0ebaad72 ("st/mesa: add st fp64 support (v7.1)")
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
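
Editor's note: a self-contained illustration of the hazard with a generic linked list. These are not the Mesa instruction types; the point is only that the "safe" form caches the successor.

```c
#include <stdlib.h>

struct node_sketch { struct node_sketch *next; int dead; };

/* Caches the successor before the body runs, so it only protects against
 * deleting the current node -- not its successor. */
#define foreach_safe(n, tmp, head) \
   for ((n) = (head); (n) && ((tmp) = (n)->next, 1); (n) = (tmp))

static void
purge_dead_successors(struct node_sketch *head)
{
   struct node_sketch *n, *tmp;
   foreach_safe(n, tmp, head) {
      if (n->next && n->next->dead) {
         struct node_sketch *victim = n->next;
         n->next = victim->next;
         free(victim);
         tmp = n->next;   /* refresh the cached successor, or it dangles */
      }
   }
}
```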
|
This way, when NIR_PASS_V makes a clone of the shader (for testing
nir_clone), the new and lowered version gets re-assigned to prog->nir.
[[email protected]: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
[[email protected]: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
The way NIR_PASS works (and, by extension, nir_optimize) is that they
may clone the shader and throw the old one away. (We use this for
testing nir_clone.) It's better if we just make a temporary variable,
use it for everything, and re-assign to the gl_program at the end.
[[email protected]: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
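
Editor's note: a hedged sketch of the pattern being described; the function is illustrative rather than the exact Mesa change, and the passes are just common NIR passes used as placeholders.

```c
#include "compiler/nir/nir.h"
#include "main/mtypes.h"

/* Run every pass on a local pointer -- NIR_PASS_V may replace the shader
 * with a clone when nir_clone testing is enabled -- and write it back to
 * the gl_program exactly once at the end. */
static void
lower_program_sketch(struct gl_program *prog)
{
   nir_shader *nir = prog->nir;

   NIR_PASS_V(nir, nir_opt_cse);
   NIR_PASS_V(nir, nir_opt_dce);

   prog->nir = nir;   /* re-assign the (possibly replaced) shader */
}
```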
|
Reviewed-by: Marek Olšák <[email protected]>
|
Cc: 17.1 17.2 <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
We have been exposing only 16 since 1e3e72e3054de with arguments
based on register pressure and the number of available GRFs, however,
our scalar backend will always limit the number of push registers
for GS threads to 24 and fall back to the pull model for anything else,
so there is really no reason to lower the number under those arguments.
By bumping this up to 32 we make it the same as all the other stages,
which is a nice feature to have that can help applications in some
cases (I recently fixed a bug in CTS that assumed that the number
of input locations in a stage matches the number of output locations
in the previous stage for example).
Pre-gen8, we use the vector backend and push model, so in that case
the arguments in 1e3e72e3054de are still valid.
v2: check if we have scalar GS instead of the hw gen to enable this (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
|
This makes it easier to loop over programs.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
For now linking is just removing unused varyings between stages.
shader-db results BDW:
total instructions in shared programs: 13198288 -> 13191693 (-0.05%)
instructions in affected programs: 48325 -> 41730 (-13.65%)
helped: 473
HURT: 0
total cycles in shared programs: 541184926 -> 541159260 (-0.00%)
cycles in affected programs: 213238 -> 187572 (-12.04%)
helped: 435
HURT: 8
V2:
- lower indirects on demoted inputs as well as outputs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
This will allow us to insert a nir linking step in brw_link_shader().
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
This will help us call gather info at a later point and allow us
to do some linking in nir.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
do_flush_locked isn't a great name - especially given that there's no
locking going on in our code relating to execbuf.
Reviewed-by: Chris Wilson <[email protected]>
|
We have a nice utility function for this, which eliminates the need for
locking stuff. This isn't really performance critical, but it's less
code to use the atomic.
p_atomic_inc_return does pre-increment rather than post-increment, so we
change screen->program_id to be initialized to 0 instead of 1. At which
point, we can just delete the initialization because intel_screen is
rzalloc'd.
Reviewed-by: Chris Wilson <[email protected]>
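
Editor's note: a small sketch of the pattern; the struct is simplified, and p_atomic_inc_return comes from Mesa's util/u_atomic.h.

```c
#include "util/u_atomic.h"

struct screen_sketch {
   int program_id;   /* implicitly 0 when the screen is rzalloc'd */
};

/* p_atomic_inc_return() behaves like ++counter and returns the new value,
 * so with the counter starting at 0 the first caller still gets 1 -- and
 * no explicit initialization is needed at all. */
static int
next_program_id(struct screen_sketch *screen)
{
   return p_atomic_inc_return(&screen->program_id);
}
```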
|
There's no real advantage or disadvantage here, it's just for stylistic
consistency with the rest of the codebase.
Reviewed-by: Chris Wilson <[email protected]>
|
These have been unused for a while now.
|
If transform feedback is recording a varying, it needs a slot in the
VUE map, regardless of whether or not the shader writes it.
Together with the previous patch, this fixes:
- KHR-GL45.enhanced_layouts.xfb_capture_struct
The test captures a structure where the vertex shader writes the first
and third members - but the second still needs a slot.
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
unify_interfaces() only updates the NIR program info, not the copy
in the gl_program itself. So, by using the old copy, we were missing
out on these updates.
The TCS/TES ones already did this correctly.
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
brw_finish_batch emits commands needed at the end of every batch buffer,
including any workarounds. In the past, we freed up some "reserved"
batch space before calling it, so we would never have to flush during
it. This was error prone and easy to screw up, so I deleted it a while
back in favor of growing the batch.
There were two problems:
1. We're in the middle of flushing, so brw->no_batch_wrap is guaranteed
not to be set. Using BEGIN_BATCH() to emit commands would cause a
recursive flush rather than growing the buffer as intended.
2. We already recorded the throttling batch before growing, which
replaces brw->batch.bo with a different (larger) buffer. So growing
would break throttling.
These are easily remedied by shuffling some code around and whacking
brw->no_batch_wrap in brw_finish_batch(). This also now includes the
final workarounds in the batch usage statistics. Found by inspection.
Fixes: 2c46a67b4138631217141f (i965: Delete BATCH_RESERVED handling.)
Reviewed-by: Chris Wilson <[email protected]>
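
Editor's note: a hedged sketch of the shuffle described above. The struct and helper are simplified stand-ins, not the real brw_context code.

```c
#include <stdbool.h>

struct context_sketch {
   bool no_batch_wrap;
   /* ... batch state ... */
};

/* Hypothetical stand-in for the end-of-batch workarounds. */
static void
emit_final_workarounds(struct context_sketch *ctx)
{
   (void)ctx;   /* final workaround commands would be emitted here */
}

/* Mark the batch as non-wrappable while the end-of-batch commands are
 * emitted, so running out of space grows the buffer instead of triggering
 * a recursive flush from inside the flush path. */
static void
finish_batch_sketch(struct context_sketch *ctx)
{
   ctx->no_batch_wrap = true;
   emit_final_workarounds(ctx);
   ctx->no_batch_wrap = false;
}
```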
|
This is, by definition, finishing the batch.
Reviewed-by: Chris Wilson <[email protected]>