| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This helps valgrind when encode_type_to_blob is used.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
Cc: [email protected]
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Not sure if this is the best place to put it, but we're going to need
this for NIR too.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Also needed in freedreno/ir3.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) happening in the same shader as the main
function, and secondly it doesn't work for variable group sizes.
In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.
While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.
Also the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with main function. As a result, the lowering
code has to insert its own copies of the system values if needed.
Reported-by: Stephane Chevigny <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: [email protected]
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Having 4 variables all called "gl_in_TexCoord0@n" isn't very informative,
much better to see:
decl_var shader_in INTERP_MODE_NONE float gl_in_TexCoord0 (VARYING_SLOT_VAR0.x, 1, 0)
decl_var shader_in INTERP_MODE_NONE float gl_in_TexCoord0@0 (VARYING_SLOT_VAR0.y, 1, 0)
decl_var shader_in INTERP_MODE_NONE float gl_in_TexCoord0@1 (VARYING_SLOT_VAR0.z, 1, 0)
decl_var shader_in INTERP_MODE_NONE float gl_in_TexCoord0@2 (VARYING_SLOT_VAR0.w, 1, 0)
v2: Handle arrays and structs better (by Timothy)
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The pass only looks at var load/store intrinsics, not input load/store
intrinsics, so assert that we don't see the other type.
v2: Adjust comment indentation.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
It's redundant with nir_shader::info::stage.
Acked-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- Use helper to add a new source to the texture instruction.
v3:
- Use nir_tex_instr_src_index() to simplify the patch (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We only need to add a check to validate output locations here. For
inputs with invalid locations we will fail to link when we can't
find a matching output in the same (invalid) location.
v2: compute location slots properly depending on shader stage and
variable type / direction
Fixes:
KHR-GL45.enhanced_layouts.varying_location_limit
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We won't split varyings marked as always active because there
is no point in doing so. This means we need to mark both
sides of the interface as always active otherwise we will have
a mismatch and start removing things we shouldn't.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is intended to be called before nir_lower_io() so that we
can do some linking optimisations with the results. It can also
be used with drivers that don't use nir_lower_io() at all such
as RADV.
v2: pass mode mask rather than first and last stage integer.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
ssize_t is a GNU extension and is not available on Windows or MacOS.
Instead, we use intptr_t which should be effectively equivalent and is
part of the C standard. This should fix the Windows and Mac OS builds.
Fixes: 3af1c829891a4530682bce113fdd512d4f2de3c6
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103253
Reviewed-by: Jose Fonseca <[email protected]>
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since blob.h moved up to src/compiler the test should include that
instead of src/compiler/glsl
fixes: 0e3bd56c6ea783dbc ("compiler: Move blob up a level")
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This looks like a copy+paste error. They don't actually write into that
variable as would be implied by putting the return there.
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
We didn't fold correctly in the case of 0x1 because we never let the
loop counter hit 0. Switching it to bit >= 0 solves this problem.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
There are certain advantages to using uint8_t internally such as
well-defined arithmetic on all platforms. However, interfaces that
work in terms of raw data should use a void* type.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
These helpers not only call blob_reserve_bytes but also make sure that
the blob is properly aligned as if blob_write_* were called.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Despite the name, it could only be used if you immediately wrote to the
pointer. Noboby was using it outside of one test, so clearly this
behavior wasn't that useful. Instead, make it return an offset into the
data buffer so that the result isn't invalidated if you later write to
the blob. In conjunction with blob_overwrite_bytes(), this will be
useful for leaving a placeholder and then filling it in later, which
we'll need to do for handling phi nodes when serializing NIR.
v2 (Jason Ekstrand):
- Detect overflow in the offset + to_write computation
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
These can be used to easily count up the number of bytes that will be
required by "writing" it into the NULL blob.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There's no reason why that tiny bit of memory needs to be on the heap.
We always put blob_reader on the stack, so why not do the same with the
writable blob.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
We're going to want to use the blob for Vulkan pipeline caching so it
makes sense to have it in libcompiler not libglsl.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we could have a failure followed by a smaller write that
succeeds and get a corrupted blob. If we ever OOM, we should stop.
v2 (Jason Ekstrand):
- Initialize the new boolean member in create_blob
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
Otherwise, if you have a large read fail and then try to do a small
read, the small read may succeed even though it's at the wrong offset.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
For TGSI-based drivers, st_glsl_to_tgsi records this information.
For NIR-based drivers, nir_shader_gather_info() will do so.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
I'd like to put this sort of metadata in the shader_info structure,
rather than adding more things to gl_program.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
With the ssao demo from Vulkan demos:
radv/rx480: 440->440fps
anv/haswell: 24->34 fps
The demo does a 0->32 loop across a ubo with 32 members.
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've been doing this inside of vc4, but vc5 wants it as well and it may be
useful for other drivers (Intel has a related path for pre-gen6 with MRT,
and freedreno had a TGSI path for it at one point).
This required defining a common enum for the standard comparison
functions, but other lowering passes are likely to also want that enum.
v2: Add to meson.build as well.
Acked-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unlike uniforms, the limit on shared memory size is not called out
explicitly in the list of things that cause linker errors, but presumably
that's just an oversight in the spec.
Fixes dEQP-GLES31.functional.debug.negative_coverage.{callbacks,get_error,log}.compute.exceed_shared_memory_size_limit
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
c7affbf6875622a enabled GLSLOptimizeConservatively on some
drivers. The idea was to speed up compile times by running
the GLSL IR passes only once each time do_common_optimization()
is called. However loop unrolling can create a big mess and
with large loops can actually case compile times to increase
significantly due to a bunch of redundant if statements being
propagated to other IRs.
Here we make sure to clean things up before moving on.
There was no measureable difference in shader-db compile times,
but it makes compile times of some piglit tests go from a couple
of seconds to basically instant.
The shader-db results seemed positive also:
Totals:
SGPRS: 2829456 -> 2828376 (-0.04 %)
VGPRS: 1720793 -> 1721457 (0.04 %)
Spilled SGPRs: 7707 -> 7707 (0.00 %)
Spilled VGPRs: 33 -> 33 (0.00 %)
Private memory VGPRs: 3140 -> 2060 (-34.39 %)
Scratch size: 3308 -> 2180 (-34.10 %) dwords per thread
Code Size: 79441464 -> 79214616 (-0.29 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 558670 -> 558571 (-0.02 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old code assumed that loop terminators will always be at
the start of the loop, resulting in otherwise unrollable
loops not being unrolled at all. For example the current
code would unroll:
int j = 0;
do {
if (j > 5)
break;
... do stuff ...
j++;
} while (j < 4);
But would fail to unroll the following as no iteration limit was
calculated because it failed to find the terminator:
int j = 0;
do {
... do stuff ...
j++;
} while (j < 4);
Also we would fail to unroll the following as we ended up
calculating the iteration limit as 6 rather than 4. The unroll
code then assumed we had 3 terminators rather the 2 as it
wasn't able to determine that "if (j > 5)" was redundant.
int j = 0;
do {
if (j > 5)
break;
... do stuff ...
if (bool(i))
break;
j++;
} while (j < 4);
This patch changes this pass to be more like the NIR unrolling pass.
With this change we handle loop terminators correctly and also
handle cases where the terminators have instructions in their
branches other than a break.
V2:
- fixed regression where loops with a break in else were never
unrolled in v1.
- fixed confusing/wrong naming of bools in complex unrolling.
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
do-while loops can increment the starting value before the
condition is checked. e.g.
do {
ndx++;
} while (ndx < 3);
This commit changes the code to detect this and reduces the
iteration count by 1 if found.
V2: fix terminator spelling
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Elie Tournier <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These instructions will be executed on every iteration of the loop
we cannot drop them.
V2:
- move removal of unreachable terminators from the terminator list
to the same place they are removed from the IR as suggested by
Nicolai.
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This gets pretty much the entire classic tree building, as well as
i965, including the various glapis. There are some workarounds for bugs
that are fixed in meson 0.43.0, which is due out on October 8th.
I have tested this with piglit using glx.
v2: - fix typo "vaule" -> "value"
- use gtest dep instead of linking to libgtest (rebase error)
- use gtest dep instead of linking against libgtest (rebase error)
- copy the megadriver, then create hard links from that, then delete
the megadriver. This matches the behavior of the autotools build.
(Eric A)
- Use host_machine instead of target_machine (Eric A)
- Put a comment in the right place (Eric A)
- Don't have two variables for the same information (Eric A)
- Put pre_args at top of file in this patch (Eric A)
- Fix glx generators in this patch instead of next (Eric A)
- Remove -DMESON hack (Eric A)
- add sha1_h to mesa in this patch (Eric A)
- Put generators in loops when possible to reduce code in
mapi/glapi/gen (Eric A)
v3: - put HAVE_X11_PLATFORM in this patch
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This was missed in a rebase, and doesn't affect radv or anv, only i965.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|