| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
v2: Fix comparing addresses from formats that have more than one
component by using nir_ball_iequal(). (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
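As a rough illustration of what an all-components-equal comparison means (plain C, not the NIR helper itself; the component count n is an assumption):

    #include <stdbool.h>
    #include <stdint.h>

    /* Two n-component addresses are equal only if every component matches. */
    static bool addr_components_equal(const uint32_t *a, const uint32_t *b, unsigned n)
    {
       for (unsigned i = 0; i < n; i++) {
          if (a[i] != b[i])
             return false;
       }
       return true;
    }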
|
|
|
|
|
|
| |
Similar to nir_bany_inequal(). Suggested by Jason.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The program parser allocates a parameter list. In case of a parsing error
some variables were not freed.
This patch adds freeing of them.
Signed-off-by: Sergii Romantsov <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
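A minimal sketch of the shape of the fix; the helper names here are hypothetical stand-ins for the real parser entry points:

    struct parameter_list *params = new_parameter_list();   /* hypothetical */
    if (!parse_program(source, params)) {
       /* Previously this allocation leaked on a parse error. */
       free_parameter_list(params);                          /* hypothetical */
       params = NULL;
       return false;
    }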
|
|
|
|
|
| |
v2: define 0x100 and use it for setting the FS_OUTPUT_REG.HALF_PRECISION bit
Signed-off-by: Rob Clark <[email protected]>
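A hedged sketch of what that looks like; the 0x100 value comes from the commit, while the macro and variable names are only illustrative:

    #define FS_OUTPUT_HALF_PRECISION 0x100   /* value from the commit */

    uint32_t val = base_fs_output_value;     /* hypothetical */
    if (shader->half_precision)
       val |= FS_OUTPUT_HALF_PRECISION;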
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the loop for assigning registers was bailing out early if
the register had a null source. I think the intention is that in this
case it isn't necessary to assign a register. However, it was also
skipping the part that fixes up the types. This can happen if the
instruction is copy-propagated into a move from a constant half-float
input register. In that case it still needs to fix up the types.
Fixes assert in
dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump
when lowering the precision of the variables.
Signed-off-by: Rob Clark <[email protected]>
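Illustrative shape of the fix (names hypothetical): skip only the register assignment for a null source, not the type fix-up:

    foreach_reg (reg, instr) {
       if (reg->src)
          assign_register(ctx, reg);
       /* Previously an early "continue" on a null source skipped this too. */
       fixup_half_type(reg);
    }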
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Though it should be fixed in the RA pass, it needs to be set correctly from
the beginning according to the bit size of the NIR dest.
v2: It would be better for mad, fddx and fddy to be fixed up later in the RA pass.
[small cleanup of fallout from the imov/fmov removal]
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
According to the blob driver, the hardware seems to handle only 32-bit
values for half constant registers within floating point opcodes.
So we need to convert 16-bit values back to 32-bit values when a lower
precision pass is in effect.
Signed-off-by: Rob Clark <[email protected]>
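A self-contained sketch of the widening step (normal numbers and zero only; denormals, infinities and NaNs are left out for brevity, and this is not the driver's actual helper):

    #include <stdint.h>

    /* Widen an f16 bit pattern to the equivalent f32 bit pattern. */
    static uint32_t half_to_float_bits(uint16_t h)
    {
       uint32_t sign = (uint32_t)(h >> 15) << 31;
       uint32_t exp  = (h >> 10) & 0x1f;
       uint32_t mant = h & 0x3ff;

       if (exp == 0 && mant == 0)
          return sign;                                        /* +/- zero */
       return sign | ((exp + 112) << 23) | (mant << 13);      /* rebias 15 -> 127 */
    }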
|
|
|
|
|
|
|
|
|
|
| |
If the types of the dest reg and src reg of an absneg opcode are different,
it shouldn't be considered a same-type mov.
This patch becomes meaningful when we start to use mediump information for
lowering precision to 16 bits.
Signed-off-by: Rob Clark <[email protected]>
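Sketch of the check, with hypothetical struct and accessor names:

    static bool is_same_type_mov(const struct instruction *instr)
    {
       if (!is_absneg(instr))                        /* hypothetical predicate */
          return false;
       /* A half <-> full conversion is not a plain mov. */
       return reg_type(instr->dst) == reg_type(instr->src);
    }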
|
|
|
|
|
|
|
|
|
|
|
| |
dest.
eg. uniform mediump vec4 f;
This patch has no effect for now since there is no mediump lowering pass yet,
but it will become meaningful when that pass lands in the near future.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.
The a6xx version is copied from the a5xx code but it has not been
tested.
v2: [Hyunjun Ko ([email protected])] Use the recently added half flag,
which represents precision based on IR3_REG_HALF, to avoid duplication.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously the code to load from a constant instruction was always
using the u32 pointer. If the constant is actually a 16-bit source
this would end up with the wrong values because the pointer would be
offset by the wrong size. This fixes it to use the u16 pointer.
Signed-off-by: Rob Clark <[email protected]>
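Illustrative sketch of the idea: index through a pointer of the matching width so the component offset scales correctly (names hypothetical):

    union { const uint32_t *u32; const uint16_t *u16; } p = { .u32 = const_data };

    uint32_t value = (bit_size == 16) ? p.u16[comp]    /* 2-byte stride */
                                      : p.u32[comp];   /* 4-byte stride */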
|
|
|
|
|
|
|
|
| |
They aren't real instructions, and don't change the # of live values, so
there is no point in them competing with real instructions.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early. And factor in the net change in the
# of live values that would result from scheduling an instruction, to
prioritize instructions that reduce the # of live values as register
pressure increases.
For manhattan:
total instructions in shared programs: 27869 -> 28413 (1.95%)
instructions in affected programs: 26756 -> 27300 (2.03%)
helped: 102
HURT: 87
total full in shared programs: 1903 -> 1719 (-9.67%)
full in affected programs: 1390 -> 1206 (-13.24%)
helped: 124
HURT: 9
The reduction in register usage nets ~20% gain in manhattan. (So
getting mediump support should be a huge win for gles gfxbench.)
Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).
The effect is less pronounced on smaller shaders.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
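A rough sketch of the heuristic described above; the threshold and weighting are illustrative, not the scheduler's actual constants:

    int delta = live_values_added(instr) - live_values_killed(instr);

    /* Don't schedule pressure-increasing instructions too early. */
    if (delta > 0 && cur_live > LIVE_THRESHOLD)
       continue;

    /* As pressure rises, increasingly prefer instructions that shrink it. */
    int score = base_priority(instr) - delta * pressure_weight(cur_live);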
|
|
|
|
|
|
|
|
| |
Account for shader outputs and values live in any direct/indirect
successor block.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Allows the legacy matrix stacks to be manipulated without disturbing the
matrix mode selector.
Adapted from a patch from Chris Forbes.
Reviewed-by: Marek Olšák <[email protected]>
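For example, with EXT_direct_state_access the target stack is named explicitly, so the GL_MATRIX_MODE selector is never touched (usage sketch):

    glMatrixLoadIdentityEXT(GL_PROJECTION);
    glMatrixOrthoEXT(GL_PROJECTION, 0.0, width, height, 0.0, -1.0, 1.0);

    glMatrixPushEXT(GL_MODELVIEW);
    glMatrixTranslatefEXT(GL_MODELVIEW, x, y, 0.0f);
    /* ... draw ... */
    glMatrixPopEXT(GL_MODELVIEW);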
|
|
|
|
|
|
|
|
|
| |
Split this out from glMatrixMode since we're about to need it
independently for EXT_DSA.
Adapted from a commit by Chris Forbes.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This extension is huge and this gives us a TODO list of functions
to implement.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit a1378639ab19 reordered context function initializations but broke
the sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.
In this case sctx->dma_copy was assigned a value after being used in:
sctx->b.resource_copy_region = sctx->dma_copy;
This commit moves the FORCE_DMA special case after the sctx->dma_copy initialization.
See https://bugs.freedesktop.org/show_bug.cgi?id=110422
Signed-off-by: Marek Olšák <[email protected]>
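Simplified shape of the fix (function and flag names here are hypothetical):

    /* Assign the callback first ... */
    sctx->dma_copy = si_dma_copy_region;                /* hypothetical name */

    /* ... then apply the AMD_DEBUG=forcedma override, which reads it. */
    if (force_dma_debug_flag)
       sctx->b.resource_copy_region = sctx->dma_copy;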
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2
to 1:
20909284f204091757c050aa40cfffaf3f981b9c
No driver seems to overwrite the default value.
One user reports severe regressions for some games.
For now, revert to the value 2 for nine.
Cc: "19.1" [email protected]
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109754
Cc: 19.0 19.1 <[email protected]>
Tested-by: Pierre-Eric Pelloux-Prayer <[email protected]>
|
|
|
|
|
|
|
| |
Now that we have the util function for the default values, we can get
rid of the boilerplate.
Signed-off-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
From the Vulkan spec 1.1.108:
"vkCmdCopyQueryPoolResults is guaranteed to see the effect of
previous uses of vkCmdResetQueryPool in the same queue, without any
additional synchronization."
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
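The guarantee covers the common pattern below, where the reset and the copy are recorded on the same queue with no barrier in between (usage sketch):

    vkCmdResetQueryPool(cmd, pool, 0, query_count);
    /* ... vkCmdBeginQuery / vkCmdEndQuery pairs for each query ... */
    vkCmdCopyQueryPoolResults(cmd, pool, 0, query_count,
                              results_buffer, 0, sizeof(uint64_t),
                              VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT);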
|
|
|
|
|
|
|
|
|
| |
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Erico Nunes <[email protected]>
Tested-by: Erico Nunes <[email protected]>
Tested-by: Andreas Baierl <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change requires LLVM r356755.
32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)
Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that the type gathering function looks at instructions that might
have other types, return an invalid type instead of crashing. That
invalid type will be properly ignored later.
Fixes: c12750527b7 "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we get better validation,
and it makes it possible to use int_to_float with the lower_ftrunc option
(int lowering generates ftrunc, but lower_ftrunc generates bools; ftrunc
lowering should probably be reworked). For now we always expect lower_bool
to come after lower_int.
Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
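A hedged sketch of the core idea: once integers live in float registers, f2i32 reduces to dropping the fractional part and i2f becomes a plain copy (builder calls shown for illustration, not the pass's exact code):

    case nir_op_f2i32:
       /* The register already holds a float; just drop the fraction. */
       return nir_ftrunc(b, src);
    case nir_op_i2f32:
    case nir_op_u2f32:
       /* Already represented as a float: nothing to convert. */
       return src;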
|
|
|
|
|
|
|
| |
It is treated like the vecN instructions which also have no type.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
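The lowering itself is simple: a left shift by a known constant c is a multiply by 1 << c, which survives int-to-float conversion. A hedged sketch using the NIR builder helpers (variable names illustrative):

    if (nir_src_is_const(alu->src[1].src)) {
       unsigned c = nir_src_as_uint(alu->src[1].src);
       nir_ssa_def *x = nir_ssa_for_alu_src(b, alu, 0);
       result = nir_imul_imm(b, x, 1u << c);   /* x << c  ==  x * (1 << c) */
    }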
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consts and undefs can be used as different types (common with the "0"
constant), so don't copy types from consts/undefs, only to them. It doesn't
entirely solve the problem that the type given to the const could be wrong,
but now the only realistic case is with "0", which is the same when cast to
float, so it doesn't matter for lower_int_to_float.
The other change is to get type information for load input/uniform and
store output, and use that to get correct results.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This type information will be used by gather_ssa_types to get usable results.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Improvements related to the patch that removed native_integers:
* In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
* In prog_to_nir, use sge/slt and let lower_scmp lower it if needed
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We could have identical texture state for both VS and FS, which would
result in the VS state getting created first and the FS state mapping to
the identical cmdstream, resulting in the VS state getting emitted twice
and no FS state being emitted.
Fixes:
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to uniform)
the assembler won't figure out the correct constlen.
Fixes:
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The blob is also setting the .l bit, and it seems to solve some intermittent
failures with a couple of dEQP tests:
dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes. Keep track of whether we've done an (ss) since the
last ldlv, and if not, add the (ss) flag to the last_input that gets (ei).
Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
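A sketch of the bookkeeping; the flag and opcode names follow ir3's conventions, while the context field is assumed:

    /* Track whether an (ss) has been seen since the last ldlv ... */
    if (instr->opc == OPC_LDLV)
       ctx->need_ss_for_last_input = true;           /* hypothetical field */
    else if (instr->flags & IR3_INSTR_SS)
       ctx->need_ss_for_last_input = false;

    /* ... and if not, make the instruction that gets (ei) also wait on (ss). */
    if (ctx->need_ss_for_last_input)
       last_input->flags |= IR3_INSTR_SS;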
|
|
|
|
|
|
|
|
|
| |
The special handling for last_input assumes that all the varying loads
are in the first block. Add an assert to catch if anyone breaks that
assumption.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
While we're here, copy the size table from set.c to get rid of hard tabs
in the hash_table.c version.
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compilation times with my shader-db database:
Difference at 95.0% confidence
-1.22312 +/- 0.726033
-0.283979% +/- 0.168254%
(Student's t, pooled s = 1.02177)
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.
Acked-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
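A self-contained sketch of the trick (the formulation from "Faster remainders when the divisor is a constant"; __uint128_t requires GCC/Clang, and d must be greater than 1):

    #include <stdint.h>

    /* Precompute once per divisor: c = ceil(2^64 / d). */
    static inline uint64_t remainder_magic(uint32_t d)
    {
       return UINT64_MAX / d + 1;
    }

    /* n % d computed with two multiplies and no division. */
    static inline uint32_t fast_urem32(uint32_t n, uint32_t d, uint64_t magic)
    {
       uint64_t lowbits = magic * n;
       return (uint32_t)(((__uint128_t)lowbits * d) >> 64);
    }

For a hash table, the magic value can simply sit next to the fixed table of possible sizes, so nothing needs to be precomputed at runtime.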
|
|
|
|
|
|
|
| |
To keep it in sync with the set implementation.
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A significant portion of the time spent in nir_opt_cse for the Dolphin
ubershaders was in resizing the set. When resizing a hash table, we know
in advance that each new element to be inserted will be different from
every other element, so we don't have to compare them, and there will be
no tombstone elements, so we don't have to worry about caching the
first-seen tombstone. We add a specialized add function which skips
these steps entirely, speeding up resizing.
Compile-time results from my shader-db database:
Difference at 95.0% confidence
-2.29143 +/- 0.845534
-0.529475% +/- 0.194767%
(Student's t, pooled s = 1.08807)
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
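A sketch of the specialized insert used while rehashing: with every key known to be unique and no tombstones present, probing only needs to find a free slot (linear reprobe shown for brevity; struct and field names illustrative):

    static void set_add_rehash(struct set *ht, uint32_t hash, const void *key)
    {
       uint32_t i = hash % ht->size;

       while (ht->table[i].key != NULL)      /* never calls the compare callback */
          i = (i + 1) % ht->size;

       ht->table[i].hash = hash;
       ht->table[i].key  = key;
    }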
|
|
|
|
|
|
|
|
|
| |
To keep the set and hash table in sync. Note that some of this had
already been done for hash tables, in particular pulling out the
hash % ht->size computation.
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately GCC can't do this for us, probably because we call the key
comparison function which GCC can't prove won't modify arbitrary memory.
This is a pretty hot function, so do the optimization manually to be
sure the compiler will get it right.
While we're here, make the computation of the new probe address use a
single conditional subtract instead of a modulo, since we know that it
won't ever get as big as 2 * ht->size before the modulo. Modulos tend to
be pretty expensive operations.
shader-db compile time results for my database:
Difference at 95.0% confidence
-2.24934 +/- 0.69897
-0.516296% +/- 0.159993%
(Student's t, pooled s = 0.983684)
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
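The probe-address update described above, as a compare-and-subtract instead of a modulo (a sketch; the invariant is that the sum stays below 2 * ht->size):

    probe_addr += probe_step;
    if (probe_addr >= ht->size)
       probe_addr -= ht->size;     /* cheaper than probe_addr % ht->size */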
|