| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not a perfect solution, and the "pressure" target is hard-coded. But it
doesn't really seem to much in the common case, and avoids exploding
register usage in dEQP ssbo tests.
So this should serve as a stop-gap solution until I have time to re-
write the scheduler.
Hurts slightly in instruction count, but gains (reduces) slightly the
register usage in shader-db. Fixes ~150 dEQP-GLES31.functional.ssbo.*
that were failing due to RA fail.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Array index should come before sample-id. And exclude all isam variants
(which take integer texel coords) from adding of offset.
Fixes dEQP-GLES31.functional.texture.multisample.samples_1.use_texture_*_2d_array
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We also need to put in the output mov. Possibly we could just fixup the
output register to read it directly from the dummy, but that is more
work and I guess dEQP is probably the only time you encounter this.
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Fixes dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.builtin_variable.vertex_id
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We weren't propagating the array info for cases where result of atomic
is array/reg. This can happen, for example, if result is part of a phi
web lowered to regs.
Fixes dEQP-GLES31.functional.ssbo.atomic.compswap.*
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the (nopN) encoding for slightly denser shaders.. this lets us fold
nop instructions into the previous alu instruction in certain cases.
Shouldn't change the # of cycles a shader takes to execute, but reduces
the size. (ex: glmark2 refract goes from 168 to 116 instructions)
Currently only enabled for a6xx, but I think we could enable this for
a5xx and possibly a4xx.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Fixes nearly all of dEQP-GLES31.functional.texture.border_clamp.* when
run after a test that binds textures used in vertex shader.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow
and few other similar tests that do multiple texture fetches into
individual components of a packet output. Mostly works around the
issue mentioned in ra_block_find_definers().
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Turns out we can write to tiled images as well as read. This avoids
having to linearize or do the tiling in the shader.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the 'UNK31' bit (which should probably be called 'BUFFER') for
samplerBuffer case, which increases the size of supported buffer
texture beyond 2^15 elements.
Also need to fix the 2nd coord injected to handle the tex instructions
that take integer coords.
Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071
and similar
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... instead of isam. It seems like when using isam, plus atomics, we
can have the problem of old data being in the texture cache. Plus this
way we don't have to load a component at a time.
Note that blob still seems to use isam in some cases. I suppose it might
be preferable in the case of loading a single component, when atomics
are not in the picture (or that the ssbo does not need to otherwise be
coherent).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Resync disasm and instr header from envytools, and add ldib encoding.
This replaces an opcode from a3xx which was never seen in practice,
since that seemed easier than dealing with the same opc # meaning a
different thing on a6xx. (Not really sure if 'sti' was actually a
real thing, I think it was only seen in fuzzing.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Silly copy/pasta bug, since load_image is actually the same instruction
but different barrier class.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
This was overlooked when it moved to ir3_context.c and ceased to be
static..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes
dEQP-GLES3.functional.shaders.indexing.varying_array.vec3_dynamic_write_dynamic_loop_read
regression.
Fixes: c1a27ba9baf freedreno/ir3: HIGH reg w/a for a6xx
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The variant will be NULL if RA failed. Which isn't ideal, but at least
lets not segfault and bring down the rest of the dEQP run with us.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The wrmask is handled in regmask_get()/regmask_set(), but it wasn't
being propagated from SSA src to dst. So for example, an SSBO read
value that is passed in as src2.y component to atomic op, wasn't
getting the (sy) flag set. Causing lots of fail.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Add support for multisampled sources for the blitter.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new encoding returns a value via the 2nd src. The legalize pass
needs to be aware of this to set the correct needs_sy flag, otherwise we
can, in cases where the atomic dst is not used, overwrite the register
that hardware will asynchronously load result into without (sy) flag, so
it gets clobbered by the atomic result.
This fixes a whole lot of rando ssbo+atomic fails, like
dEQP-GLES31.functional.ssbo.layout.single_basic_type.packed.highp_vec4.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems like some instructions (noticed this w/ cat3), cannot read HIGH
regs.. cat1 (mov/cov) can, and possibly some/all of cat2.
The blob seems to stick w/ an extra mov into low regs. So lets do the
same.
This fixes WGID on a6xx, which unsurprisingly is related to a lot of
deqp compute fails.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
For the handful of instructions that use a new encoding.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Images and SSBOs don't map directly to the hw. They end up being part
texture and part something else. Starting with a6xx, the hack used for
a5xx to smash the image tex state into hw texture state starting from
MAX counting down won't work, because we start using tex state also for
SSBO read.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Note that image/ssbo support is currently only implemented for a5xx.
But the instruction encoding is the same for a4xx.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We probably need to rethink how we detect which instruction first
defines higher register classes. But for now, this at least fixes
the symptom.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
I think this will save an instruction and hopefully not increase any other
costs (possibly the immediate -1 and 1?), but I haven't actually tested.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
This pulls in changes for compute shaders and a6xx ssbo/image support.
FACENESS bit moved from position 1 to 2 and there's a global invert
bit for point coord.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
Update for a6xx.xml.h to incorporate a few new bits and changes to
blit src rect coordinate types.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Fixes: b4476138d5a freedreno: move drm to common location
Reviewed-by: Eric Engestrom <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
|
|
|
|
|
| |
Fixes: aa0fed10d35 ("freedreno: move ir3 to common location")
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
used for CL kernels
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If nothing else, this will make problems with cmdstream getting blit
over with pixels easier to track down (ie. faults when it first happens
rather than strange failures later from corrupted cmdstream when a
stateobj is later reused).
(NOTE this somewhat depends on the kernel supporting the flag, and the
iommu implementation. But the worst case is just that the cmdstream
ends up writeable as before.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
This way they can be shared. Build tested with meson, but not too sure
on the autotools stuff though.
Reviewed-by: Dylan Baker <[email protected]>
Acked-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"pad" was missing in Mesa's msm_drm.h. sizeof(drm_msm_gem_info)
remains the same, but now the compiler initializes the field to
zero.
Buffer allocation results in EINVAL without this for me.
Cc: Rob Clark <[email protected]>
Cc: Kristian Høgsberg <[email protected]>
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was a hold-over from the early TGSI days, and mostly not needed
with NIR. This avoids burning an entire 4 consecutive scalar regs
for vec3 outputs, for example. Which fixes a few places that we were
doing worse that we should on register usage.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the following crash that happened after d6110d4d
The problem happens if we first compile a "vanilla" shader with nothing
lowered in NIR, which perform the final lowering passes on so->shader->
nir (including nir_lower_locals_to_regs()), and then later we have
compile a shader with some lowering. The second time through we would
have already done nir_lower_locals_to_regs().
Arguably this was already a bug, just one we hadn't noticed yet.
Fixes: d6110d4d547 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
An earlier patch that introduced the function failed to handle the case
where an image format layout qualifier is not specified, which is allowed
on desktop GL profiles. In these cases, nir_variable's image format is
GL_NONE, and we don't need to print a debug message for those.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
emit_intrinsic_store_image() is always using 4 components when
collecting registers for the value. When image has less than
4 components (e.g, r32f, rg32i, etc) this results in extra mov
instructions.
This patch uses the actual number of components from the image format.
For example, in a shader like:
layout (r32f, binding=0) writeonly uniform imageBuffer u_image;
...
void main(void) {
...
imageStore (u_image, some_offset, vec4(1.0));
...
}
instruction count is reduced in at least 3 instructions (note image
format is r32f, 1 component only).
This obviously reduces register pressure as well.
v2: - Added support for image formats from NV_image_format extension
(Ilia Mirkin).
- Return 4 components by default instead of asserting. (Rob Clark).
v3: Added more missing formats (Ilia Mirkin).
v4: Added a debug message for unknown image formats (Rob Clark).
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
We also enable it in all of the NIR drivers.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|