Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
This prepass, run after scheduling but before RA, rewrites values to
pipeline registers where possible. It walks the IR, checking whether
each value is ever read outside of the bundle in which it is written.
If it is not, it is rewritten to a pipeline register (r24 or r25),
valid only within that bundle. This has theoretical benefits for power
consumption and register pressure (and, by extension, performance).
While this is tested to work, it's not clear how much of a win it
really is, especially without an out-of-order scheduler (yet!).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
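
A minimal sketch of the idea, with illustrative types and helpers
rather than the real MIR structures:

    #include <stdbool.h>

    #define MAX_SRCS 3
    #define PIPELINE_BASE 24 /* r24 and r25 */

    struct ins { unsigned dest; unsigned src[MAX_SRCS]; };
    struct bundle { struct ins **ins; unsigned count; };

    /* Does any instruction in `b` read `value`? */
    static bool
    bundle_reads(const struct bundle *b, unsigned value)
    {
        for (unsigned j = 0; j < b->count; ++j)
            for (unsigned s = 0; s < MAX_SRCS; ++s)
                if (b->ins[j]->src[s] == value)
                    return true;
        return false;
    }

    /* Is `value` ever read outside of bundle `b`? */
    static bool
    used_outside_bundle(const struct bundle *b, unsigned value,
                        const struct bundle *bundles, unsigned n)
    {
        for (unsigned i = 0; i < n; ++i)
            if (&bundles[i] != b && bundle_reads(&bundles[i], value))
                return true;
        return false;
    }

    /* Rewrite values that never escape their bundle to r24/r25. */
    static void
    specialize_pipeline_registers(struct bundle *bundles, unsigned n)
    {
        for (unsigned i = 0; i < n; ++i) {
            unsigned next = 0; /* at most two pipeline registers */
            for (unsigned j = 0; j < bundles[i].count && next < 2; ++j) {
                struct ins *ins = bundles[i].ins[j];
                if (used_outside_bundle(&bundles[i], ins->dest, bundles, n))
                    continue;
                unsigned preg = PIPELINE_BASE + next++;
                /* Retarget intra-bundle reads, then the write itself. */
                for (unsigned k = j + 1; k < bundles[i].count; ++k)
                    for (unsigned s = 0; s < MAX_SRCS; ++s)
                        if (bundles[i].ins[k]->src[s] == ins->dest)
                            bundles[i].ins[k]->src[s] = preg;
                ins->dest = preg;
            }
        }
    }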
|
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
First, this moves the scheduler and emitter out of midgard_compile.c
into their own dedicated files.
More interestingly, this slims down midgard_bundle to be essentially an
array of _pointers_ to midgard_instructions (plus some bundling
metadata), rather than the instructions and packing themselves. The
difference is critical: it means that (within reason, i.e. as long as
it doesn't affect the schedule) midgard_instructions can now be modified
_after_ scheduling, with the changes reflected in the final binary.
On a more philosophical level, this removes an IR. Previously, the IR
before scheduling (MIR) was separate from the IR after scheduling
(post-schedule MIR), requiring a separate set of utilities to traverse
it, using different idioms. There was no good reason for this, and it
restricted our flexibility in the RA. So unify all the things!
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
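
In miniature, the structural change looks like this (stand-in
definitions; the real bundle also tracks things like the tag, padding
and embedded constants):

    struct midgard_instruction { int op; /* ...lots more state... */ };

    #define MAX_BUNDLE 6

    /* Before: the bundle owned copies, so the instructions were
     * frozen at schedule time. */
    struct old_bundle {
        struct midgard_instruction instructions[MAX_BUNDLE];
        unsigned instruction_count;
    };

    /* After: the bundle points into the MIR block list, so edits made
     * after scheduling are picked up when the bundle is emitted. */
    struct new_bundle {
        struct midgard_instruction *instructions[MAX_BUNDLE];
        unsigned instruction_count;
    };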
|
Trivial.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
These are more generally useful than the files they were constrained to.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
Mostly, this fixes a number of instances of lines >> 80 chars,
refactoring them into something legible.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
This represents a major break with the former RA design. We now use
conflicting register classes to represent the subdivision of Midgard's
128-bit registers into varying sizes and arrangements. We determine the
class based on the number of components in the instructions' masks. To support
this, we include a number of helpers in the RA to allow composing
swizzles and masks, such that MIR written implicitly assuming .xyzw
sources can be transformed to use actual (non-aligned) sources.
The net result is a marked decrease in register pressure on
non-vec4-exclusive shaders. We could still be doing much better. Not
implemented yet are:
- Register spilling
- Per-component liveness
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
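
The swizzle/mask helpers are the interesting part; a simplified sketch
(2 bits per component, 4 components, as in the Midgard ISA encoding):

    /* Compose swizzles so the result applies `first`, then `second`:
     * out[c] = first[second[c]]. This is how MIR written against .xyzw
     * gets retargeted onto whatever components RA actually assigned. */
    static unsigned
    compose_swizzle(unsigned first, unsigned second)
    {
        unsigned out = 0;
        for (unsigned c = 0; c < 4; ++c) {
            unsigned s = (second >> (2 * c)) & 3;
            out |= ((first >> (2 * s)) & 3) << (2 * c);
        }
        return out;
    }

    /* Register class from mask size (assumes a non-empty mask):
     * 1 component -> scalar class, 2 -> vec2, 3 -> vec3, 4 -> vec4. */
    static unsigned
    class_for_mask(unsigned mask)
    {
        return __builtin_popcount(mask & 0xF) - 1;
    }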
|
These masks distinguish scalar/vec2/vec3 loads from the default vec4,
which helps with assembly readability (since it's immediately obvious
how many components are _actually_ affected, rather than doing
mysterious things to an unknown number of unused components). Later in
the series, this will enable smarter register allocation, as the unused
components will not be interpreted abnormally.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
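
For example, a disassembler can derive the suffix straight from the
mask (illustrative, not the real disassembler):

    #include <stdio.h>

    /* Prints ".xy" for a vec2 mask instead of pretending it is .xyzw. */
    static void
    print_mask(FILE *fp, unsigned mask)
    {
        fputc('.', fp);
        for (unsigned c = 0; c < 4; ++c) {
            if (mask & (1u << c))
                fputc("xyzw"[c], fp);
        }
    }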
|
This fixes liveness analysis with respect to inline constants and
branching. In practice, the symptom is abnormally high register
pressure.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
These snippets of integer assembly are injected for various purposes.
Eventually, we'll want to implement these in NIR directly. Regardless,
the "default" output modifier is different between floats and ints, so
let's set the right one.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
These were static to midgard_compile.c but are more generally useful
across the compiler.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
This mechanism is only used by blend shaders, so just use a move here.
Ideally, it'll be copy-propped and DCE'd away; this removes a source of
considerable indirection and will simplify RA logic.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.
Reviewed-by: Marek Olšák <[email protected]>
|
Clear w/ quad uses a normal draw, which gets counted by any active
occlusion query (OQ). st/meta uses set_active_query_state(..) to tell
the driver to pause queries in such cases.
Fixes the spec@arb_occlusion_query@occlusion_query_meta_save piglit test.
Signed-off-by: Christian Gmeiner <[email protected]>
|
Mesa measures in DWords. The hardware also claims to measure in DWords.
Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ.
Which means that it really measures in bytes. So, convert to bytes.
Without this, our offset / stride denominator was 1/4th the size it
should be, leading to 4x the vertex count that we should have had.
Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers
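
The conversion itself is a single scale; roughly (not the exact driver
code):

    #include <assert.h>
    #include <stdint.h>

    /* SO_WRITE_OFFSET holds a byte address in bits 31:2 (1:0 MBZ), so
     * scale Mesa's DWord counts up to bytes before programming it. */
    static uint32_t
    so_write_offset_bytes(uint32_t dwords)
    {
        uint32_t bytes = dwords * 4;
        assert((bytes & 3) == 0); /* documents the 1:0 MBZ constraint */
        return bytes;
    }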
|
The automatic header generation unifies identical registers in a series
and only emits definitions for the first one. This is mostly to avoid
emitting excessive definitions for CB registers, but special-casing
an exception for this family of registers doesn't seem worth it.
|
The definition of the fields differs, but PITCH_GFX9 is a mere extension
of PITCH_GFX6 that does not conflict with any other fields.
This aligns the definitions with what will be generated from the
register JSON.
The information about how large the fields really are is preserved in
the register database.
|
The field layout wasn't actually changed in gfx9, so having the suffix
isn't very useful. The field *contents* were changed, but this is
reflected in the V_xxx_xxx definitions and is taken into account by
the ac_debug logic based on the register JSON.
This aligns the definitions with what will be generated from the
register JSON.
|
Don't have a separate mechanism for removing NIR constants from the
table; if unused, we will compact them away. The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.
Reviewed-by: Kenneth Graunke <[email protected]>
|
Reviewed-by: Kenneth Graunke <[email protected]>
|
Change the iris_binding_table to keep track of which surfaces are
actually going to be used, then assign binding table indices just for
those. Reducing unused bytes matters because we use a reduced space
for those tables in Iris.
The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions. The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.
The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting binding table.
v2: (all from Ken)
Use the BITFIELD64_MASK helper. Improve comments.
Assert that the whole group is marked as used when we have indirects.
Reviewed-by: Kenneth Graunke <[email protected]>
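
A sketch of the compaction: per group, track used surfaces in a
bitmask, and let a surface's BTI be the number of used surfaces before
it. The real iris helpers differ in detail; SURFACE_NOT_USED here
stands in for IRIS_SURFACE_NOT_USED.

    #include <stdint.h>

    #define SURFACE_NOT_USED 0xff

    /* Group-relative index (e.g. UBO #2) -> compacted BTI. */
    static uint8_t
    group_index_to_bti(uint64_t used, unsigned index)
    {
        if (!(used & (1ull << index)))
            return SURFACE_NOT_USED;
        return __builtin_popcountll(used & ((1ull << index) - 1));
    }

    /* Compacted BTI -> group-relative index. */
    static unsigned
    bti_to_group_index(uint64_t used, unsigned bti)
    {
        for (unsigned i = 0; used; ++i, used >>= 1) {
            if ((used & 1) && bti-- == 0)
                return i;
        }
        return SURFACE_NOT_USED;
    }

The `(1ull << index) - 1` expression is the mask that the
BITFIELD64_MASK helper from the v2 note provides.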
|
This will make it convenient to handle compacting and printing the
binding table.
Reviewed-by: Kenneth Graunke <[email protected]>
|
Stop using brw_compiler to lower the final binding table indices for
surface access. This is done by simply not setting the
'prog_data->binding_table.*_start' fields. Then make the driver
perform this lowering.
This is a better place to perform the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.
This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.
Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using it to index
shs->constbuf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
We'll change iris to perform lowering of the binding table indices
earlier (before the backend kicks in), but the backend compiler uses
the result of the analysis to identify load_ubo intrinsics, so we do
the analysis after the lowering to have the right indices.
Reviewed-by: Kenneth Graunke <[email protected]>
|
v2: define 0x100 and use it for setting FS_OUTPUT_REG.HALF_PRECISION
Signed-off-by: Rob Clark <[email protected]>
|
Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.
The a6xx version is copied from the a5xx code but it has not been
tested.
v2: [Hyunjun Ko ([email protected])] Use the recently added half flag,
which represents precision based on IR3_REG_HALF, to avoid duplication.
Signed-off-by: Rob Clark <[email protected]>
|
Commit a1378639ab19 reordered context function initializations but broke
sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.
In this case sctx->dma_copy was assigned a value after being used in:
sctx->b.resource_copy_region = sctx->dma_copy;
This commit moves the FORCE_DMA special case after sctx->dma_copy initialization.
See https://bugs.freedesktop.org/show_bug.cgi?id=110422
Signed-off-by: Marek Olšák <[email protected]>
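
The fix in miniature (the SDMA function name here is hypothetical; the
point is purely the ordering):

    /* Broken: the override read sctx->dma_copy while it was still NULL. */
    if (sscreen->debug_flags & DBG(FORCE_DMA))
        sctx->b.resource_copy_region = sctx->dma_copy; /* NULL! */
    /* ... */
    sctx->dma_copy = si_sdma_copy_region; /* assigned too late */

    /* Fixed: initialize dma_copy first, then apply the debug override. */
    sctx->dma_copy = si_sdma_copy_region;
    if (sscreen->debug_flags & DBG(FORCE_DMA))
        sctx->b.resource_copy_region = sctx->dma_copy;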
|
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
Now that we have the util function for the default values, we can get
rid of the boilerplate.
Signed-off-by: Christian Gmeiner <[email protected]>
|
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Erico Nunes <[email protected]>
Tested-by: Erico Nunes <[email protected]>
Tested-by: Andreas Baierl <[email protected]>
|
Reviewed-by: Jordan Justen <[email protected]>
|
Removes the bool_to_float logic from the int_to_float pass, so that the
two can be used separately. Having separate passes gives us better
validation and makes it possible to use int lowering together with the
lower_ftrunc option (int lowering generates ftrunc, but lower_ftrunc
generates bools; ftrunc lowering should probably be reworked). For now
we always expect lower_bool to come after lower_int.
Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
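
A sketch of the conversion cases in nir_builder terms: once integers
live in float registers, float-to-int is truncation toward zero and
int-to-float is a no-op (pass boilerplate omitted):

    #include "nir.h"
    #include "nir_builder.h"

    static nir_ssa_def *
    lower_int_conversion(nir_builder *b, nir_op op, nir_ssa_def *src)
    {
        switch (op) {
        case nir_op_f2i32:
        case nir_op_f2u32:
            return nir_ftrunc(b, src); /* result stays in a float reg */
        case nir_op_i2f32:
        case nir_op_u2f32:
            return src; /* already represented as a float */
        default:
            return NULL; /* not handled in this sketch */
        }
    }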
|
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
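
The lowering itself is tiny; sketched with nir_builder, a left shift by
a constant becomes a multiply by the matching power of two:

    #include "nir.h"
    #include "nir_builder.h"

    /* ishl by a constant -> imul, so int_to_float never sees a shift. */
    static nir_ssa_def *
    lower_ishl_by_const(nir_builder *b, nir_ssa_def *x, unsigned shift)
    {
        return nir_imul(b, x, nir_imm_int(b, (int)(1u << shift)));
    }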
|
We could have identical texture state for both VS and FS, which would
result in VS state getting created first and FS state mapping to the
identical cmdstream, so VS state got emitted twice and no FS state was
emitted.
Fixes:
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
And instead, link them as they are added.
Makes things a bit clearer and prepares future work such as FB reload
jobs.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
Signed-off-by: Tomeu Vizoso <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
|
A bunch of tests have been fixed, but some regressions have appeared on
T760.
Signed-off-by: Tomeu Vizoso <[email protected]>
|
We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.
Reviewed-by: Timothy Arceri <[email protected]>
|
location is the API-level location, but driver_location is the actual
location the uniform gets passed to the driver. This apparently only
caused failures with builtins, where the location is 0 because it's
represented via the state tokens instead.
Reviewed-by: Timothy Arceri <[email protected]>
|
ac expands the store to 32-bit components for us, but we still have to
deal with storing up to 8 components, and when a varying is split across
two vec4 slots we have to calculate the address again for the second
slot, since they aren't adjacent in memory. I didn't do this on the ac
level because we should generate better indexing arithmetic for the lds
store, where slots are contiguous.
Reviewed-by: Timothy Arceri <[email protected]>
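
Roughly, with hypothetical helpers standing in for the real ac
plumbing:

    #include <llvm-c/Core.h>

    #define MIN2(a, b) ((a) < (b) ? (a) : (b))

    struct ctx; /* stand-in for the translation context */
    LLVMValueRef build_varying_addr(struct ctx *ctx, unsigned vec4_slot);
    void store_components(struct ctx *ctx, LLVMValueRef addr,
                          LLVMValueRef *vals, unsigned n);

    static void
    store_output(struct ctx *ctx, LLVMValueRef *values,
                 unsigned base_slot, unsigned num_components)
    {
        for (unsigned slot = 0; slot * 4 < num_components; ++slot) {
            unsigned n = MIN2(num_components - slot * 4, 4u);
            /* The second vec4 slot is not adjacent in memory, so
             * recompute the address rather than offsetting the first. */
            LLVMValueRef addr = build_varying_addr(ctx, base_slot + slot);
            store_components(ctx, addr, values + slot * 4, n);
        }
    }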
|
Signed-off-by: Christian Gmeiner <[email protected]>
|
Signed-off-by: Christian Gmeiner <[email protected]>
|
No significant changes in the code were needed to enable the extension,
just updates to the SWR capabilities and the documentation.
Reviewed-by: Alok Hota <[email protected]>
|
Removing unused but problematic code from the simdlib header to fix a
compilation problem on 32-bit Linux.
Reviewed-by: Alok Hota <[email protected]>
|
We only need the lock for:
1. Rummaging through the cache
2. Allocating VMA
We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
SET_DOMAIN to allocate the underlying pages. The idea behind calling
SET_DOMAIN was to avoid a lock in the kernel while allocating pages,
now we avoid our own global lock as well.
We do have to re-lock around VMA. Hopefully this shouldn't happen too
much in practice because we'll find a cached BO in the right memzone
and not have to reallocate it.
Reviewed-by: Chris Wilson <[email protected]>
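
The resulting flow, simplified (cached_bo_for_size() stands in for the
bucket search; the other names approximate iris_bufmgr.c):

    #include "c11/threads.h"

    static struct iris_bo *
    bo_alloc_internal(struct iris_bufmgr *bufmgr, uint64_t size,
                      enum iris_memory_zone memzone)
    {
        /* 1. Rummage through the cache: lock held. */
        mtx_lock(&bufmgr->lock);
        struct iris_bo *bo = cached_bo_for_size(bufmgr, size, memzone);
        mtx_unlock(&bufmgr->lock);

        if (!bo) {
            /* GEM_CREATE + SET_DOMAIN: no global lock needed. */
            bo = alloc_fresh_bo(bufmgr, size);
            if (!bo)
                return NULL;

            /* 2. Allocate VMA: re-take the lock. Hopefully rare,
             * since a cached BO in the right memzone avoids this. */
            mtx_lock(&bufmgr->lock);
            bo->gtt_offset = vma_alloc(bufmgr, memzone, bo->size, 1);
            mtx_unlock(&bufmgr->lock);
        }

        return bo;
    }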
|
Chris pointed out that the order between SET_DOMAIN and SET_TILING
doesn't matter, so we can just do the page allocation when creating
a new BO. Simplifies the flow a bit.
Reviewed-by: Chris Wilson <[email protected]>
|
Mathias Fröhlich reported that commit 6244da8e23e5470d067680 crashes.
list_for_each_entry_safe is safe against removing the current entry,
but iris_bo_cache_purge_bucket was potentially removing next entries
too, which broke our saved next pointer.
To fix this, don't bother with the iris_bo_cache_purge_bucket step.
We just detected a single entry where the kernel has purged the BO's
memory, and so it isn't a usable entry for our cache. We're about to
continue the search with the next BO. If that one's purged, we'll
clean it up too. And so on.
We may miss cleaning up purged BOs that are further down the list
after non-purged BOs...but that's probably fine. We still have the
time-based cleaner (cleanup_bo_cache) which will take care of them
eventually, and the kernel's already freed their memory, so it's not
that harmful to have a few kicking around a little longer.
Fixes: 6244da8e23e iris: Dig through the cache to find a BO in the right memzone
Reviewed-by: Chris Wilson <[email protected]>
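
The pitfall, in util/list.h terms: the _safe iterator caches only the
next node, so unlinking the current entry is fine, while unlinking
nodes ahead of the cursor (as the bucket purge effectively did) can
free that cached pointer. A sketch:

    #include <stdbool.h>
    #include <stdlib.h>
    #include "util/list.h"

    struct entry {
        struct list_head link;
        bool purged;
    };

    static void
    prune_purged(struct list_head *head)
    {
        list_for_each_entry_safe(struct entry, e, head, link) {
            if (e->purged) {
                /* OK: only the current entry is removed; the
                 * iterator's saved next pointer stays valid. */
                list_del(&e->link);
                free(e);
            }
            /* Not OK: also freeing entries beyond `e` here could
             * free the saved next pointer and break the walk. */
        }
    }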