| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Enforce instruction limit of 512 instructions earlier. This is a
workaround for infinite loops in gpir compiler and allows us to
pin point the tests that are affected.
Reviewed-by: Andreas Baierl <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4055>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4055>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because branch conditions have to be in the pass slot, there is no
unconditional branch, and realistically the pass slot has to contain a
move when branching (there's nothing it does that would be useful for
operating on booleans, so we can't use it for anything when computing
the branch condition), we put the branch instruction in the pass slot
and at codegen time turn it into a move of the branch condition. This
means that it doesn't have to be special-cased like store instructions
are in the scheduler. Because of this decision we can remove the
half-implemented BRANCH codegen slot. Finally, we (ab)use the existing
schedule_first mechanism to make sure that branches are always last in
the basic block.
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See https://gitlab.freedesktop.org/lima/mesa/issues/94 for the gory
details of why this is needed. For *_impl this is easy, since it never
increases register pressure and it goes in the complex slot hence it
never counts against max nodes. It's a bit more challenging for
complex2, since it does count against max nodes, so we need to change
the reservation logic to reserve an extra slot for complex2 when
scheduling complex1. This second part isn't strictly necessary yet, but
it will be for exp2.
Signed-off-by: Connor Abbott <[email protected]>
Acked-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When writing the scheduler, we forgot that you can't read the complex
unit in certain sources because it gets overwritten to 0 or 1. Fixing
this turned out to be possible without giving up and reducing
GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't
expect. There can be at most 4 next-max nodes that can't have moves
scheduled in the complex slot, so it actually isn't a problem for
getting the number of next-max nodes at 5 or lower. However, it is a
problem for stores. If a given node is a next-max node whose move cannot
go in the complex slot *and* is used by a store that we decide to
schedule, we have to reserve one of the non-complex slots for a move
instead of all the slots, or we can wind up in a situation where only
the complex slot is free and we fail the move. This means that we have
to add another term to the reservation logic, for stores whose children
cannot be in the complex slot.
Acked-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, we do scheduling at the same time as value register allocation. The
ready list now acts similarly to the array of registers in
value_regalloc, keeping us from running out of slots. Before this, the
value register allocator wasn't aware of the scheduling constraints of
the actual machine, which meant that it sometimes chose the wrong false
dependencies to insert. Now, we assign value registers at the same time
as we actually schedule instructions, making its choices reflect reality
much better. It was also conservative in some cases where the new scheme
doesn't have to be. For example, in something like:
1 = ld_att
2 = ld_uni
3 = add 1, 2
It's possible that one of 1 and 2 can't be scheduled in the same
instruction as 3, meaning that a move needs to be inserted, so the value
register allocator needs to assume that this sequence requires two
registers. But when actually scheduling, we could discover that 1, 2,
and 3 can all be scheduled together, so that they only require one
register. The new scheduler speculatively inserts the instruction under
consideration, as well as all of its child load instructions, and then
counts the number of live value registers after all is said and done.
This lets us be more aggressive with scheduling when we're close to the
limit.
With the new scheduler, the kmscube vertex shader is now scheduled in 40
instructions, versus 66 before.
Acked-by: Qiang Yu <[email protected]>
|
|
|
|
| |
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
| |
Fixes: 92d7ca4b1cd "gallium: add lima driver"
Signed-off-by: Qiang Yu <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
| |
Come from glmark2-es2 jellyfish test.
Fixes: 92d7ca4b1cd "gallium: add lima driver"
Signed-off-by: Qiang Yu <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
v2:
- use renamed util_dynarray_grow_cap
- use DEBUG_GET_ONCE_FLAGS_OPTION for debug flags
- remove DRM_FORMAT_MOD_ARM_AGTB_MODE0 usage
- compute min/max index in driver
v3:
- fix plbu framebuffer state calculation
- fix color_16pc assemble
- use nir_lower_all_source_mods for lowering neg/abs/sat
- use float arrary for static GPU data
- add disassemble comment for static shader code
- use drm_find_modifier
v4:
- use lima_nir_lower_uniform_to_scalar
v5:
- remove nir_opt_global_to_local when rebase
Cc: Rob Clark <[email protected]>
Cc: Alyssa Rosenzweig <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Signed-off-by: Andreas Baierl <[email protected]>
Signed-off-by: Arno Messiaen <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
Signed-off-by: Erico Nunes <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Koen Kooi <[email protected]>
Signed-off-by: Marek Vasut <[email protected]>
Signed-off-by: marmeladema <[email protected]>
Signed-off-by: Paweł Chmiel <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Rohan Garg <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
Signed-off-by: Qiang Yu <[email protected]>
|