| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the bits we want to share between generations from fd3_program to
ir3_shader. So overall structure is:
fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
|- ...
\- ir3_shader_variant -> ir3
So the ir3_shader becomes the topmost generation neutral object, which
manages the set of variants each of which generates, compiles, and
assembles it's own ir.
There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.
Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
First step of reoganization split out compiler (so it can be shared
between a3xx and a4xx). Rename ir3_shader -> ir3 (since we'll want
the name ir3_shader for a higher level object).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The scheduler also needs to be aware of predicate register (p0) in
addition to address register (a0).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Remove some obsolete comments, rename deref->addr.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
It seems like for the most part, different behaviors, workarounds, etc,
should be conditional on GPU patch revision (ie. a320.0 vs a320.2)
rather than GPU id (a320 vs a330).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fixed size heap is a remnant of the fdre-a3xx assembler. Yet it is
convenient for being able to free the entire data structure in one shot
without worrying about leaking nodes.
Change it to dynamically grow the heap size (adding chunks) as needed so
we don't have an artificial upper limit on shader size (other than hw
limits) and don't always have to allocate worst-case size.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The iteration variables go from 0 anyway.
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
| |
This will be used in the following patch to avoid duplicated code
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
| |
v2: Remove unnecesary variables
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
| |
Can we please keep it clean and avoid ending up in messy situation
like ddx.
Signed-off-by: Jérôme Glisse <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.
This maintains the invariant that a phi node's defs and sources are
allocated the same register.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
| |
Just to be cautious.
|
|
|
|
|
| |
s/alloc_bo/rename_bo/ as that is what the functions do. Simplify bo
allocation and move the complexity to bo renaming.
|
|
|
|
|
| |
Add resource_get_bo_name() and resource_get_bo_initial_domain() for use by
both functions.
|
|
|
|
| |
GEN7.5 gains support for those formats natively.
|
|
|
|
| |
Pass ilo_dev_info to all format translation functions.
|
|
|
|
|
|
|
| |
Don't assert (debug builds) or assign random uninitialized value for
predicate register (p0).. that screws up kill, etc.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
This reverts commit 467f1585e28adba0e94ef593de131bc327f098bb.
This breaks the build on some systems.
|
|
|
|
|
|
|
| |
Use K&R and same indent as most other code. No functional change
intended.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.
Piglit didn't indicate any regressions.
Reviewed-by: Tom Stellard <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
Piglit quick tests including sqrt pass, no other regressions,
tested on radeon 6670.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we know that the pool is defragmented, we positively know
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.
This will allow us to just add the new items to the end of the
item_list without the need of looking for a place to the new item.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
| |
This way we can avoid defragmenting the pool, even if it is needed
to defragment it, and looping again through the list of unallocated
items.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.
The pool will be considered fragmented if: An item that isn't
the last is freed or demoted.
This 'strategy' has a problem, although it shouldn't cause any bug.
If for example we have two items, A and B. We choose to free A first,
now the pool will have the 'fragmented' status. If we now free B,
the pool will retain its 'fragmented' status even if it isn't
fragmented.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.
For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function will be used in the future by compute_memory_defrag
to move items forward in the pool.
It does so by first checking for overlaping ranges, if the ranges
don't overlap it will copy the contents directly. If they overlap
it will try first to make a temporary buffer, if this buffer fails
to allocate, it will finally fall back to a mapping.
Note that it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, it can
easily be added by changing the first if.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
| |
Actually what we currently handle is just the SCALED versions, and not
the int versions. The difference probably matters more when we actually
support integer in the compiler.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach new compiler scheduling and register assignment how to deal with
relative addressing. This gets us what we need to avoid falling back to
old compiler for CONST[ADDR[0].x+n]. It is also a prerequisite for temp
file relative addressing, although that is going to also need some
cleverness in register assignment to keep arrays grouped together.
NOTE: doing address calculation in full precision and then narrowing to
s16 in the mov to addr reg seems to sometimes cause lockups (and
sometimes work?!). It seems more reliable to do the address calculation
in s16, like the blob does. Which means teaching RA how to deal with
mixed half and full precision allocation. Fortunately that didn't turn
out to be too hard, so that is a nice bonus which we could probably take
better advantage of elsewhere.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Technically we should not need these. CP_LOAD_STATE can be pipelined.
But removing them broke a few piglit tests, like fbo-depth-
GL_DEPTH_COMPONENT24-readpixels. I expect these are just masking a
problem elsewhere, or perhaps they are only needed under some more
specific circumstances. But until that is understood properly, give
back a bit of the perf boost we got from c63450e8.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The kernel driver name is either "kgsl" (downstream/android) or "msm"
(upstream).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Should reduce overhead because the caching buffer manager doesn't need to
consider buffers of the wrong type.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The scratch buffer will be used for private memory and also register
spilling.
v2:
- Code cleanups
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The is used for programs that have arrays of constants that
are accessed using dynamic indices. The shader will compute
the base address of the constants and then access them using
SMRD instructions.
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|