| Commit message | Author | Age | Files | Lines |
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Cc: 17.2 17.3 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
These are just some cleanups on top of the last patch from my compute branch.
Signed-off-by: Dave Airlie <[email protected]>
| |
Use the destination write mask to determine which values are really to be
read from LDS and load only these.
Reviewed-by: Dave Airlie <[email protected]>
Signed-off-by: Gert Wollny <[email protected]>
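The write-mask idea above can be illustrated with a minimal sketch. It is not the driver's code: the function names, the list standing in for LDS, and the 4-bit x/y/z/w mask layout are assumptions made for illustration only.

```python
def components_to_load(write_mask: int) -> list[int]:
    """Return the channel indices (0=x, 1=y, 2=z, 3=w) whose bit is set
    in a 4-bit destination write mask (hypothetical layout)."""
    return [chan for chan in range(4) if write_mask & (1 << chan)]


def load_from_lds(lds: list[float], base: int, write_mask: int) -> dict[int, float]:
    """Read only the masked channels of a 4-component value stored at `base`.

    `lds` is a plain list standing in for local data share memory.
    """
    return {chan: lds[base + chan] for chan in components_to_load(write_mask)}
```

With a mask of `0b0101` only the x and z channels are fetched, so two of the four loads are skipped.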
| |
This fixes hangs on cayman with
tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test
The test has a single if/else in it. When this peephole was activated,
it set the jump target to NULL if there was no instruction after the
final POP. Add a NOP if we get a jump in this case, so that the ELSE
instruction has a valid target to go to instead of 0 (which causes
infinite loops). This seems to fix the hangs.
v2: update last_cf correctly. (I had some other patches hide this)
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
| |
Build tested only.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Build tested only.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
v4: - Ensure inc_amd_common defined when radeonsi is disabled (needed by
r600)
Signed-off-by: Dylan Baker <[email protected]>
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
This is build tested only
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Build tested only.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
where the other format-related functions live.
Reviewed-by: Charmaine Lee <[email protected]>
| |
Reviewed-by: Charmaine Lee <[email protected]>
| |
(other uses of USE_VC4_SIMULATOR are already correct)
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Just as an added precaution.
Reviewed-by: Marek Olšák <[email protected]>
| |
Fix a bunch of labels indicating when registers were added/removed
and normalize the SI-class GRBM_GFX_INDEX.
Reviewed-by: Marek Olšák <[email protected]>
| |
The next commit will reduce the size even more.
v2: typecast to uint64_t manually
v3: add more typecasts, add asserts
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
r600_texture: 1736 -> 1488 bytes
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
r600_resource is malloc'd.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103808
Fixes: 4b0dc098b256 ("gallium/u_threaded: don't map big VRAM buffers for the first upload directly")
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
ported from Vulkan
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
This just makes it easier to debug some things.
Signed-off-by: Dave Airlie <[email protected]>
| |
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!
V2:
- Use environment variable (Karol Herbst)
V3:
- Use the already populated nv50_ir_prog_info to forward information to the
print pass (Pierre Moreau)
V4:
- get rid of default value in PrintPass constructor
Signed-off-by: Tobias Klausmann <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
| |
== 0
Under certain conditions, waiting on a GL sync object should act like
a flush, regardless of the timeout.
Portal 2, CS:GO, and presumably other Source engine games rely on this
behavior and hang during loading without this fix.
Fixes: bc65dcab3bc4 ("radeonsi: avoid syncing the driver thread in si_fence_finish")
Signed-off-by: Marek Olšák <[email protected]>
Tested-by: Kai Wasserbäch <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904
| |
Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.
This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.
total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs : 944261 -> 943375 (-0.09%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60339896 -> 60221504 (-0.20%)
          local  shared  gpr  inst  bytes
helped        0       0  555  2698   2698
hurt          0       0  138   336    336
Signed-off-by: Ilia Mirkin <[email protected]>
| |
When a MERGE operation gets its constraint moves added, it
substantially extends live ranges because it reuses an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving it into another register).
We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.
With SM35 (255 regs) we get these results:
total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs : 950818 -> 944261 (-0.69%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60367456 -> 60339896 (-0.05%)
          local  shared   gpr  inst  bytes
helped        0       0  4584  3186   3186
hurt          0       0    55   968    968
I suspect they will be better for SM20 and SM30.
Signed-off-by: Ilia Mirkin <[email protected]>
| |
We can still use the optimized division methods which make use of
multiplication with overflow.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tobias Klausmann <[email protected]>
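As a general illustration of division through "multiplication with overflow", the sketch below divides an unsigned 32-bit value by a constant using a widening multiply and a shift. This is the textbook multiply-high technique, not code taken from this commit; `magic_for` and `div_by_const` are hypothetical names, and Python's unbounded integers stand in for the high half of a hardware multiply.

```python
def magic_for(d: int, bits: int = 32) -> tuple[int, int]:
    """Return (multiplier, shift) such that n // d == (n * multiplier) >> shift
    for every unsigned n that fits in `bits` bits."""
    assert d >= 1
    l = (d - 1).bit_length()             # ceil(log2(d))
    shift = bits + l
    multiplier = -(-(1 << shift) // d)   # ceil(2**shift / d)
    return multiplier, shift


def div_by_const(n: int, d: int, bits: int = 32) -> int:
    """Divide n by the constant d without using a division on n."""
    multiplier, shift = magic_for(d, bits)
    # On hardware this is a multiply that keeps the upper bits of the
    # product, followed by a right shift.
    return (n * multiplier) >> shift
```

The multiplier may need one bit more than the operand width, which is why real implementations usually add a fix-up step. The sketch sidesteps that by relying on arbitrary-precision integers.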
| |
It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.
Signed-off-by: Ilia Mirkin <[email protected]>
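To make the signed-modulo concern concrete, here is a minimal sketch. It assumes that "right" means the C-style result, where the remainder takes the sign of the dividend. The message does not spell out the intended semantics, and `trunc_mod` is a hypothetical helper.

```python
def trunc_mod(a: int, b: int) -> int:
    """Remainder of a / b with the sign of the dividend `a`, for positive `b`.

    Assumption: C-style truncating semantics. Python's own `%` floors
    instead, so -7 % 4 evaluates to 1 rather than -3.
    """
    assert b > 0
    r = abs(a) % b
    return -r if a < 0 else r
```

A lowering that treats the operands as unsigned gives the same answer for non-negative dividends but not for negative ones, which is the case the message calls surprising.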
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
| |
This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
| |
Copied from a5xx, should be identical.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
| |
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
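The lowering described above can be modelled roughly as follows. This is a sketch, not the driver's implementation: the texture is a nested list, `fetch_lod0` is a hypothetical stand-in for a single txl fetch at lod 0, coordinates are integer texel positions, and clamp-to-edge addressing is assumed. The component order follows GLSL `textureGather`: (i0,j1), (i1,j1), (i1,j0), (i0,j0).

```python
def fetch_lod0(tex, x, y, offset):
    """Stand-in for one txl fetch at lod 0 with a texel offset.

    Clamp-to-edge addressing is an assumption of this sketch.
    """
    height, width = len(tex), len(tex[0])
    xi = min(max(x + offset[0], 0), width - 1)
    yi = min(max(y + offset[1], 0), height - 1)
    return tex[yi][xi]


# Offsets in the order GLSL textureGather returns its four components.
GATHER_OFFSETS = [(0, 1), (1, 1), (1, 0), (0, 0)]


def gather4_lowered(tex, x, y):
    """Emulate a gather with four individual offset fetches."""
    return tuple(fetch_lod0(tex, x, y, off) for off in GATHER_OFFSETS)
```

For a 2x2 texture `[[1, 2], [3, 4]]`, gathering at texel (0, 0) returns `(3, 4, 2, 1)`.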
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Now you can hack the driver to enable DCC for displayable textures, and
Glamor, which doesn't enable that by default, won't crash anymore.
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
This has no effect because both occupy the same memory in a union.
Reviewed-by: Nicolai Hähnle <[email protected]>
|