| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The only reason why we needed that version was because the Vulkan driver
needed to be able to create the surface states so it could handle
indirect clear colors. Now that blorp handles them natively, there's no
need for the extra entrypoint.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
| |
While we're at it, we break it into two nicely named functions.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This way uninitialized fields get automatically zeroed and it's safe to
add more fields to blorp_surf.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This doesn't go all the way of avoiding the txf_ms if it's fast-cleared,
however it does at least make us only do it once. This should improve
performance of MSAA resolves in the presence of lots of clear color.
Without the patch, enabling fast-clears in the multisampling Sascha demo
drops the framerate by about 10%. With this patch, enabling fast-clears
increases the demo's framerate by 25%.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
That name is already taken by one of the helpers in blorp_nir_builder.h
and, while we haven't moved the guts of blorp_blit.c there yet, we'd
like to start using some things from that header.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
driver_cache_blob was introduced with the i965 disk cache, it allows
us to simplify the cache a little and possibly offers some minor
speed improvements since we load the GLSL metadata and TGSI from
disk in one pass.
Using driver_cache_blob should also make it straight forward to
implement binary support for ARB_get_program_binary in gallium.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Mun Gwan-gyeong <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The header can be included from C++, hence contents should have
appropriate notation.
Cc: [email protected]
Cc: Dylan Baker <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
r600_texture: 1488 -> 1248 bytes
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The next commit will reduce the size even more.
v2: typecast to uint64_t manually
v3: add more typecasts, add asserts
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
r600_texture: 1736 -> 1488 bytes
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
r600_resource is malloc'd.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103808
Fixes: 4b0dc098b256 ("gallium/u_threaded: don't map big VRAM buffers for the first upload directly")
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
ported from Vulkan
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the TCS doesn't read back the outputs, no need to store them
to LDS in the first place. (except for tess factors).
This seems to give about 50fps (3290->3330) with tessellation demo.
I haven't tested if it impacts DoW3 at all.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This is to be used for TCS optimisations on radv.
v2: don't set written on reads (nha)
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
This just makes it easier to debug some things.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!
V2:
- Use environmental variable (Karol Herbst)
V3:
- Use the already populated nv50_ir_prog_info to forward information to the
print pass (Pierre Moreau)
V4:
- get rid of default value in PrintPass constructor
Signed-off-by: Tobias Klausmann <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
== 0
Under certain conditions, waiting on a GL sync objects should act like
a flush, regardless of the timeout.
Portal 2, CS:GO, and presumably other Source engine games rely on this
behavior and hang during loading without this fix.
Fixes: bc65dcab3bc4 ("radeonsi: avoid syncing the driver thread in si_fence_finish")
Signed-off-by: Marek Olšák <[email protected]>
Tested-by: Kai Wasserbäch <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.
This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.
total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs : 944261 -> 943375 (-0.09%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60339896 -> 60221504 (-0.20%)
local shared gpr inst bytes
helped 0 0 555 2698 2698
hurt 0 0 138 336 336
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a MERGE operation gets its constraint moves added, it
susbstantially extends live ranges to be reusing an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving into another register).
We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.
With SM35 (255 regs) we get these results:
total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs : 950818 -> 944261 (-0.69%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60367456 -> 60339896 (-0.05%)
local shared gpr inst bytes
helped 0 0 4584 3186 3186
hurt 0 0 55 968 968
I suspect they will be better for SM20 and SM30.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
We can still use the optimized division methods which make use of
multiplication with overflow.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tobias Klausmann <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
MSVC doesn't support #warning?! Getting really tired of this.
|
|
|
|
|
| |
Fixes: 6a353479a757 ("util: Assume little endian in the absence of
platform-specific handling")
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Copied from a5xx, should be identical.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.
The following program
unsigned char array[4] = {1, 2, 3, 4};
int *ptr = (int *)array;
for (int i = 0; i < 4; i++) {
printf("%02x", array[i]);
}
printf("\n");
printf("%08x\n", *ptr);
prints
01020304
04030201
on little endian, and
01020304
01020304
on big endian.
On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.
To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.
Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab35 ("glsl: Add initial functions to implement an
on-disk cache")
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are no defined, then the comparison becomes 0 == 0 and it evaluates as
true.
Fixes: d1efa09d342b ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <[email protected]>
|
| |
|
|
|
|
|
|
| |
There are only 32 vertex attribs now.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't support ARB_vertex_blend.
Note that the attribute aliasing check for ARB_vertex_program had to be
rewritten.
vbo_context: 20344 -> 20008 bytes
gl_context: 74672 -> 74616 bytes
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
I plan to remove one of them.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This is needed for profiling multi-context applications like Chrome.
One context can record queries and another context can draw the HUD.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This is the boring subset of the following commit.
All new parameters are optional.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|