| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The disk cache code tries to allocate a 256 Kbyte buffer on the stack.
Since musl only gives 80 Kbyte of stack space per thread, this causes a
trap.
See https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size
(In musl-1.1.21 the default stack size has increased to 128K)
[mattst88]: Original author unknown, but I think this is small enough
that it is not copyrightable.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use hash_table_u64 instead of hash_table directly, since the former
will also handle the special keys (deleted and freed) and allow use
the whole u64 space.
Fixes crash in INTEL_DEBUG=bat when using a key with value 0 -- the
current value for a freed key.
Fixes: b38dab101ca "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The hash_table_u64 should support any uint64_t as input. It does
special handling for the "deleted" key, storing the data in the table
itself; do the same for the "freed" key.
Fixes: b38dab101ca "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.
The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements with the element size,
which could overflow, and checking for overflow is tedious.
With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.
v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size
v3:
- fix iris, lima, panfrost
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're not very good at handling out-of-memory conditions in general, but
this change at least gives the caller the option of handling it gracefully
and without memory leaks.
This happens to fix an error in out-of-memory handling in i965, which has
the following code in brw_bufmgr.c:
node = util_dynarray_grow(vma_list, sizeof(struct vma_bucket_node));
if (unlikely(!node))
return 0ull;
Previously, allocation failure for util_dynarray_grow wouldn't actually
return NULL when the dynarray was previously non-empty.
v2:
- make util_dynarray_ensure_cap a no-op on failure, add MUST_CHECK attribute
- simplify the new capacity calculation: aside from avoiding a useless loop
when newcap is very large, this also avoids an infinite loop when newcap
is larger than 1 << 31
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110901
Fixes: 7dc2f4788288ec9c7ab6 "util: emulate futex on FreeBSD using umtx"
Cc: Greg V <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If we insert a NULL key, it will appear to succeed but will mess up
entry counting. Similar errors can occur if someone accidentally
inserts the deleted key. The later is highly unlikely but technically
possible so we should guard against it too.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If we insert a NULL key, it will appear to succeed but will mess up
entry counting. Similar errors can occur if someone accidentally
inserts the deleted key. The later is highly unlikely but technically
possible so we should guard against it too.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
While we're here, copy the size table from set.c to get rid of hard tabs
in the hash_table.c version.
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compilation times with my shader-db database:
Difference at 95.0% confidence
-1.22312 +/- 0.726033
-0.283979% +/- 0.168254%
(Student's t, pooled s = 1.02177)
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.
Acked-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
To keep it in sync with the set implementation.
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A significant portion of the time spent in nir_opt_cse for the Dolphin
ubershaders was in resizing the set. When resizing a hash table, we know
in advance that each new element to be inserted will be different from
every other element, so we don't have to compare them, and there will be
no tombstone elements, so we don't have to worry about caching the
first-seen tombstone. We add a specialized add function which skips
these steps entirely, speeding up resizing.
Compile-time results from my shader-db database:
Difference at 95.0% confidence
-2.29143 +/- 0.845534
-0.529475% +/- 0.194767%
(Student's t, pooled s = 1.08807)
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To keep the set and hash table in sync. Note that some of this had
already been done for hash tables, in particular pulling out the
hash % ht->size computation.
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately GCC can't do this for us, probably because we call the key
comparison function which GCC can't prove won't modify arbitrary memory.
This is a pretty hot function, so do the optimization manually to be
sure the compiler will get it right.
While we're here, make the computation of the new probe address use a
single conditional subtract instead of a modulo, since we know that it
won't ever get as big as 2 * ht->size before the modulo. Modulos tend to
be pretty expensive operations.
shader-db compile time results for my database:
Difference at 95.0% confidence
-2.24934 +/- 0.69897
-0.516296% +/- 0.159993%
(Student's t, pooled s = 0.983684)
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unlike _mesa_set_search_and_add(), it doesn't replace an entry if it's
found, returning it instead. This is useful for nir_instr_set, where
we have to know both the original original instruction and its
equivalent.
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Debugging use of unsafe iterators when you should have used the _safe
version sucks. Add some DEBUG build support to catch and assert if
someone does that.
I didn't update the UPPERCASE verions of the iterators. They should
probably be deprecated/removed.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
| |
debugoptimized builds don't define NDEBUG, but they also don't define
DEBUG. We want to enable cheap debug code for these builds.
I only chose those occurences that I care about.
Reviewed-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110711
|
|
|
|
|
|
| |
Required to use uint8_t
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
| |
It may decrease performance and it prevents compute-based primitive culling.
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use fstat() only to pre-allocate a big enough buffer.
This fixes a race where if the file grows between fstat() and read()
we would be missing the end of the file, and if the file slims down
read() would just fail.
Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to be able to call ra_allocate() and, when it fails, mutate the
graph and try again rather than re-building the graph from scratch.
This commit moves all the scratch bits except the final register
allocation (which is really an out value not scratch) into sub-structs
named "tmp" to make it clear which things are scratch. It also adds
bits to the ra_select() initialization loop to initialize things (since
we can't trust rzalloc anymore) and copy q_test and forced_reg over.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Unfortunately, we can't quite follow the standard C conventions for
these because ralloc doesn't know the sizes of pointers.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most expensive part of register allocation is the ra_simplify step
which is a fixed-point algorithm with a worst-case complexity of O(n^2)
which adds the registers to a stack which we then use later to do the
actual allocation. This commit uses bit sets and changes the core loop
of ra_simplify to first walk 32-node chunks and then walk each chunk.
This lets us skip whole 32-node chunks in one go based on bit operations
and compute the minimum q value potentially 32x as fast. Of course, the
algorithm still has the same fundamental O(n^2) run-time but the
constant is now much lower.
In the nasty Aztec Ruins compute shader, this shaves a full four seconds
off the 30s compile time for a release build of mesa. In a debug build
(needed for accurate stack traces), perf says that ra_select takes 20%
of runtime before this patch and only 5-6% of runtime after this patch.
It also makes shader-db runs faster.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15311100 -> 15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 355468050 -> 355468050 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We only use q_total if the reg is not assigned so there's no point in
updating it if the reg is not assigned. This has no known perf benefit
but it will reduce churn in a future commit.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This shaves about half a second off the 30 second compile time of one of
the compute shaders in Aztec ruins.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA,
we can go up to 7680 (actually probably 8138, but that hasn't been
validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.
|
|
|
|
|
|
|
|
|
| |
Often times you don't know how big a set will be and you want the code
to just grow it as needed. However, sometimes you do know and you can
avoid a lot of rehashing if you just specify a size up-front.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This function is identical to _mesa_set_add except that it takes an
extra out parameter that lets the caller detect if a replacement
happened.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes rendering issues with gun scopes which is rather
important.
Cc: "19.0" "19.1" <[email protected]>
Acked-by: Bas Nieuwenhuizen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100239
|
|
|
|
|
|
|
|
| |
This makes the game playable on radeonsi.
Cc: "19.0" "19.1" <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110143
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
To allow the this test to be built with MSVC, which doesn't support
VLAs.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MASK macro is used in the RANGE macro, and it should
return the pre-bitset word mask for the (b) value.
i.e.
BITSET_MASK(0) should be undefined since it's meaningless.
BITSET_MASK(31) should give 0x7fffffff
BITSET_MASK(32) should give 0xffffffff
BITSET_MASK(33) should give 0x00000001
BITSET_MASK(64) should give 0xffffffff
However then BITSET_RANGE ends up broken for cases where
it's (b) value is the 0,32,64 value as in that case the lower
mask would be 0 not 0xffffffff.
This fixes the unit tests that I've added, and my code that
uses bitsets.
Reviewed-by: Jason Ekstrand <[email protected]>
Fixes: bb38cadb1c5f2 "More GLSL code"
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
The last test here currently fails as there is a bug in bitset.h
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I want to be able to do BITSET_TEST() != BITSET_TEST() and this isn't
currently possible because BITSET_TEST() returns a random bit. Compare
to zero to get an actual Boolean.
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This #include is needed for `NULL`, which is used on all OSes, not just Linux.
Reported-by: Juan A. Suarez Romero <[email protected]>
Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
| |
Until we use async shader compilation for constant inlining,
don't enable it unless user asks for it.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
| |
The game requires it to display many textures properly.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dynamic textures seem to have predictable stride. This stride
should be the same as for a ram buffer.
It seems some game don't check the actual stride value, assuming
it to be the expected one.
Thus this workaround (protected by drirc option) is to use an intermediate
ram buffer.
Fixes Rayman Legends texture issues when enabled.
Signed-off-by: Axel Davy <[email protected]>
|