util: Change the pointer hashing function - mesa.git


author	Thomas Helland <[email protected]>	2017-02-10 19:14:32 +0100
committer	Timothy Arceri <[email protected]>	2017-05-22 09:17:37 +1000
commit	90dfcc6b32c5a635d30cd04fc887a7ff78d3476d (patch)
tree	ed2211cf520e23fe2e29d0e79e45c07ab748ebf3 /src/intel/compiler/brw_ir_allocator.h
parent	1586768e7475a2732650f0ec2738b4e8429e4b40 (diff)

util: Change the pointer hashing function

Use our knowledge that pointers are at least 4-byte aligned to remove the useless low bits. Then shift by 6, 10, and 14 bits and xor these into the original pointer, effectively folding the entropy of the higher bits of the pointer into a 4-bit section. Stopping at 14 means we can add the entropy from 18 bits, or at least a 600Kbyte section of memory. Assuming that ralloc allocates from a linearly allocated heap smaller than this, we can make a very efficient pointer hashing function for our use case.

Even if we are not on an architecture that is 4-byte aligned, there is still a high chance that the thing we are allocating is at least 8 bytes in size, so even then we will have entropy into the third bit.

The 4-bit increment on the shifts is chosen rather arbitrarily; if we had chosen a 3-bit increment we would need to add another xor to cover a decently sized memory pool. Increasing it to 5 bits would spread our entropy more, possibly hurting us with more collisions on hash tables of size less than 32. With a hash table of size 16 there are a max of 11 entries, and we can assume that with such a small table collisions are not that painful.

This allows us to hash the whole 32- or 64-bit pointer at once, instead of running FNV1a, looping through each byte and doing increments, decrements, muls, and xors on every byte. On profiles, _mesa_hash_data previously took 1.5%; _mesa_hash_pointer now shows up with a 0.09% share.

Collisions on insertion actually seem to be ever so slightly lower with this hash function, as found by printing a loop counter and sorting the data. perf stat shows a 1.5% reduction in instruction count, and a 5% reduction in stalled cycles. Shader-db runtime goes from 225 to 220 seconds.

No instruction-count changes in shader-db, but there are some minor changes in cycle count that are likely caused by nir walking a set in some of its passes, and this causing a different ordering. That might eventually lead to a difference in register allocation. However, the effect is a net positive:

total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
cycles in affected programs: 374468 -> 373400 (-0.29%)
helped: 178
HURT: 49

Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
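
The fold described in the message can be sketched roughly as follows. This is an illustration, not the patch itself: the diff is not shown on this page, and the function names `hash_pointer_fold` and `hash_pointer_fnv1a` are hypothetical. It assumes the shifted values are combined with xor, as the message's remark about "another xor" suggests. A byte-wise FNV-1a over the pointer value is included only to show the per-byte loop that the fold avoids.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the shift-and-fold pointer hash (illustrative name).
 * The >> 2 drops the two low bits that are always zero for
 * 4-byte-aligned pointers; the further shifts xor higher address
 * bits down into the low bits of the result. */
static inline uint32_t
hash_pointer_fold(const void *pointer)
{
   uintptr_t num = (uintptr_t) pointer;
   return (uint32_t) ((num >> 2) ^ (num >> 6) ^ (num >> 10) ^ (num >> 14));
}

/* Byte-wise 32-bit FNV-1a over the pointer value, for comparison
 * (illustrative name). One xor and one multiply per byte, so 4 or 8
 * loop iterations per pointer instead of a handful of shifts. */
static inline uint32_t
hash_pointer_fnv1a(const void *pointer)
{
   uint32_t hash = 2166136261u;               /* FNV-1a offset basis */
   const unsigned char *bytes = (const unsigned char *) &pointer;

   for (size_t i = 0; i < sizeof(pointer); i++) {
      hash ^= bytes[i];
      hash *= 16777619u;                      /* FNV 32-bit prime */
   }
   return hash;
}
```

With this sketch, two allocations 4 bytes apart (for example addresses 0x1000 and 0x1004) differ in the lowest bit of the folded hash, which is what keeps neighbouring ralloc'd objects in different buckets even in a small table.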

Diffstat (limited to 'src/intel/compiler/brw_ir_allocator.h')

0 files changed, 0 insertions, 0 deletions

