| Commit message (Collapse) | Author | Age | Files | Lines |
|
| |
expansion. While I would prefer to have the compiler do this, with GCC 4.1.2
it is 4% faster on a Core2 Q6600 with the loops partially unrolled.
|
| |
but might as well keep it up to date. And it's easier to do it once with
a 'perl -pi' command than to update each file over time.
Apologies to anyone looking at diffs.
|
| |
when I was testing on x86 and x86-64 machines.
|
| |
Where loadstor.h was needed but only implicitly included via bit_ops.h,
include it directly.
Add endian reversal functions to bit_ops.h.
Remove some unneeded includes in big_ops2.cpp and a few other files.
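For readers unfamiliar with the term, an endian reversal function simply swaps the byte order of a word. The following is a minimal sketch of what such helpers could look like; the names reverse_bytes16, reverse_bytes32 and reverse_bytes64 are hypothetical and are not taken from bit_ops.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; the real bit_ops.h may use
// different names, signatures, or compiler intrinsics.

// Swap the two bytes of a 16-bit word.
inline std::uint16_t reverse_bytes16(std::uint16_t x)
{
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Reverse the four bytes of a 32-bit word.
inline std::uint32_t reverse_bytes32(std::uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}

// Reverse a 64-bit word by reversing each half and exchanging the halves.
inline std::uint64_t reverse_bytes64(std::uint64_t x)
{
    const std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
    const std::uint32_t lo = static_cast<std::uint32_t>(x);
    return (static_cast<std::uint64_t>(reverse_bytes32(lo)) << 32) |
           reverse_bytes32(hi);
}
```

Under these assumptions, reverse_bytes32(0x01020304) yields 0x04030201, and applying any of the functions twice returns the original value.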
|
| |
into account endian differences.
The current code does not take advantage of the knowledge of which endianness
we are running on; an optimization suggested by Yves Jerschow is to use (unsafe)
casts to speed up the load/store operations. This turns out to provide large
performance increases (30% or more) in some cases.
Even without the unsafe casts, this version seems to average a few percent
faster, probably because the longer loading loops have been partially or
fully unrolled.
This also makes the code implementing low-level algorithms like ciphers and
hashes a bit more succinct.
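To illustrate the distinction drawn above, here is a rough sketch of a portable, fully unrolled load/store next to an endian-aware variant. All names are hypothetical and not taken from loadstor.h. The second variant uses std::memcpy as a well-defined stand-in for the unsafe pointer cast the message mentions, and assumes the host is either little- or big-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable big-endian load: correct on any host, loop fully unrolled.
inline std::uint32_t load_be32(const std::uint8_t in[4])
{
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) <<  8) |
            static_cast<std::uint32_t>(in[3]);
}

// Portable little-endian load.
inline std::uint32_t load_le32(const std::uint8_t in[4])
{
    return (static_cast<std::uint32_t>(in[3]) << 24) |
           (static_cast<std::uint32_t>(in[2]) << 16) |
           (static_cast<std::uint32_t>(in[1]) <<  8) |
            static_cast<std::uint32_t>(in[0]);
}

// Portable big-endian store.
inline void store_be32(std::uint32_t x, std::uint8_t out[4])
{
    out[0] = static_cast<std::uint8_t>(x >> 24);
    out[1] = static_cast<std::uint8_t>(x >> 16);
    out[2] = static_cast<std::uint8_t>(x >>  8);
    out[3] = static_cast<std::uint8_t>(x);
}

// Runtime probe (assumption: the host is either little- or big-endian).
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    std::uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Endian-aware load: copy the whole word at once, then swap only if needed.
// memcpy replaces the unsafe cast here; it avoids alignment and aliasing
// problems and is typically compiled to a single load.
inline std::uint32_t load_be32_fast(const std::uint8_t in[4])
{
    std::uint32_t x;
    std::memcpy(&x, in, sizeof(x));
    if (host_is_little_endian())
        x = (x << 24) | ((x & 0x0000FF00u) << 8) |
            ((x >> 8) & 0x0000FF00u) | (x >> 24);
    return x;
}
```

Both load_be32 and load_be32_fast return 0x01020304 for the bytes 01 02 03 04; the difference lies only in how much work the compiler has to do to recognise the pattern.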
|
| |
|
| |
use a little extra workspace, this makes that simpler to do.
|
| |
|