It was only needed for one case, which is easily hardcoded. Include
rotate.h in all the source files that actually use rotr/rotl but
previously picked it up implicitly via the loadstor.h -> bswap.h ->
rotate.h include chain.
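For illustration, the fix amounts to stating the dependency directly in each consumer. A hypothetical source file (the header paths follow the message; the file name is invented):

```cpp
// some_cipher.cpp (hypothetical consumer)
#include <botan/loadstor.h>
#include <botan/rotate.h>   // for rotr/rotl; no longer pulled in transitively
                            // via loadstor.h -> bswap.h -> rotate.h
```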
|
|
|
|
It is confusing since it's not clear from the name how many
elements it has, and this gives consistency with the SIMD_8x32 type.
|
|
|
Lack of these broke single-file amalgamation (GH #1386)
|
|
|
|
|
Previously, calling update or encrypt without calling set_key first
would result in invalid outputs or else a crash.
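A sketch of the guard implied by this change (class and member names are hypothetical, not the library's actual API): keyed operations check a flag that set_key establishes, and fail cleanly instead of producing garbage or crashing.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

class Example_Cipher
   {
   public:
      void set_key(const uint8_t key[], size_t key_len)
         {
         // ... run the key schedule on key[0..key_len) ...
         (void)key; (void)key_len;
         m_key_set = true;
         }

      void encrypt(uint8_t buf[], size_t len)
         {
         // Fail with a clear error if no key has been set
         if(!m_key_set)
            throw std::invalid_argument("Example_Cipher: key not set");
         // ... transform buf[0..len) in place ...
         (void)buf; (void)len;
         }

   private:
      bool m_key_set = false;
   };
```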
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The problem with asm rol/ror is that the compiler can't schedule it effectively.
But we only need asm in the case where the rotation is variable, so distinguish
the two cases. If the rotation is a compile-time constant, static_assert that it
is in the correct range and use the straightforward expression, knowing the
compiler will probably do the right thing. Otherwise use a tricky expression
that both GCC and Clang happen to recognize. Avoid the reduction case; instead
require that the rotation be in range (this reverts 2b37c13dcf).

Remove the asm rotations (making this branch ill-named), because now both Clang
and GCC will create a roll without any extra help.

Remove the reduction/mask by the word size for the variable case. The compiler
can't optimize it out well, but it's easy to ensure it is valid in the callers,
especially now that the variable-input cases are easy to grep for.
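A minimal sketch of the two cases described (names and exact expressions are illustrative, not necessarily the final code): the compile-time amount is a template parameter so it can be checked with static_assert, while the variable case relies on the caller keeping the rotation in range.

```cpp
#include <cstddef>
#include <cstdint>

// Compile-time rotation: the amount is a template parameter, so we can
// static_assert it is in range and emit the plain expression, trusting
// the compiler to turn it into a single rotate instruction.
template<size_t ROT, typename T>
inline constexpr T rotl(T input)
   {
   static_assert(ROT > 0 && ROT < 8 * sizeof(T), "Invalid rotation constant");
   return static_cast<T>((input << ROT) | (input >> (8 * sizeof(T) - ROT)));
   }

// Variable rotation: no masking/reduction here; the caller must guarantee
// 0 < rot < 8*sizeof(T). Under that precondition both GCC and Clang
// recognize the pattern and compile it to one rotate instruction.
template<typename T>
inline T rotl_var(T input, size_t rot)
   {
   return static_cast<T>((input << rot) | (input >> (8 * sizeof(T) - rot)));
   }

// Usage: rotl<13>(x) for constants, rotl_var(x, r) with r already in range.
```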
|
|
|
[ci skip]
|
|
|
|
|
ISO C++ reserves names with double underscores in them.

Closes #512
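For reference, the rule ([lex.name]) covers a double underscore anywhere in the identifier, not only at the start:

```cpp
int block__count = 0;   // reserved: contains "__" anywhere in the name
int block_count = 0;    // fine
```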
|
|
|
|
|
|
Using _mm_set_epi32 caused two distinct (adjacent) loads followed
by an unpack to combine the registers. I have not tested on hardware
to see whether this actually improves performance.
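A sketch of the pattern implied here (function and variable names are invented): since the four words are adjacent in memory, one unaligned load can replace the set/unpack sequence.

```cpp
#include <emmintrin.h>
#include <cstdint>

__m128i load_words(const uint32_t in[4])
   {
   // Before: could compile to two loads plus an unpack
   // return _mm_set_epi32(in[3], in[2], in[1], in[0]);

   // After: a single 128-bit unaligned load
   return _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
   }
```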
|
|
|
|
|
Combine several shuffle operations into one. Thanks to jww for the hint.
Probably not noticeably faster on any system.
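As an illustration of the idea (the control masks here are hypothetical, not the actual ones): two back-to-back 32-bit lane shuffles always compose into a single shuffle whose mask is the composition of the two permutations.

```cpp
#include <emmintrin.h>

__m128i shuffle_combined(__m128i x)
   {
   // Before: two dependent shuffles
   //   x = _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)); // swap within pairs
   //   x = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)); // swap the pairs
   // After: one shuffle performing the composed permutation (full reversal)
   return _mm_shuffle_epi32(x, _MM_SHUFFLE(0, 1, 2, 3));
   }
```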
|
|
|
|
|
The compiler complains that it cannot pass the __m128i without loss
of alignment. (Why, I have no idea.)
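A sketch of the usual workaround, assuming the diagnostic is the 32-bit MSVC complaint about by-value parameters requiring 16-byte alignment (the function here is invented): pass the vector by const reference instead.

```cpp
#include <emmintrin.h>

// __m128i transform(__m128i x);      // by value: can trigger the
                                      // alignment complaint on 32-bit MSVC
__m128i transform(const __m128i& x)   // by reference: accepted
   {
   return _mm_add_epi32(x, x);
   }
```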
|
|
|
A bit over 2x faster on my desktop.

256-bit ARX block cipher with hardware support, what's not to love?