Commit message | Author | Age | Files | Lines
|
Mostly by avoiding strange corner cases in compiler code generation
rather than anything clever.
Improves Skylake x86 by 1.08x for encrypt; no change for decrypt.
Improves ARMv7 (Pi2) by 1.2x encrypt / 1.42x decrypt.
Improves AArch64 (Cortex-A53) by 1.45x encrypt / 2.15x decrypt.
Improves POWER8 by 18x encrypt / 19.5x decrypt.
The outsized POWER8 improvement is due to the shuffle function not
being inlined properly by GCC 9, due to differing ISA enablement.

See #2226
|
The same algorithms were used, just with SSSE3 vs NEON.
|
Nothing enabled SSSE3 in that case.
|
All the constants need to be tweaked, and possibly other changes
are required.

Slower than T-tables on the machines I've tried, but constant time.
|
I do not understand the mechanism, but this is slightly faster.
|
Improves performance by 20-30% on POWER9.

Previously --disable-sse2/--disable-ssse3 would not work as expected.
|
Rename aes_ssse3 -> aes_vperm
|
Now this is checked at the higher level.
|
It was only needed for one case, which is easily hardcoded. Include
rotate.h in all the source files that actually use rotr/rotl but
previously picked it up implicitly via the loadstor.h -> bswap.h ->
rotate.h include chain.
|
This is primarily just to verify that C++11 constexpr works.
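Such a check needs nothing elaborate; a sketch along these lines (illustrative, not the actual test code) is enough to exercise C++11 constexpr:

```cpp
#include <cstdint>

// A recursive constexpr function evaluated at compile time; if the
// compiler's C++11 constexpr support is broken, the static_assert
// below fails to build.
constexpr uint32_t power_of_two(uint32_t n) {
   return (n == 0) ? 1 : 2 * power_of_two(n - 1);
}

static_assert(power_of_two(10) == 1024, "constexpr evaluation works");
```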
|
Closes GH #1557
|
It is the default...
|
Also prefetch SD during decryption, since both TD and SD are used there.
The need for prefetch in the key schedule was identified in the paper
"Eliminating Timing Side-Channel Leaks using Program Repair"
by Guo, Schaumont, and Wang.
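The countermeasure can be sketched as follows; the function name and the 64-byte line size are illustrative assumptions, not Botan's actual code. The idea is to touch every cache line of a table before any secret-dependent index is used, so later table loads do not reveal which lines were already resident:

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical sketch: read one word from each cache line of a lookup
// table, through a volatile pointer so the loads are not optimized
// away, and fold the reads into a value the caller can consume.
inline uint32_t prefetch_table(const uint32_t* table, size_t len) {
   constexpr size_t words_per_line = 64 / sizeof(uint32_t); // assume 64-byte lines
   uint32_t sink = 0;
   for(size_t i = 0; i < len; i += words_per_line)
      sink ^= *const_cast<const volatile uint32_t*>(&table[i]);
   return sink;
}
```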
|
Runs as much as 50% faster for bulk operations. Improves GCM by 10%.
|
Previously, calling update or encrypt without calling set_key first
would result in invalid output or a crash.
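A minimal sketch of the guard, with hypothetical names (not the actual Botan classes): operations that need key material first verify that set_key was called, instead of reading empty key state and producing garbage:

```cpp
#include <stdexcept>
#include <vector>
#include <cstdint>

// Hypothetical keyed operation: encrypt_byte refuses to run until a
// key has been installed via set_key.
class KeyedOp {
   public:
      void set_key(std::vector<uint8_t> key) { m_key = std::move(key); }

      uint8_t encrypt_byte(uint8_t in) const {
         if(m_key.empty())
            throw std::logic_error("key not set");
         return in ^ m_key[0]; // stand-in for the real transform
      }

   private:
      std::vector<uint8_t> m_key;
};
```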
|
This ended up allocating 256 KiB!
|
This improves performance by ~0.5 cycles/byte. It also ensures that
our cache-reading countermeasure works as expected.
|
Should have significantly better cache characteristics, though it
would be nice to verify this.
It reduces performance somewhat, but less than I expected, at least
on Skylake. I need to check this across more platforms to make sure
it won't hurt too badly.
|
The problem with asm rol/ror is that the compiler can't schedule it effectively.
But we only need asm in the case where the rotation is variable, so distinguish
the two cases. If it is a compile-time constant, static_assert that the rotation
is in the correct range and use the straightforward expression, knowing the
compiler will probably do the right thing. Otherwise use a tricky expression
that both GCC and Clang happen to recognize. Avoid the reduction case; instead
require that the rotation be in range (this reverts 2b37c13dcf).
Remove the asm rotations (making this branch ill-named), because now both Clang
and GCC will generate a rol without any extra help.
Remove the reduction/mask by the word size for the variable case. The compiler
can't optimize it out well, but it's easy to ensure validity in the callers,
especially now that the variable-input cases are easy to grep for.
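The two cases described above can be sketched roughly like this (declarations illustrative, not Botan's actual rotate.h; the caller is assumed to guarantee the rotation is in range):

```cpp
#include <cstdint>
#include <cstddef>

// Compile-time rotation amount: static_assert the range, then use the
// plain shift/or expression; GCC and Clang compile this to a single
// rotate instruction without any asm help.
template <size_t ROT>
inline uint32_t rotl(uint32_t x) {
   static_assert(ROT > 0 && ROT < 32, "rotation amount must be in range");
   return (x << ROT) | (x >> (32 - ROT));
}

// Variable rotation amount: no mask/reduction by the word size here;
// the caller must ensure 0 < rot < 32 for the expression to be valid.
inline uint32_t rotl_var(uint32_t x, size_t rot) {
   return (x << rot) | (x >> (32 - rot));
}
```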
|
Some help from include-what-you-use
|
ISO C++ reserves names containing double underscores.
Closes #512
|
Defined in build.h, all equal to BOTAN_DLL, so this ties into the
existing system for exporting symbols.
|
Based on the patch in GH #1146
|
GH #1077
|
Allow an empty nonce to mean "continue using the current cipher state".
GH #864
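A toy model of the convention (not Botan's implementation; the class and method names are hypothetical): a zero-length nonce leaves the keystream position untouched instead of resetting it.

```cpp
#include <cstdint>
#include <cstddef>

// Toy counter-based cipher state demonstrating "empty nonce means
// continue using the current cipher state".
class ToyCtrState {
   public:
      void set_iv(const uint8_t* iv, size_t len) {
         if(len == 0)
            return; // empty nonce: keep going from the current position
         m_counter = 0;
         for(size_t i = 0; i < len; ++i)
            m_counter = (m_counter << 8) | iv[i];
      }

      uint64_t next_block_index() { return m_counter++; }

   private:
      uint64_t m_counter = 0;
};
```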