Commit messages

About 50% faster for CBC decrypt
Causes a warning in amalgamation which is bad news
GH #1477
Runs as much as 50% faster for bulk operations. Improves GCM by 10%
Lack of these broke single file amalgamation (GH #1386)
Clang doesn't like the way the SIMD shifts were implemented; it seems
to fail to inline the constant. Make the shift amount a template parameter instead.
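A minimal sketch of the idea (standalone C++ with a plain integer standing in for the SIMD type; names are illustrative, not the actual code): with the shift count as a template parameter it is a guaranteed compile-time constant, rather than a function argument the compiler may fail to inline.

```cpp
#include <cstdint>

// Sketch only: the real code shifts SIMD vectors, a uint32_t stands in here.
// SHIFT is a compile-time constant, so the compiler can emit an
// immediate-shift instruction directly.
template <int SHIFT>
uint32_t shl(uint32_t x) {
   static_assert(SHIFT >= 0 && SHIFT < 32, "shift amount in range");
   return x << SHIFT;
}
```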
Since eventually CAST-256 is going away.
Interleaving two blocks is 40-50% faster for any mode that supports
parallel operation.
As with Blowfish, 2x unrolling produces a 50-60% perf boost
due to increased ILP.
Doing two blocks at a time exposes more ILP and substantially
improves performance.
Idea from http://jultika.oulu.fi/files/nbnfioulu-201305311409.pdf
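A rough illustration of the interleaving (toy round function and hypothetical names, not the actual cipher code): the two statements in the loop body have no data dependency on each other, so the CPU can execute them in parallel.

```cpp
#include <cstddef>
#include <cstdint>

// Toy round function standing in for one cipher round (not a real cipher).
static inline uint32_t toy_round(uint32_t x, uint32_t k) {
   return (x ^ k) * 0x9E3779B9U;
}

// Process two independent blocks per loop iteration. The updates of b0 and
// b1 are independent, exposing more instruction-level parallelism than
// encrypting one block at a time.
void encrypt_2x(uint32_t& b0, uint32_t& b1, const uint32_t keys[16]) {
   for(size_t r = 0; r != 16; ++r) {
      b0 = toy_round(b0, keys[r]);
      b1 = toy_round(b1, keys[r]);
   }
}
```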
Clearly I have a tic for this.
Needed for the create calls
Previously, calling update or encrypt without calling set_key first
would produce invalid output or crash.
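A minimal sketch of the kind of guard involved (the class and method names here are illustrative, not Botan's actual API): the operation checks that a key has been set and throws, instead of operating on an empty key schedule.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative toy cipher: encrypt() refuses to run before set_key() rather
// than producing garbage or crashing on an empty key schedule.
class ToyCipher {
   public:
      void set_key(const std::vector<uint8_t>& key) { m_key = key; }

      void encrypt(std::vector<uint8_t>& block) const {
         if(m_key.empty())
            throw std::invalid_argument("ToyCipher: key not set");
         for(size_t i = 0; i != block.size(); ++i)
            block[i] ^= m_key[i % m_key.size()];
      }

   private:
      std::vector<uint8_t> m_key;
};
```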
This ended up allocating 256 KiB!
This improves performance by ~0.5 cycles/byte. It also ensures that
our cache-reading countermeasure works as expected.
Should have significantly better cache characteristics, though it
would be nice to verify this.
It reduces performance somewhat, but by less than I expected, at least
on Skylake. I need to check this across more platforms to make sure
it won't hurt too badly.
Using a larger table helps quite a bit. Using 4 tables (a la AES T-tables)
didn't seem to help much at all; it's only slightly faster than a single
table with rotations.
Continue to use the 8-bit table in the first and last rounds as a
countermeasure against cache attacks.
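The single-table-with-rotations layout can be sketched like this (dummy table contents, and the byte order and rotation amounts are illustrative): one 256-entry 32-bit table serves all four byte positions by rotating the looked-up word, instead of storing four tables as AES T-table implementations do.

```cpp
#include <cstdint>

// Rotate left; only called with n in {8, 16, 24} below, so no n == 0 case.
static uint32_t rotl8(uint32_t x, int n) {
   return (x << n) | (x >> (32 - n));
}

// One table T reused for all four byte positions of x via rotations,
// rather than four position-specific tables.
uint32_t table_lookup(const uint32_t T[256], uint32_t x) {
   return T[x & 0xFF] ^
          rotl8(T[(x >> 8) & 0xFF], 8) ^
          rotl8(T[(x >> 16) & 0xFF], 16) ^
          rotl8(T[(x >> 24) & 0xFF], 24);
}
```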
Missed by everything but the OCB wide tests because most ciphers
have fixed width and get the override.
GCC 7 can actually vectorize this for AVX2
From ~5 cycles per byte to ~2.5 cycles per byte on Skylake
The problem with asm rol/ror is that the compiler can't schedule it effectively.
But we only need asm when the rotation is variable, so distinguish
the two cases. If it is a compile-time constant, static_assert that the rotation
is in the correct range and use the straightforward expression, knowing the compiler
will probably do the right thing. Otherwise use a tricky expression that both
GCC and Clang happen to recognize. Avoid the reduction case; instead
require that the rotation be in range (this reverts 2b37c13dcf).
Remove the asm rotations (making this branch ill-named), because now both Clang
and GCC will emit a rol without any extra help.
Remove the reduction/mask by the word size for the variable case. The compiler
can't optimize it out well, but it's easy to ensure it is valid in the callers,
especially now that the variable-input cases are easy to grep for.
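A simplified sketch of the two cases described above (not Botan's exact code): a compile-time rotation with a static_assert on the range, and a variable rotation written in a plain shift/or form that GCC and Clang recognize and compile to a single rotate instruction. The caller must keep the variable count strictly between 0 and the bit width, since there is no masking.

```cpp
#include <cstddef>
#include <cstdint>

// Compile-time rotation: the range is checked statically, and the
// straightforward expression is enough for the compiler to emit a rol.
template <size_t ROT, typename T>
constexpr T rotl(T x) {
   static_assert(ROT > 0 && ROT < 8 * sizeof(T), "rotation in range");
   return static_cast<T>((x << ROT) | (x >> (8 * sizeof(T) - ROT)));
}

// Variable rotation: no mask/reduction is applied, so the caller must
// guarantee 0 < rot < bit width. In that range, GCC and Clang pattern-match
// this expression into a single rotate instruction.
template <typename T>
inline T rotl_var(T x, size_t rot) {
   return static_cast<T>((x << rot) | (x >> (8 * sizeof(T) - rot)));
}
```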
Nothing major but probably good to clean these up.
Things like -Wconversion and -Wuseless-cast that are noisy and
not on by default.
[ci skip]
Sonar
Found with Sonar
Mostly residue from the old system of splitting impls among subclasses.
Found with Sonar
Done by a perl script which converted all classes to final, followed
by selective reversion where it caused compilation failures.
Some help from include-what-you-use
[ci skip]
ISO C++ reserves names with double underscores in them
Closes #512
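For illustration (the identifiers below are made up): the reservation applies to any identifier containing a double underscore anywhere, not only leading ones, so names like include guards of the form LIB__HEADER__H have to be renamed.

```cpp
// A name containing "__" anywhere is reserved to the implementation in
// ISO C++, so the guard below avoids double underscores entirely.
// #define TOY_LIB__UTIL__H 1   // reserved: contains "__"
#define TOY_LIB_UTIL_H 1        // fine: single underscores only

int toy_util_marker() { return TOY_LIB_UTIL_H; }
```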