Commit messages
About 2x faster on Skylake
This is in the hot path for GCM
The last 4 bytes are always overwritten in this loop.
Not complete, just trying to hit the most commonly used APIs plus the
ones that are easy to do.
Various configurations would fail the build or tests; fix that.
We need this for Kyber, which uses 34-byte inputs to the XOF when
computing the public matrix.
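
As an aside on where the 34 bytes come from, a sketch (not Botan's
code; index ordering per the Kyber spec, treat as illustrative):

    #include <cstdint>
    #include <cstring>

    // Each entry of Kyber's public matrix A is expanded from the XOF
    // seeded with the 32-byte public seed rho plus two index bytes,
    // hence the 34-byte input.
    void xof_input_for_matrix_entry(uint8_t out[34], const uint8_t rho[32],
                                    uint8_t i, uint8_t j)
       {
       std::memcpy(out, rho, 32);
       out[32] = j; // column index
       out[33] = i; // row index
       }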
It was only needed for one case, which is easily hardcoded. Include
rotate.h in all the source files that actually use rotr/rotl but
previously picked it up implicitly via the
loadstor.h -> bswap.h -> rotate.h include chain.
static_casts for the compiler god
Avoid throwing the base Botan::Exception type, as it is difficult to
determine what the error is in that case.
Add Exception::error_code and Exception::error_type: the error code
carries more information about the specific error, while the error
type makes it possible to identify the kind of error without a
sequence of catch blocks.
See GH #1742
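
A sketch of the pattern this enables (the ErrorType enumerator name
here is illustrative, not confirmed by this commit):

    #include <botan/exceptn.h>
    #include <botan/stream_cipher.h>

    // Classify a failure by error type instead of writing one catch
    // clause per exception subclass.
    bool is_key_length_error(Botan::StreamCipher& cipher,
                             const uint8_t* key, size_t key_len)
       {
       try
          {
          cipher.set_key(key, key_len);
          return false;
          }
       catch(const Botan::Exception& e)
          {
          return e.error_type() == Botan::ErrorType::InvalidKeyLength;
          }
       }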
Otherwise some CPUs suffer serious stalls. Using vzeroall on exit
also has the nice effect that we don't have to worry about register
contents leaking.
HT to @noloader for doing the background research on this.
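
A minimal sketch of the exit path (the function name is hypothetical;
the intrinsic is the standard one for VZEROALL):

    #include <immintrin.h>

    // Clearing the ymm registers on exit from AVX2 code avoids
    // AVX/SSE transition stalls on affected CPUs, and as a bonus no
    // key-derived data is left behind in vector registers.
    void avx2_routine_epilogue() // hypothetical name
       {
       _mm256_zeroall();
       }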
This is not exhaustive. See GH #1733
Uses the same transposition trick used for SSE2 in #1728.
On my Skylake desktop this is about 5-10% faster, depending on buffer size.
This allows supporting SSE2, NEON and AltiVec in a single codebase,
so drop the NEON and SSE2 code.
The new implementation avoids having to do shuffles with every round
and so is about 10% faster on Skylake.
Also, fix bugs in both the baseline and AVX2 implementations when the
low counter overflowed. The SSE2 and NEON code was also buggy here.
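
For reference, the carry handling the overflow fix is about this
(scalar sketch; ChaCha keeps its 64-bit block counter in state
words 12 and 13):

    #include <cstdint>

    // When the low 32-bit counter word wraps, the carry must
    // propagate into the high word; the buggy paths dropped it.
    void increment_block_counter(uint32_t state[16])
       {
       state[12] += 1;
       if(state[12] == 0)
          state[13] += 1;
       }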
Originally written by Jeffrey Walton for Crypto++, which was in turn
based on my SSE2 ChaCha.
It is confusing because, while the stream cipher state is the input
to the permutation, the stream cipher itself has an unrelated input
(the text).
We don't need to read each block since we know what is there.
Improves CTR performance with AES-NI by 5-6%, and also helps GCM.
GH #969
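
The idea, roughly (a sketch, not Botan's exact code; assumes a
12-byte IV prefix with a 32-bit big-endian counter):

    #include <cstdint>
    #include <cstring>

    // The low bytes of each counter block are known in advance, so
    // write them directly instead of loading each block back from
    // memory just to increment it.
    void write_counter_blocks(uint8_t* pad, const uint8_t iv[12],
                              uint32_t ctr, size_t blocks)
       {
       for(size_t i = 0; i != blocks; ++i)
          {
          std::memcpy(pad + 16*i, iv, 12);
          const uint32_t c = ctr + static_cast<uint32_t>(i);
          pad[16*i + 12] = static_cast<uint8_t>(c >> 24);
          pad[16*i + 13] = static_cast<uint8_t>(c >> 16);
          pad[16*i + 14] = static_cast<uint8_t>(c >> 8);
          pad[16*i + 15] = static_cast<uint8_t>(c);
          }
       }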
When used with AES-128 on Skylake (AES-NI), improves GCM performance
by 10% on small messages and 5% on 1K messages.
Avoids the XOR operation. Only implemented for ChaCha20 currently;
everything else defaults to memset-to-zero + xor-cipher.
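
The fallback path amounts to this (sketch):

    #include <botan/stream_cipher.h>
    #include <cstring>

    // Produce raw keystream by ciphering an all-zero buffer; since
    // XOR with zero is the identity, the output is pure keystream.
    void keystream_fallback(Botan::StreamCipher& cipher,
                            uint8_t* out, size_t len)
       {
       std::memset(out, 0, len);
       cipher.cipher1(out, len);
       }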
If you called set_key, then set_iv, then set_iv again without having
previously reset the key, you would end up with a garbled state buffer
that depended on the value of the first IV.
This only affected 192-bit Salsa nonces, not other sizes.
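
The triggering sequence, for reference (sketch):

    #include <botan/salsa20.h>

    // Setting a new nonce without rekeying: with 24-byte (XSalsa20)
    // nonces the second set_iv previously left a garbled state
    // dependent on the first nonce.
    void two_nonces_same_key(Botan::Salsa20& salsa, const uint8_t key[32],
                             const uint8_t iv1[24], const uint8_t iv2[24])
       {
       salsa.set_key(key, 32);
       salsa.set_iv(iv1, 24); // fine
       salsa.set_iv(iv2, 24); // previously produced a garbled state
       }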
Add a test to detect that.
Also add a test that stream ciphers throw if they are asked to use a
nonce of a size they don't support.
Remove the "In = 00...00" blocks, since that's implicit in the stream
cipher tests.
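
In essence the new nonce-size test checks the following (sketch; the
exception type name is as I understand Botan's exceptn.h):

    #include <botan/exceptn.h>
    #include <botan/stream_cipher.h>

    // For a nonce length the cipher reports as unsupported, set_iv
    // must throw rather than silently accept it.
    bool rejects_bad_nonce(Botan::StreamCipher& cipher,
                           const uint8_t* nonce, size_t len)
       {
       if(cipher.valid_iv_length(len))
          return true; // nothing to check for supported sizes
       try
          {
          cipher.set_iv(nonce, len);
          return false; // accepted an unsupported nonce size
          }
       catch(const Botan::Invalid_IV_Length&)
          {
          return true; // expected rejection
          }
       }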
Add a test that StreamCipher::seek throws if not keyed.
Needed for the create calls
Previously, calling update or encrypt without calling set_key first
would result in invalid outputs or else a crash.
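
The guard this adds is essentially (sketch; the flag is hypothetical):

    #include <botan/exceptn.h>

    // Fail loudly instead of producing invalid output when no key
    // has been set yet.
    void assert_key_was_set(bool key_set)
       {
       if(!key_set)
          throw Botan::Invalid_State("Cannot process data without a key");
       }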
Prohibit very small counter widths (under 4 bytes), since they lead
to trivial keystream reuse.
Add tests.
Fix clone, which always returned an object with a block-wide counter.
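
Why small widths are dangerous, worked out: an n-byte counter wraps
after 256^n blocks, at which point the keystream repeats exactly
(16-byte blocks assumed, as with AES):

    #include <cmath>
    #include <cstdio>

    // Print how quickly an n-byte CTR counter wraps with a 16-byte
    // block cipher.
    int main()
       {
       for(unsigned n = 1; n <= 4; ++n)
          {
          const double blocks = std::pow(256.0, n);
          std::printf("%u-byte counter: repeats after %.0f blocks (%.0f KiB)\n",
                      n, blocks, blocks * 16.0 / 1024.0);
          }
       return 0;
       }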
In CTR, add special cases for counter widths of particular interest.
In GHASH, use a 4x reduction technique suggested by Intel.
Split GHASH out into its own source file and header.
With these changes GCM is over twice as fast on Skylake and about
50% faster on Westmere.
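
The 4x technique works because the GHASH recurrence over four blocks
C1..C4 unrolls into a sum against precomputed powers of H:

    Y' = ((((Y xor C1)*H xor C2)*H xor C3)*H xor C4)*H
       = (Y xor C1)*H^4 xor C2*H^3 xor C3*H^2 xor C4*H

so four carryless multiplications can be accumulated and reduced
modulo the GHASH polynomial once, rather than reducing after every
block.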
The problem with asm rol/ror is that the compiler can't schedule it
effectively. But we only need asm when the rotation amount is variable,
so distinguish the two cases. If it is a compile-time constant,
static_assert that the rotation is in the correct range and use the
straightforward expression, knowing the compiler will probably do the
right thing. Otherwise use a tricky expression that both GCC and Clang
happen to recognize. Avoid the reduction case; instead require that the
rotation be in range (this reverts 2b37c13dcf).
Remove the asm rotations (making this branch ill-named), because now
both Clang and GCC will emit a rol without any extra help.
Remove the reduction/mask by the word size for the variable case. The
compiler can't optimize it out well, but it's easy to ensure the
rotation is valid in the callers, especially now that the
variable-rotation call sites are easy to grep for.
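
A sketch of the two cases (close to, but not exactly, the resulting
code):

    #include <cstddef>

    // Compile-time rotation: the amount is validated by static_assert
    // and the plain expression is enough for the compiler to emit a
    // single rol/ror instruction.
    template<size_t ROT, typename T>
    inline T rotl(T input)
       {
       static_assert(ROT > 0 && ROT < 8*sizeof(T), "Invalid rotation constant");
       return static_cast<T>((input << ROT) | (input >> (8*sizeof(T) - ROT)));
       }

    // Variable rotation: callers must guarantee 0 < rot < 8*sizeof(T).
    // Both GCC and Clang recognize this pattern and emit a rol.
    template<typename T>
    inline T rotl_var(T input, size_t rot)
       {
       return static_cast<T>((input << rot) | (input >> (8*sizeof(T) - rot)));
       }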
Correct errors in the AEAD tests that assumed process/update always
return something; that isn't true for SIV.
Minor optimizations in CMAC and CTR to cache the block size instead
of making a zillion virtual calls for it.
Generalize SIV slightly so it could support a non-128-bit block
cipher, but don't pull the trigger on it, since I can't find any
implementations to crosscheck with.
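
The test assumption being corrected, in code (sketch):

    #include <botan/aead.h>

    // process() may legitimately write nothing: SIV buffers the
    // entire message and only emits output in finish(), so tests
    // must not assume every process/update call produces data.
    size_t feed_aead(Botan::AEAD_Mode& aead, uint8_t* buf, size_t len)
       {
       return aead.process(buf, len); // may be 0 for SIV
       }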