Commit messages
- Causes the connection to break for some servers. Fixes GH #1276.
  Also avoid setting the same extension twice in the initial connection case.
  The extensions code dedups it, so this was not a problem, but it was confusing.
- While older versions of GCC did very badly with __builtin_bswap on ARM, I checked GCC 4.8 and it behaves correctly, emitting either rev or the same optimal sequence as was used in the inline asm (depending on whether ARMv7 is enabled).
  Enable MSVC byteswap intrinsics, which (hopefully) work on all platforms.
  Drop the x86-32 specific asm for byteswap.
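As a rough sketch of the approach above (the helper name is made up, not the library's actual function), dispatching to the compiler builtins lets the compiler emit a single bswap/rev instruction:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: modern GCC/Clang turn the builtin into bswap on x86
// and rev on ARM; MSVC has an equivalent intrinsic.
inline uint32_t bswap32(uint32_t x)
   {
#if defined(_MSC_VER)
   return _byteswap_ulong(x);
#elif defined(__GNUC__) || defined(__clang__)
   return __builtin_bswap32(x);
#else
   // Portable fallback
   return (x << 24) | ((x << 8) & 0x00FF0000) |
          ((x >> 8) & 0x0000FF00) | (x >> 24);
#endif
   }
```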
- The server may not support the supported groups extension and may choose an arbitrary group. RFC 7919 permits clients to continue if the group is acceptable under local policy, which we now do.
- The buffer is not aligned :/
- Prohibit very small counter widths (under 4 bytes), since they lead to trivial keystream reuse.
  Add tests.
  Fix clone(), which always returned an object with a block-wide counter.
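A minimal sketch of that kind of check, using a hypothetical name and a standard exception rather than the library's own types; a counter narrower than 4 bytes wraps within 2^24 blocks, at which point the keystream repeats:

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical validation: reject counter widths that wrap almost
// immediately or that exceed the cipher's block size.
void check_ctr_width(size_t ctr_width, size_t block_size)
   {
   if(ctr_width < 4 || ctr_width > block_size)
      throw std::invalid_argument("Invalid CTR counter width");
   }
```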
- CBC mode already has this same size check.
  [ci skip]
- About 30% faster than scalar on Skylake.
- This reduces code and also lets TLS make use of parallel decryption, which it was not doing before.
- Cannot figure out how to get MSVC to shut up.
- MSVC produces a deranged warning that the compiler-generated destructor is deprecated; try to shut it up.
- In CTR, add a special case for counter widths of particular interest.
  In GHASH, use a 4x reduction technique suggested by Intel.
  Split out GHASH into its own source file and header.
  With these changes GCM is over twice as fast on Skylake and about 50% faster on Westmere.
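Purely as an illustration, assuming the special case is the common 32-bit big-endian GCM counter, the increment becomes one add plus byteswaps instead of a byte-by-byte loop (sketch assumes a little-endian host and GCC/Clang builtins; not the actual implementation):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical fast path: bump the 32-bit big-endian counter stored in the
// last four bytes of the block directly.
inline void incr_ctr32_be(uint8_t block[16])
   {
   uint32_t ctr;
   std::memcpy(&ctr, block + 12, 4);
   ctr = __builtin_bswap32(__builtin_bswap32(ctr) + 1);
   std::memcpy(block + 12, &ctr, 4);
   }
```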
- This ended up allocating 256 KiB!
- Avoid copying inputs needlessly; on Skylake this doubles performance (from 1 GB/s to 2 GB/s).
- This improves performance by ~0.5 cycles per byte. It also ensures that our cache-reading countermeasure works as expected.
- Should have significantly better cache characteristics, though it would be nice to verify this.
  It reduces performance somewhat, but less than I expected, at least on Skylake. I need to check this across more platforms to make sure it won't hurt too badly.
- This leads to a rather shocking decrease in binary sizes, especially in the static library (~1.5 MB reduction). Saves 60 KB in the shared lib.
  Since throwing or catching an exception is relatively expensive, not inlining these is not a problem in that sense. It simply had not occurred to me that it would take up so much extra space in the binary.
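A sketch of one common way to get this effect, under the assumption that the change amounts to moving throw sites out of inlined code; the helper below is hypothetical, not the library's actual code:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical out-of-line helper, defined once in a .cpp file: each call
// site emits a plain function call instead of the full exception
// construction and throw sequence.
[[noreturn]] void throw_invalid_argument(const std::string& msg)
   {
   throw std::invalid_argument(msg);
   }

void example(size_t len)
   {
   if(len == 0)
      throw_invalid_argument("empty input"); // small call site
   }
```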
- Using a larger table helps quite a bit. Using 4 tables (a la AES T-tables) didn't seem to help much at all; it's only slightly faster than a single table with rotations.
  Continue to use the 8-bit table in the first and last rounds as a countermeasure against cache attacks.
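The single-table-plus-rotations idea works because each of the four T-tables is just a byte rotation of the first; a generic sketch with placeholder names, not the actual cipher code:

```cpp
#include <cstdint>

inline uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

// One combined lookup: a single 256-entry table of 32-bit words plus
// rotations stands in for four separate T-tables (T1[x] == rotr(T0[x], 8),
// and so on). T0 is a placeholder for the cipher's combined S-box/diffusion
// table.
inline uint32_t table_lookup(const uint32_t T0[256],
                             uint8_t a, uint8_t b, uint8_t c, uint8_t d)
   {
   return T0[a] ^ rotr(T0[b], 8) ^ rotr(T0[c], 16) ^ rotr(T0[d], 24);
   }
```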
- Fixes #1235
  [ci skip]
- Went from 27 to 20 cycles per byte on Skylake (with clmul disabled).
- Allowing multiple blocks for clmul gives a slight speedup there, though still far behind optimum.
  Precompute a table of multiples of H; 3-4x faster on systems without clmul (and still no secret indexes).
  Refactor GMAC to not derive from GHASH.
- Gentoo-Bug: https://bugs.gentoo.org/633468
  Signed-off-by: Alon Bar-Lev <[email protected]>
- With fast AES-NI, this gets down to about 2 cycles per byte, which is pretty good compared to the ~5.5 cpb of 2.3, but still a long way off the best stitched implementations, which run at ~0.6 cpb.
- Avoids the cast alignment problems of yesteryear.
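Assuming this refers to the usual memcpy idiom for unaligned loads rather than casting a byte pointer, a minimal sketch (the helper name is illustrative, not the library's actual loader):

```cpp
#include <cstdint>
#include <cstring>

// memcpy sidesteps the alignment and strict-aliasing problems of
// reinterpret_cast<const uint32_t*>(in); compilers emit a single load.
inline uint32_t load_u32(const uint8_t* in)
   {
   uint32_t x;
   std::memcpy(&x, in, sizeof(x));
   return x;
   }
```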
- [ci skip]