Commit message [Author, Age, Files changed, Lines -/+]
* Add supported groups TLS extension (RFC 7919) [René Korthaus, 2017-10-17, 16 files, -35/+372]
|
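A minimal sketch of what the supported_groups extension looks like on the wire, assuming the RFC 7919 layout: extension type 10 (renamed from elliptic_curves), whose body is a length-prefixed list of 16-bit named-group code points such as ffdhe2048 (0x0100) and x25519 (0x001D). `encode_supported_groups` is a hypothetical helper, not Botan's TLS API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Named-group code points from the TLS Supported Groups registry.
constexpr uint16_t GROUP_X25519    = 0x001D;
constexpr uint16_t GROUP_FFDHE2048 = 0x0100;  // RFC 7919
constexpr uint16_t GROUP_FFDHE3072 = 0x0101;  // RFC 7919

constexpr uint16_t EXT_SUPPORTED_GROUPS = 0x000A;

// Append a 16-bit value in network (big-endian) byte order.
static void put_u16(std::vector<uint8_t>& out, uint16_t v) {
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v & 0xFF));
}

// Hypothetical helper: serializes the whole extension (type, length, body).
std::vector<uint8_t> encode_supported_groups(const std::vector<uint16_t>& groups) {
    const uint16_t list_len = static_cast<uint16_t>(2 * groups.size());
    std::vector<uint8_t> out;
    put_u16(out, EXT_SUPPORTED_GROUPS);                    // extension_type
    put_u16(out, static_cast<uint16_t>(list_len + 2));     // extension_data length
    put_u16(out, list_len);                                // NamedGroupList length
    for (uint16_t g : groups)
        put_u16(out, g);
    return out;
}
```

For example, offering x25519 and ffdhe2048 yields the ten bytes `00 0a 00 06 00 04 00 1d 01 00`.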
* Simplify speed cmdlet, make summary optional, add JSON output [Jack Lloyd, 2017-10-16, 1 file, -501/+455]
|
* Correct usage of std::aligned_storage [Jack Lloyd, 2017-10-15, 1 file, -6/+6]
|   This ended up allocating 256 KiB!
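The commit does not show the faulty declaration, so the following is only an assumption about how such an over-allocation can arise: `std::aligned_storage<Len, Align>::type` is a single object of at least `Len` bytes, so declaring an array of 256 of them for a 1 KiB table yields 256 KiB. `GoodStorage`, `BadStorage` and `g_table` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

constexpr std::size_t TABLE_WORDS = 256;
constexpr std::size_t TABLE_BYTES = TABLE_WORDS * sizeof(uint32_t);   // 1 KiB

// Intended: ONE 64-byte-aligned object big enough for 256 uint32_t words.
using GoodStorage = std::aligned_storage<TABLE_BYTES, 64>::type;

// Hypothetical mistake: an ARRAY of 256 such objects, i.e. 256 x 1 KiB = 256 KiB.
using BadStorage = std::aligned_storage<TABLE_BYTES, 64>::type[TABLE_WORDS];

static_assert(sizeof(GoodStorage) >= TABLE_BYTES, "large enough for the table");
static_assert(alignof(GoodStorage) % 64 == 0, "cache-line aligned");
static_assert(sizeof(BadStorage) == TABLE_WORDS * sizeof(GoodStorage), "256 copies");

// Simpler spelling of the intended layout: alignas on the array itself.
alignas(64) uint32_t g_table[TABLE_WORDS];
static_assert(sizeof(g_table) == TABLE_BYTES, "exactly 1 KiB");
```

Since C++23 deprecates `std::aligned_storage`, the `alignas` array is the more direct way to express the same intent.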
* Additional final annotations [Jack Lloyd, 2017-10-15, 27 files, -44/+44]
|
* GMAC optimization [Jack Lloyd, 2017-10-15, 3 files, -22/+34]
|   Avoid needlessly copying inputs. On Skylake this doubles performance (from 1 GB/s to 2 GB/s).
* Merge GH #1257 Use std::aligned_storage for AES T-table [Jack Lloyd, 2017-10-15, 1 file, -32/+56]
|\
| * Use overaligned storage for AES T-Table [Jack Lloyd, 2017-10-14, 1 file, -32/+56]
| |   This improves performance by about 0.5 cycles/byte. It also ensures that our cache-reading countermeasure works as expected.
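A sketch of why over-alignment matters for a cache-reading countermeasure, assuming 64-byte cache lines: a 64-byte-aligned 1 KiB table occupies exactly 16 lines, so reading one word every 16 words touches each line once before any secret-indexed lookup. A misaligned table would straddle 17 lines, and the same stride could miss the last one. `prefetch_table` and `lines_touched` are hypothetical helpers, not Botan's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t CACHE_LINE = 64;   // assumed cache-line size in bytes
constexpr std::size_t WORDS = 256;       // 1 KiB table of uint32_t

// Over-aligned: the table starts on a line boundary and spans exactly 16 lines.
alignas(CACHE_LINE) uint32_t T_TABLE[WORDS];

// Number of cache lines touched when striding one line (16 words) at a time.
constexpr std::size_t lines_touched(std::size_t words) {
    return (words + CACHE_LINE / sizeof(uint32_t) - 1) / (CACHE_LINE / sizeof(uint32_t));
}

// Hypothetical countermeasure: read one word from each cache line so the whole
// table is (likely) resident before secret-dependent lookups. The OR of the
// words is returned so a caller can consume it and keep the loads alive.
uint32_t prefetch_table(const uint32_t* table, std::size_t words) {
    const std::size_t step = CACHE_LINE / sizeof(uint32_t);
    uint32_t acc = 0;
    for (std::size_t i = 0; i < words; i += step)
        acc |= table[i];
    return acc;
}
```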
* | Merge GH #1255 Use a single T-table in AES [Jack Lloyd, 2017-10-15, 1 file, -127/+78]
|\|
| * Reduce AES to using a single T-table [Jack Lloyd, 2017-10-13, 1 file, -127/+78]
| |   Should have significantly better cache characteristics, though it would be nice to verify this. It reduces performance somewhat, but less than I expected, at least on Skylake. I need to check this across more platforms to make sure it won't hurt too badly.
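In the classic four-table AES, TE1, TE2 and TE3 are TE0 rotated right by 8, 16 and 24 bits, so a single 1 KiB table plus rotations gives the same lookups with a quarter of the cache footprint. A self-contained sketch that derives the S-box and TE0 from first principles and checks that relationship; the helper names are illustrative, and this is not Botan's aes.cpp.

```cpp
#include <cassert>
#include <array>
#include <cstdint>

// Multiply by x (i.e. 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
static uint8_t xtime(uint8_t a) {
    return static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t x, int n) {   // n in 1..7
    return static_cast<uint8_t>((x << n) | (x >> (8 - n)));
}

static uint32_t rotr32(uint32_t x, unsigned n) {   // n in 1..31
    return (x >> n) | (x << (32 - n));
}

// AES S-box: multiplicative inverse (x^254, with 0 -> 0), then the affine map.
std::array<uint8_t, 256> make_sbox() {
    std::array<uint8_t, 256> s{};
    for (int x = 0; x < 256; ++x) {
        uint8_t inv = 1, base = static_cast<uint8_t>(x);
        for (int e = 254; e > 0; e >>= 1) {   // square-and-multiply
            if (e & 1) inv = gf_mul(inv, base);
            base = gf_mul(base, base);
        }
        if (x == 0) inv = 0;
        s[x] = static_cast<uint8_t>(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                                    rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
    return s;
}

// TE0[x] packs the MixColumns column (2s, s, s, 3s), most significant byte first.
std::array<uint32_t, 256> make_te0(const std::array<uint8_t, 256>& sbox) {
    std::array<uint32_t, 256> te{};
    for (int x = 0; x < 256; ++x) {
        const uint8_t s = sbox[x];
        const uint8_t s2 = xtime(s);
        const uint8_t s3 = static_cast<uint8_t>(s2 ^ s);
        te[x] = (uint32_t(s2) << 24) | (uint32_t(s) << 16) | (uint32_t(s) << 8) | s3;
    }
    return te;
}

// The other classic tables are rotations: TEk[x] = rotr(TE0[x], 8k), k = 1..3.
uint32_t te_k(const std::array<uint32_t, 256>& te0, unsigned k, uint8_t x) {
    return k == 0 ? te0[x] : rotr32(te0[x], 8 * k);
}
```

The trade-off the commit mentions is visible here: each lookup after the first costs an extra rotation in exchange for touching only one table.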
* | De-inline bodies of exception classes [Jack Lloyd, 2017-10-15, 3 files, -67/+133]
|/  This leads to a rather shocking decrease in binary size, especially for the static library (~1.5 MB reduction). It saves 60 KB in the shared library. Since throwing or catching an exception is relatively expensive anyway, these bodies not being inlined is not a problem in that sense. It had simply not occurred to me that they would take up so much extra space in the binary.
* Optimizations for SM4 [Jack Lloyd, 2017-10-13, 2 files, -36/+95]
|   Using a larger table helps quite a bit. Using 4 tables (à la AES T-tables) didn't seem to help much at all; it's only slightly faster than a single table with rotations. Continue to use the 8-bit table in the first and last rounds as a countermeasure against cache attacks.
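The larger table works because SM4's round transform is T(x) = L(tau(x)), where tau applies the 8-bit S-box to each byte and L is an XOR of rotations. L is linear over XOR and commutes with rotation, so it can be folded into a 256-entry table of 32-bit words, leaving one lookup plus one rotation per byte. In this sketch the real SM4 S-box is replaced by a hypothetical toy permutation (`toy_sbox`), which is enough to check the identity; the other names are also illustrative.

```cpp
#include <cassert>
#include <array>
#include <cstdint>

static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); } // n in 1..31
static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); } // n in 1..31

// SM4 linear transform L from the round function.
static uint32_t sm4_L(uint32_t b) {
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24);
}

// Hypothetical stand-in for the SM4 S-box (a permutation, since 7 is odd).
static uint8_t toy_sbox(uint8_t x) { return static_cast<uint8_t>(x * 7 + 3); }

// Byte-table path: substitute each byte, then apply L to the whole word.
uint32_t T_direct(uint32_t x) {
    uint32_t t = 0;
    for (int i = 0; i < 4; ++i) {
        const uint8_t b = static_cast<uint8_t>(x >> (24 - 8 * i));
        t |= uint32_t(toy_sbox(b)) << (24 - 8 * i);
    }
    return sm4_L(t);
}

// Larger 8 -> 32 bit table: TB[v] = L(S[v] << 24).
std::array<uint32_t, 256> make_tb() {
    std::array<uint32_t, 256> tb{};
    for (int v = 0; v < 256; ++v)
        tb[v] = sm4_L(uint32_t(toy_sbox(static_cast<uint8_t>(v))) << 24);
    return tb;
}

// Word-table path: S[b_i] << (24 - 8i) equals rotr(S[b_i] << 24, 8i), and L
// commutes with rotation, so each byte contributes rotr(TB[b_i], 8i).
uint32_t T_table(const std::array<uint32_t, 256>& tb, uint32_t x) {
    return tb[x >> 24] ^ rotr32(tb[(x >> 16) & 0xFF], 8) ^
           rotr32(tb[(x >> 8) & 0xFF], 16) ^ rotr32(tb[x & 0xFF], 24);
}
```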
* Update list of block ciphers in speed cli [Jack Lloyd, 2017-10-13, 1 file, -0/+11]
|   [ci skip]
* Accept SHA-1, SHA1, or SHA-160 equally [Jack Lloyd, 2017-10-13, 3 files, -3/+3]
|   Fixes #1235 [ci skip]
* Further GCM optimizations [Jack Lloyd, 2017-10-13, 2 files, -18/+28]
|   Went from 27 to 20 cycles per byte on Skylake (with clmul disabled).
* Merge GH #1253 GCM optimizations [Jack Lloyd, 2017-10-13, 10 files, -174/+245]
|\
| * Optimize GCM [Jack Lloyd, 2017-10-13, 10 files, -174/+245]
| |   Allow processing multiple blocks at once in the clmul path: a slight speedup there, though still far behind optimal. Precompute a table of multiples of H, which is 3-4x faster on systems without clmul (and still uses no secret indexes). Refactor GMAC to not derive from GHASH.
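Precomputing multiples of H without secret indexes can be illustrated in GCM's reflected bit order: store H·x^i for i = 0..127, then XOR in every entry under a mask derived from bit i of X, so the memory access pattern does not depend on the data. This sketch follows NIST SP 800-38D Algorithm 1; it is not Botan's ghash.cpp, and all names are hypothetical.

```cpp
#include <cassert>
#include <array>
#include <cstdint>

// 128-bit block in GCM bit order: bit 0 is the MSB of hi (first byte on the wire).
struct Block {
    uint64_t hi, lo;
    bool operator==(const Block& o) const { return hi == o.hi && lo == o.lo; }
};

// Multiply by x: a one-bit right shift in GCM's bit order, reduced by
// R = 0xE1 || 0^120 when the bit shifted out was set (done with a mask).
static Block mul_x(Block v) {
    const uint64_t carry = v.lo & 1;
    Block r{v.hi >> 1, (v.lo >> 1) | (v.hi << 63)};
    r.hi ^= (0 - carry) & 0xE100000000000000ULL;
    return r;
}

// Reference: SP 800-38D Algorithm 1, bit-serial with a data-dependent branch.
Block gf_mul_ref(Block X, Block Y) {
    Block Z{0, 0}, V = Y;
    for (int i = 0; i < 128; ++i) {
        const uint64_t word = (i < 64) ? X.hi : X.lo;
        if ((word >> (63 - (i % 64))) & 1) { Z.hi ^= V.hi; Z.lo ^= V.lo; }
        V = mul_x(V);
    }
    return Z;
}

// Precomputed table of multiples: table[i] = H * x^i.
std::array<Block, 128> make_table(Block H) {
    std::array<Block, 128> t{};
    Block V = H;
    for (int i = 0; i < 128; ++i) { t[i] = V; V = mul_x(V); }
    return t;
}

// Table-driven multiply: all 128 entries are read; bit i of X only forms a mask.
Block gf_mul_table(const std::array<Block, 128>& t, Block X) {
    Block Z{0, 0};
    for (int i = 0; i < 128; ++i) {
        const uint64_t word = (i < 64) ? X.hi : X.lo;
        const uint64_t mask = 0 - ((word >> (63 - (i % 64))) & 1);
        Z.hi ^= t[i].hi & mask;
        Z.lo ^= t[i].lo & mask;
    }
    return Z;
}
```

GHASH then folds each input block into the running value and multiplies by H with `gf_mul_table`.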
* | Merge GH #1254 Add missing include [Jack Lloyd, 2017-10-13, 1 file, -0/+1]
|\ \
| * | Add limits.h header for INT_MAX [Alon Bar-Lev, 2017-10-13, 1 file, -0/+1]
| |/  Gentoo-Bug: https://bugs.gentoo.org/633468
| |   Signed-off-by: Alon Bar-Lev <[email protected]>
* / Use memcpy trick in 3-arg xor_buf also [Jack Lloyd, 2017-10-13, 1 file, -23/+17]
|/
* Update news [Jack Lloyd, 2017-10-13, 1 file, -0/+4]
|   [ci skip]
* OCB optimizations [Jack Lloyd, 2017-10-13, 2 files, -58/+90]
|   With fast AES-NI, this gets down to about 2 cycles per byte, which is pretty good compared to the ~5.5 cpb of 2.3, though still a long way off the best stitched implementations, which run at ~0.6 cpb.
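One thing that makes wide OCB processing possible is that the per-block offsets can be computed for a whole batch before any cipher call: Offset_i = Offset_{i-1} xor L_ntz(i), where each L_k is the GF(2^128) doubling of L_{k-1} (RFC 7253). This sketch uses hypothetical helpers and takes the starting offset and the L_k values as inputs instead of deriving them from a key and nonce; it is not necessarily how Botan's OCB code is structured.

```cpp
#include <cassert>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<uint8_t, 16>;

// OCB doubling: shift the big-endian block left one bit; if the top bit fell
// off, XOR 0x87 into the last byte.
Block dbl(const Block& in) {
    Block out{};
    const uint8_t carry = in[0] >> 7;
    for (std::size_t i = 0; i < 15; ++i)
        out[i] = static_cast<uint8_t>((in[i] << 1) | (in[i + 1] >> 7));
    out[15] = static_cast<uint8_t>((in[15] << 1) ^ (carry * 0x87));
    return out;
}

// Number of trailing zero bits of i (i must be nonzero).
unsigned ntz(uint64_t i) {
    assert(i != 0);
    unsigned n = 0;
    while ((i & 1) == 0) { i >>= 1; ++n; }
    return n;
}

// Offsets for 1-based blocks first .. first+count-1, given the offset of block
// first-1 and L[k] = L_k. All results are ready before any block is encrypted.
std::vector<Block> batch_offsets(Block offset, uint64_t first, std::size_t count,
                                 const std::vector<Block>& L) {
    std::vector<Block> out;
    out.reserve(count);
    for (std::size_t j = 0; j < count; ++j) {
        const Block& l = L.at(ntz(first + j));
        for (std::size_t b = 0; b < 16; ++b) offset[b] ^= l[b];
        out.push_back(offset);
    }
    return out;
}
```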
* Somewhat faster xor_buf [Jack Lloyd, 2017-10-12, 1 file, -18/+15]
|   Avoids the cast alignment problems of yesteryear
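The "cast alignment problems" are presumably the classic hazard of reading a `uint8_t` buffer through a `uint64_t*`, which is undefined behavior when misaligned and also breaks strict aliasing. A common portable alternative, and one plausible reading of the "memcpy trick" in the 3-arg xor_buf commit above, is to move words with `std::memcpy`, which compilers typically lower to ordinary loads and stores. A minimal sketch, not Botan's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// out[i] ^= in[i] for i < length, eight bytes at a time with no pointer casts.
void xor_buf(uint8_t* out, const uint8_t* in, std::size_t length) {
    while (length >= 8) {
        uint64_t x, y;
        std::memcpy(&x, out, 8);   // well defined for any alignment
        std::memcpy(&y, in, 8);
        x ^= y;
        std::memcpy(out, &x, 8);
        out += 8; in += 8; length -= 8;
    }
    for (std::size_t i = 0; i != length; ++i)   // remaining tail bytes
        out[i] ^= in[i];
}

// 3-argument form: out[i] = in[i] ^ in2[i].
void xor_buf(uint8_t* out, const uint8_t* in, const uint8_t* in2, std::size_t length) {
    while (length >= 8) {
        uint64_t x, y;
        std::memcpy(&x, in, 8);
        std::memcpy(&y, in2, 8);
        x ^= y;
        std::memcpy(out, &x, 8);
        out += 8; in += 8; in2 += 8; length -= 8;
    }
    for (std::size_t i = 0; i != length; ++i)
        out[i] = static_cast<uint8_t>(in[i] ^ in2[i]);
}
```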
* Remove needless mutable [Jack Lloyd, 2017-10-12, 1 file, -2/+2]
|   [ci skip]
* Swapped encrypt and decrypt in BlockCipher _xex functions [Jack Lloyd, 2017-10-12, 1 file, -2/+2]
|   Missed by everything but the OCB wide tests, because most ciphers have a fixed width and get the override.
* Add some additional CPU aliases for x86-64 [Jack Lloyd, 2017-10-12, 1 file, -5/+8]
|
* Interleave SM3 message expansion [Jack Lloyd, 2017-10-12, 1 file, -141/+142]
|   Reduces stack usage and is a bit faster
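SM3 expands each 16-word block to W[0..67] with W[j] = P1(W[j-16] ^ W[j-9] ^ (W[j-3] <<< 15)) ^ (W[j-13] <<< 7) ^ W[j-6]. Producing each W[j] just before the round that needs it lets a 16-word ring buffer replace the 68-word array, which is one way to reduce stack usage. A sketch with hypothetical names that checks the rolling version against the full expansion; the W'[j] = W[j] ^ W[j+4] words used by the rounds are omitted.

```cpp
#include <cassert>
#include <array>
#include <cstdint>

static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); } // n in 1..31
static uint32_t P1(uint32_t x) { return x ^ rotl32(x, 15) ^ rotl32(x, 23); }

// Up-front approach: expand all 68 words into an array.
std::array<uint32_t, 68> expand_full(const std::array<uint32_t, 16>& M) {
    std::array<uint32_t, 68> W{};
    for (int j = 0; j < 16; ++j) W[j] = M[j];
    for (int j = 16; j < 68; ++j)
        W[j] = P1(W[j - 16] ^ W[j - 9] ^ rotl32(W[j - 3], 15)) ^ rotl32(W[j - 13], 7) ^ W[j - 6];
    return W;
}

// Interleaved approach: a 16-word ring buffer; next() yields W[16], W[17], ...
class RollingExpansion {
public:
    explicit RollingExpansion(const std::array<uint32_t, 16>& M) : w_(M) {}

    uint32_t next() {
        const uint32_t v = P1(at(-16) ^ at(-9) ^ rotl32(at(-3), 15)) ^ rotl32(at(-13), 7) ^ at(-6);
        w_[j_ % 16] = v;   // the slot that held W[j-16], which is no longer needed
        ++j_;
        return v;
    }

private:
    // W[j + k] for k in -16..-1, read from the ring buffer.
    uint32_t at(int k) const { return w_[static_cast<unsigned>(j_ + k) % 16]; }

    std::array<uint32_t, 16> w_;
    int j_ = 16;   // index of the next word to produce
};
```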
* Use SIMD for in Threefish [Jack Lloyd, 2017-10-12, 1 file, -2/+2]
|   GCC 7 can actually vectorize this for AVX2
* OCB optimizations [Jack Lloyd, 2017-10-12, 7 files, -124/+163]
|   From ~5 cpb to ~2.5 cpb on Skylake
* Merge GH #1247 Improve bit rotation functions [Jack Lloyd, 2017-10-12, 36 files, -661/+739]
|\
| * Ugh, the GCC/Clang trick triggers C4146 under MSVC [Jack Lloyd, 2017-10-12, 1 file, -8/+25]
| |   And rotate.h is a visible header. Blerg. Inline asm it is.
| * Add compile-time rotation functions [Jack Lloyd, 2017-10-12, 36 files, -677/+716]
| |   The problem with asm rol/ror is that the compiler can't schedule it effectively. But we only need asm when the rotation is variable, so distinguish the two cases. If the rotation is a compile-time constant, static_assert that it is in the correct range and use the straightforward expression, knowing the compiler will probably do the right thing. Otherwise use a tricky expression that both GCC and Clang happen to recognize.
| |
| |   Avoid the reduction case; instead require that the rotation be in range (this reverts 2b37c13dcf). Remove the asm rotations (making this branch ill-named), because now both Clang and GCC will create a roll without any extra help.
| |
| |   Remove the reduction/mask by the word size for the variable case. The compiler can't optimize it out well, but it's easy to ensure the rotation is valid in the callers, especially now that the variable-input cases are easy to grep for.
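The split described above can be sketched as two templates: a compile-time rotation guarded by `static_assert`, and a variable rotation written in an idiom GCC and Clang typically turn into a single rotate instruction. The names `rotl` and `rotl_var` are assumptions for illustration, not a copy of rotate.h. The unary minus on an unsigned count in `rotl_var` is the kind of expression MSVC flags as C4146, which matches the "Ugh, the GCC/Clang trick triggers C4146 under MSVC" commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Compile-time rotation: the range is checked at compile time, and the plain
// shift/or expression is one compilers reliably turn into a rotate.
template <std::size_t ROT, typename T>
constexpr T rotl(T input) {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    static_assert(ROT > 0 && ROT < 8 * sizeof(T), "rotation out of range");
    return static_cast<T>((input << ROT) | (input >> (8 * sizeof(T) - ROT)));
}

// Variable rotation: callers must pass rot < bit width; nothing is reduced here.
// Masking the right-shift count as (-rot) & (bits - 1) keeps rot == 0 defined,
// since a shift by the full width would be undefined behavior.
template <typename T>
inline T rotl_var(T input, std::size_t rot) {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    assert(rot < 8 * sizeof(T));
    return static_cast<T>((input << rot) | (input >> ((-rot) & (8 * sizeof(T) - 1))));
}
```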
| * Use rol/ror x86 instructions on GCC/Clang [Jack Lloyd, 2017-10-11, 1 file, -2/+24]
| |   Neither is very good at recognizing rotate sequences. For cases where the rotation value is a constant they do fine, but for variable rotations they do horribly. Using inline asm here improved performance of both CAST-128 and CAST-256 by ~20% on my system with both GCC and Clang.
* | Merge GH #1251 Fix CMake [Jack Lloyd, 2017-10-12, 1 file, -3/+5]
|\ \
| * | Prevent a lint complaint. [Frank Schoenmann, 2017-10-12, 1 file, -1/+2]
| | |
| * | Repair generation of CMakeLists.txt after some files have been moved. [Frank Schoenmann, 2017-10-12, 1 file, -3/+4]
| |/
* | Avoid std::count to skip a signed overflow warning [Jack Lloyd, 2017-10-12, 2 files, -3/+13]
| |   Couldn't figure out a way to silence this otherwise. Deprecate replace_char, erase_chars, replace_chars.
* | Merge GH #1245 Restructure Barrier/Semaphore to avoid signed overflow warnings [Jack Lloyd, 2017-10-12, 2 files, -11/+9]
|\ \
| |/
|/|
| * #1220 - fixed fixes of integer overflow [Hubert Bugaj, 2017-10-10, 2 files, -7/+3]
| |
| * #1220 - fixed signed overflow warnings [Hubert Bugaj, 2017-10-09, 2 files, -10/+12]
| |
* | Merge GH #1248 Unroll SM3 compression loop [Jack Lloyd, 2017-10-11, 1 file, -56/+94]
|\ \
| * | Unroll SM3 compression function [Jack Lloyd, 2017-10-10, 1 file, -56/+94]
| | |
* | | Merge GH #1249 Add Eclipse config [Jack Lloyd, 2017-10-11, 1 file, -0/+167]
|\ \ \
| * | | Add Eclipse code formatting template [ci skip] [René Korthaus, 2017-10-11, 1 file, -0/+167]
| | | |
* | | | Avoid <thread> [Jack Lloyd, 2017-10-11, 1 file, -2/+6]
| | | |   Not needed here
* | | | Helpful comment [Jack Lloyd, 2017-10-11, 1 file, -1/+2]
| | | |
* | | | Update test for new error return [Jack Lloyd, 2017-10-11, 1 file, -1/+1]
| | | |
* | | | Remove SSE2 bswap_4 [Jack Lloyd, 2017-10-11, 1 file, -24/+0]
| | | |   It was disabled anyway (bad macro check), and with recent GCC it turned out to be slower than just using bswap.
* | | | Optimize CFB mode [Jack Lloyd, 2017-10-11, 2 files, -39/+97]
| | | |   Still slower, but notably faster, at least with AES-NI
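One likely reason CFB stays comparatively slow: encryption is inherently serial, because the cipher input for block i is ciphertext block i-1 (C_i = P_i xor E_K(C_{i-1})), whereas decryption already knows every C_{i-1}, so its cipher calls are independent and can be batched, for example to keep several AES-NI blocks in flight. A sketch of full-block CFB with a hypothetical `BlockFn` standing in for AES; it is not Botan's CFB code, and CFB in general also allows feedback sizes smaller than the block.

```cpp
#include <cassert>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Block = std::array<uint8_t, 16>;
using BlockFn = std::function<Block(const Block&)>;   // hypothetical stand-in for E_K

// Encryption is serial: block i cannot start until ciphertext block i-1 exists.
std::vector<Block> cfb_encrypt(const BlockFn& E, const Block& iv, const std::vector<Block>& pt) {
    std::vector<Block> ct(pt.size());
    Block prev = iv;
    for (std::size_t i = 0; i < pt.size(); ++i) {
        const Block ks = E(prev);
        for (std::size_t b = 0; b < 16; ++b)
            ct[i][b] = static_cast<uint8_t>(pt[i][b] ^ ks[b]);
        prev = ct[i];
    }
    return ct;
}

// Decryption: every input to E (IV, C_0, ..., C_{n-2}) is already known, so the
// keystream calls are independent and could be issued as one wide batch.
std::vector<Block> cfb_decrypt(const BlockFn& E, const Block& iv, const std::vector<Block>& ct) {
    std::vector<Block> ks(ct.size());
    for (std::size_t i = 0; i < ct.size(); ++i)
        ks[i] = E(i == 0 ? iv : ct[i - 1]);
    std::vector<Block> pt(ct.size());
    for (std::size_t i = 0; i < ct.size(); ++i)
        for (std::size_t b = 0; b < 16; ++b)
            pt[i][b] = static_cast<uint8_t>(ct[i][b] ^ ks[i][b]);
    return pt;
}
```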
* | | | Add missing header [Jack Lloyd, 2017-10-11, 1 file, -0/+1]
| | | |   Error under filesystem-free builds
* | | | Deprecate anon DH/ECDH TLS ciphersuites [Jack Lloyd, 2017-10-11, 1 file, -0/+2]
| | | |