botan.git

	Commit message	Author	Age	Files	Lines
*	Further optimizations, and split out GHASH reduction code	Jack Lloyd	2017-10-18	3	-87/+57
\|
*	GCM and CTR optimizations	Jack Lloyd	2017-10-18	11	-372/+624
\| \| \| \| \| \| \| \| \| \| \|	In CTR, add special cases for counter widths of special interest. In GHASH, use a 4x reduction technique suggested by Intel. Split out GHASH to its own source file and header. With these changes GCM is over twice as fast on Skylake and about 50% faster on Westmere.
*	Correct usage of std::aligned_storage	Jack Lloyd	2017-10-15	1	-6/+6
\| \| \| \|	This ended up allocating 256 KiB!
*	Additional final annotations	Jack Lloyd	2017-10-15	19	-27/+26
\|
*	GMAC optimization	Jack Lloyd	2017-10-15	2	-21/+32
\| \| \| \| \|	Avoid copying inputs needlessly. On Skylake this doubles performance (from 1 GB/s to 2 GB/s).
*	Merge GH #1257 Use std::aligned_storage for AES T-table	Jack Lloyd	2017-10-15	1	-32/+56
\|\
\| *	Use overaligned storage for AES T-Table	Jack Lloyd	2017-10-14	1	-32/+56
\| \| \| \| \| \| \| \| \| \|	This improves performance by ~0.5 cycles/byte. It also ensures that our cache reading countermeasure works as expected.
* \|	Merge GH #1255 Use a single T-table in AES	Jack Lloyd	2017-10-15	1	-127/+78
\|\\|
\| *	Reduce AES to using a single T-table	Jack Lloyd	2017-10-13	1	-127/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Should have significantly better cache characteristics, though it would be nice to verify this. It reduces performance somewhat but less than I expected, at least on Skylake. I need to check this across more platforms to make sure it won't hurt too badly.
* \|	De-inline bodies of exception classes	Jack Lloyd	2017-10-15	3	-67/+133
\|/ \| \| \| \| \| \| \| \|	This leads to a rather shocking decrease in binary sizes, especially the static library (~1.5 MB reduction). Saves 60 KB in the shared lib. Since throwing or catching an exception is relatively expensive, these not being inlined is not a problem in that sense. It had simply not occurred to me that it would take up so much extra space in the binary.
*	Optimizations for SM4	Jack Lloyd	2017-10-13	1	-35/+94
\| \| \| \| \| \| \| \| \|	Using a larger table helps quite a bit. Using 4 tables (à la AES T-tables) didn't seem to help much at all; it's only slightly faster than a single table with rotations. Continue to use the 8-bit table in the first and last rounds as a countermeasure against cache attacks.
*	Accept SHA-1, SHA1, or SHA-160 equally	Jack Lloyd	2017-10-13	3	-3/+3
\| \| \| \| \| \|	Fixes #1235 [ci skip]
*	Further GCM optimizations	Jack Lloyd	2017-10-13	1	-17/+27
\| \| \| \|	Went from 27 to 20 cycles per byte on Skylake (with clmul disabled)
*	Merge GH #1253 GCM optimizations	Jack Lloyd	2017-10-13	8	-174/+242
\|\
\| *	Optimize GCM	Jack Lloyd	2017-10-13	8	-174/+242
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow multiple blocks for clmul, giving a slight speedup there, though still far behind optimum. Precompute a table of multiples of H, which is 3-4x faster on systems without clmul (and still uses no secret indexes). Refactor GMAC to not derive from GHASH.
* \|	Merge GH #1254 Add missing include	Jack Lloyd	2017-10-13	1	-0/+1
\|\ \
\| * \|	Add limits.h header for INT_MAX	Alon Bar-Lev	2017-10-13	1	-0/+1
\| \|/ \| \| \| \| \| \| \| \|	Gentoo-Bug: https://bugs.gentoo.org/633468 Signed-off-by: Alon Bar-Lev <[email protected]>
* /	Use memcpy trick in 3-arg xor_buf also	Jack Lloyd	2017-10-13	1	-23/+17
\|/
*	OCB optimizations	Jack Lloyd	2017-10-13	2	-58/+90
\| \| \| \| \| \|	With fast AES-NI, gets down to about 2 cycles per byte, which is pretty good compared to the ~5.5 cpb of 2.3, but still a long way off the best stitched implementations, which run at ~0.6 cpb.
*	Somewhat faster xor_buf	Jack Lloyd	2017-10-12	1	-18/+15
\| \| \| \|	Avoids the cast alignment problems of yesteryear
*	Remove needless mutable	Jack Lloyd	2017-10-12	1	-2/+2
\| \| \| \|	[ci skip]
*	Swapped encrypt and decrypt in BlockCipher _xex functions	Jack Lloyd	2017-10-12	1	-2/+2
\| \| \| \| \|	Missed by everything but the OCB wide tests because most ciphers have fixed width and get the override.
*	Interleave SM3 message expansion	Jack Lloyd	2017-10-12	1	-141/+142
\| \| \| \|	Reduces stack usage and is a bit faster
*	Use SIMD in Threefish	Jack Lloyd	2017-10-12	1	-2/+2
\| \| \| \|	GCC 7 can actually vectorize this for AVX2
*	OCB optimizations	Jack Lloyd	2017-10-12	7	-124/+163
\| \| \| \|	From ~5 cpb to ~2.5 cpb on Skylake
*	Merge GH #1247 Improve bit rotation functions	Jack Lloyd	2017-10-12	35	-644/+724
\|\
\| *	Ugh, the GCC/Clang trick triggers C4146 under MSVC	Jack Lloyd	2017-10-12	1	-8/+25
\| \| \| \| \| \| \| \| \| \| \| \|	And rotate.h is a visible header. Blerg. Inline asm it is.
\| *	Add compile-time rotation functions	Jack Lloyd	2017-10-12	35	-660/+701
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem with asm rol/ror is the compiler can't schedule effectively. But we only need asm in the case when the rotation is variable, so distinguish the two cases. If a compile time constant, then static_assert that the rotation is in the correct range and do the straightforward expression knowing the compiler will probably do the right thing. Otherwise do a tricky expression that both GCC and Clang happen to recognize. Avoid the reduction case; instead require that the rotation be in range (this reverts 2b37c13dcf). Remove the asm rotations (making this branch ill-named), because now both Clang and GCC will create a roll without any extra help. Remove the reduction/mask by the word size for the variable case. The compiler can't optimize that out well, but it's easy to ensure it is valid in the callers, especially now that the variable input cases are easy to grep for.
\| *	Use rol/ror x86 instructions on GCC/Clang	Jack Lloyd	2017-10-11	1	-2/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Neither is very good at recognizing rotate sequences. For cases where the rotation value is a constant they do fine, but for variable rotations they do horribly. Using inline asm here improved performance of both CAST-128 and CAST-256 by ~20% on my system with both GCC and Clang.
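The split described in the "Add compile-time rotation functions" commit above (constant rotations checked with static_assert, variable rotations left unmasked with the range guaranteed by the caller) might look roughly like the following. This is only a sketch under stated assumptions, not the code from rotate.h: the 32-bit-only signatures, the name `rotl_var`, and the debug `assert` are illustrative choices, and the plain shift/or expression stands in for whatever expression the branch actually settled on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rotation amount known at compile time: reject out-of-range values with
// static_assert, then use the straightforward shift/or expression and rely
// on the compiler to pattern-match it into a rotate instruction.
template <std::size_t ROT>
constexpr std::uint32_t rotl(std::uint32_t x) {
    static_assert(ROT > 0 && ROT < 32, "rotation must be in 1..31");
    return (x << ROT) | (x >> (32 - ROT));
}

template <std::size_t ROT>
constexpr std::uint32_t rotr(std::uint32_t x) {
    static_assert(ROT > 0 && ROT < 32, "rotation must be in 1..31");
    return (x >> ROT) | (x << (32 - ROT));
}

// Rotation amount only known at run time (hypothetical name). No reduction
// or mask by the word size is applied; the caller must pass 0 < rot < 32,
// which the assert checks in debug builds only.
inline std::uint32_t rotl_var(std::uint32_t x, std::size_t rot) {
    assert(rot > 0 && rot < 32);
    return (x << rot) | (x >> (32 - rot));
}
```

With these definitions `rotl<8>(0x12345678)` evaluates to `0x34567812`, while `rotl<32>(x)` fails to compile. Keeping the variable case under its own name is what makes such call sites easy to grep for.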
* \|	Avoid std::count to skip a signed overflow warning	Jack Lloyd	2017-10-12	2	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Couldn't figure out a way to silence this otherwise. Deprecate replace_char, erase_chars, replace_chars
* \|	Merge GH #1245 Restructure Barrier/Semaphore to avoid signed overflow warnings	Jack Lloyd	2017-10-12	2	-11/+9
\|\ \ \| \|/ \|/\|
\| *	#1220 - fixed fixes of integer overflow	Hubert Bugaj	2017-10-10	2	-7/+3
\| \|
\| *	#1220 - fixed signed overflow warnings	Hubert Bugaj	2017-10-09	2	-10/+12
\| \|
* \|	Merge GH #1248 Unroll SM3 compression loop	Jack Lloyd	2017-10-11	1	-56/+94
\|\ \
\| * \|	Unroll SM3 compression function	Jack Lloyd	2017-10-10	1	-56/+94
\| \| \|
* \| \|	Helpful comment	Jack Lloyd	2017-10-11	1	-1/+2
\| \| \|
* \| \|	Remove SSE2 bswap_4	Jack Lloyd	2017-10-11	1	-24/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was disabled anyway (bad macro check) and with recent GCC turned out to be slower than just using bswap.
* \| \|	Optimize CFB mode	Jack Lloyd	2017-10-11	2	-39/+97
\| \| \| \| \| \| \| \| \| \| \| \|	Still slower but notably faster at least with AES-NI
* \| \|	Add missing header	Jack Lloyd	2017-10-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Error under filesystem-free builds
* \| \|	Simplify ffi call overhead	Jack Lloyd	2017-10-11	5	-45/+22
\| \| \| \| \| \| \| \| \| \| \| \|	Notable reductions in code size, stack size and function call overhead.
* \| \|	getenv is in standard C++	Jack Lloyd	2017-10-09	1	-1/+1
\| \| \|
* \| \|	Include cstdlib to make os_utils compile with clang.	Alexander Bluhm	2017-10-09	1	-0/+2
\| \|/ \|/\|
* \|	Add comments explaining why it's OK to rely on deprecated features here.	Jack Lloyd	2017-10-09	2	-0/+8
\| \| \| \| \| \| \| \|	[ci skip]
* \|	Add a special Compat_Callbacks constructor to silence deprecation warnings.	Jack Lloyd	2017-10-09	3	-7/+24
\| \| \| \| \| \| \| \| \| \| \| \|	That way we avoid the warning internally even in amalgamation mode. GH #1243
* \|	Forward declare BigInt in mp_core.h	Jack Lloyd	2017-10-06	2	-1/+3
\| \| \| \| \| \| \| \|	Only needed in one source file here.
* \|	Remove needless variable	Jack Lloyd	2017-10-06	1	-2/+0
\| \|
* \|	Address some bool/int conversion warnings from Sonar	Jack Lloyd	2017-10-06	4	-5/+12
\| \| \| \| \| \| \| \|	Nothing major but probably good to clean these up.
* \|	Address various GCC warnings	Jack Lloyd	2017-10-06	8	-24/+26
\| \| \| \| \| \| \| \| \| \|	Things like -Wconversion and -Wuseless-cast that are noisy and not on by default.
* \|	Correct the SHA-3 PKCSv1.5 IDs	Jack Lloyd	2017-10-05	2	-5/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thanks to @noloader for pointing me at draft-jivsov-openpgp-sha3-01 which has the correct values. Adds a test so this can't happen again.
* \|	Mark some functions of MDx_HashFunction final	Jack Lloyd	2017-10-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	The class itself can't be final but we can mark the overrides from HashFunction final, which helps the compiler devirtualize.
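The idea in the last entry, marking individual overrides final in a class that must itself stay open for subclassing, can be illustrated with a toy hierarchy. All class and member names below are hypothetical stand-ins rather than Botan's HashFunction or MDx_HashFunction interfaces.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the abstract hash interface.
class HashBase {
public:
    virtual ~HashBase() = default;
    virtual void update(const std::uint8_t in[], std::size_t len) = 0;
    virtual std::size_t bytes_seen() const = 0;
};

// Hypothetical intermediate class. Concrete hashes derive from it, so the
// class cannot be final, but its overrides of the base interface can be.
class BlockHash : public HashBase {
public:
    void update(const std::uint8_t in[], std::size_t len) final {
        m_count += len;
        absorb(in, len);
    }
    std::size_t bytes_seen() const final { return m_count; }

protected:
    // The only customization point left to subclasses.
    virtual void absorb(const std::uint8_t in[], std::size_t len) = 0;

private:
    std::size_t m_count = 0;
};

// Toy concrete hash: XORs all input bytes together.
class ToyHash final : public BlockHash {
public:
    std::uint8_t state() const { return m_state; }

protected:
    void absorb(const std::uint8_t in[], std::size_t len) override {
        for (std::size_t i = 0; i != len; ++i) {
            m_state ^= in[i];
        }
    }

private:
    std::uint8_t m_state = 0;
};
```

Because `update` and `bytes_seen` cannot be overridden below `BlockHash`, a call made through a `BlockHash` reference or pointer has exactly one possible target, which gives the compiler the option of calling it directly instead of going through the vtable.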