botan.git

	Commit message	Author	Age	Files	Lines
*	Generate SIMD macro flags for build.h from data in build-data/arch for	lloyd	2009-11-06	6	-6/+70
\|	SSE2, SSSE3, NEON, and AltiVec. Add entries for Intel Atom, POWER6 and POWER7, and the Cortex A8 and A9.
*	Add an andc operation, in SSE2 and AltiVec, may be useful for Serpent sboxes	lloyd	2009-11-04	4	-4/+22
\|
*	Set BOTAN_TARGET_CPU_HAS_SSE2 macro if amd64. Not set at all for any 32-bit	lloyd	2009-11-04	1	-0/+3
\|	x86 currently. This should be fixed. But it's an improvement over having to always set it manually, at least.
*	Indent and avoid one extra assignment	lloyd	2009-11-04	1	-3/+2
\|
*	propagate from branch 'net.randombit.botan.1_8' (head ↵	lloyd	2009-11-03	559	-6939/+13364
\|\	6e8c18515725a70923b34118951252723dd4c29a) to branch 'net.randombit.botan' (head 77ba4ea5a4be36d6d029bcc852b2271edff0d679)
\| *	propagate from branch 'net.randombit.botan.1_8' (head ↵ 1.9.2	lloyd	2009-11-03	2	-2/+3
\| \|\	a101c8c86b755a666c72baf03154230e09e0667e) to branch 'net.randombit.botan' (head 948905e3872b6f5904686533c6aa87d38ff90a71)
\| * \|	Update for 1.9.2 release 2009-11-03	lloyd	2009-11-03	4	-11/+5
\| \| \|
\| * \|	Convert the rest of the hash functions to use the array-based load instructions.	lloyd	2009-11-03	5	-40/+41
\| \| \|	I'm not totally happy with this - in particular in all cases the size is a compile-time constant - it would be nice to make use of this via template metaprogramming. Also for matching-endian loads, a straight memcpy would do the work, which would probably be even faster.
\| * \|	Slight cleanups in the Altivec detection code for readability.	lloyd	2009-10-29	1	-5/+12
\| \| \|
\| * \|	Add a new looping load_be / load_le for loading large arrays at once, and	lloyd	2009-10-29	11	-49/+104
\| \| \|	change some of the hash functions to use it as low hanging fruit. Probably could use further optimization (just unrolls x4 currently), but merely having it as syntax is good as it allows optimizing many functions at once (eg using SSE2 to do 4-way byteswaps).
\| * \|	Fix cpuid with icc (tested with 11.1)	lloyd	2009-10-29	2	-2/+5
\| \| \|	Document SHA optimizations, AltiVec runtime checking, fixes for cpuid for both icc and msvc.
\| * \|	propagate from branch 'net.randombit.botan' (head ↵	lloyd	2009-10-29	30	-964/+1723
\| \|\ \	4fd7eb9630271d3c1dfed21987ef864680d4ce7b) to branch 'net.randombit.botan.general-simd' (head 91df868149cdc4754d340e6103028acc82182609)
\| \| * \|	Clean up prep00_15 - same speed on Core2	lloyd	2009-10-29	1	-16/+10
\| \| \| \|
\| \| * \|	Clean up the SSE2 SHA-1 code quite a bit, make better use of C++ features	lloyd	2009-10-29	2	-308/+267
\| \| \| \|	and also make it stylistically much closer to the standard SHA-1 code.
\| \| * \|	Format for easier reading	lloyd	2009-10-29	1	-31/+43
\| \| \| \|
\| \| * \|	Small cleanups (remove tab characters, change macros to fit the rest of	lloyd	2009-10-29	1	-123/+121
\| \| \| \|	the code stylistically, etc)
\| \| * \|	Give each version of SIMD_32 a public bswap()	lloyd	2009-10-29	3	-11/+29
\| \| \| \|
\| \| * \|	Add new function enabled() to each of the SIMD_32 instantiations which	lloyd	2009-10-29	3	-1/+9
\| \| \| \|	returns true if they might plausibly work. AltiVec and SSE2 versions call into CPUID, scalar version always works.
\| \| * \|	No \|\|= operator!	lloyd	2009-10-29	1	-7/+7
\| \| \| \|
\| \| * \|	Add CPUID::have_altivec for AltiVec runtime detection.	lloyd	2009-10-29	3	-0/+63
\| \| \| \|	Relies on mfspr emulation/trapping by the kernel, which works on (at least) Linux and NetBSD.
\| \| * \|	Rename sse2 engine to simd	lloyd	2009-10-29	2	-2/+2
\| \| \| \|
\| \| * \|	Use register writes in the Altivec code for stores because Altivec's handling	lloyd	2009-10-29	1	-7/+16
\| \| \| \|	for unaligned writes is messy as hell. If writes are batched this is somewhat easier to deal with (somewhat).
\| \| * \|	Kill realnames on new modules not in mainline	lloyd	2009-10-29	3	-5/+0
\| \| \| \|
\| \| * \|	propagate from branch 'net.randombit.botan' (head ↵	lloyd	2009-10-29	23	-621/+1324
\| \| \|\ \	54d2cc7b00ecd5f41295e147d23ab6d294309f61) to branch 'net.randombit.botan.general-simd' (head 9cb1b5f00bfefd05cd9555489db34e6d86867aca)
\| \| \| * \	propagate from branch 'net.randombit.botan' (head ↵	lloyd	2009-10-29	23	-621/+1324
\| \| \| \|\ \	8fb69dd1c599ada1008c4cab2a6d502cbcc468e0) to branch 'net.randombit.botan.general-simd' (head c05c9a6d398659891fb8cca170ed514ea7e6476d)
\| \| \| \| * \|	Rename SSE2 stuff to be generally SIMD since it supports at least SSE2	lloyd	2009-10-29	16	-135/+126
\| \| \| \| \| \|	and Altivec (though Altivec is seemingly slower ATM...)
\| \| \| \| * \|	Add copyright + license on the new SIMD files	lloyd	2009-10-28	4	-2/+14
\| \| \| \| \| \|
\| \| \| \| * \|	Document SIMD changes	lloyd	2009-10-28	1	-0/+2
\| \| \| \| \| \|
\| \| \| \| * \|	propagate from branch 'net.randombit.botan' (head ↵	lloyd	2009-10-28	12	-404/+1101
\| \| \| \| \|\ \	bf629b13dd132b263e76a72b7eca0f7e4ab19aac) to branch 'net.randombit.botan.general-simd' (head f731cff08ff0d04c062742c0c6cfcc18856400ea)
\| \| \| \| \| * \|	Add an AltiVec SIMD_32 implementation. Tested and works for Serpent and XTEA	lloyd	2009-10-28	1	-0/+178
\| \| \| \| \| \| \|	on a PowerPC 970 running Gentoo with GCC 4.3.4. Uses a GCC syntax for creating literal values instead of the Motorola syntax [{1,2,3,4} instead of (1,2,3,4)]. In tests so far, this is much, much slower than either the standard scalar code, or using the SIMD-in-scalar-registers code. It looks like for whatever reason GCC is refusing to inline the function: SIMD_Altivec(__vector unsigned int input) { reg = input; } and calls it with a branch hundreds of times in each function. I don't know if this is the entire reason it's slower, but it definitely can't be helping. The code handles unaligned loads OK but assumes stores are to an aligned address. This will fail drastically some day, and needs to be fixed to either use scalar stores, which (most?) PPCs will handle (if slowly), or batch the loads and stores so we can work across the loads. Considering the code so far loads 4 vectors of data in one go this would probably be a big win (and also for loads, since instead of doing 8 loads for 4 registers only 5 are needed).
\| \| \| \| \| * \|	Define SSE rotate_right in terms of rotate left, and load_be in terms	lloyd	2009-10-28	1	-3/+2
\| \| \| \| \| \| \|	of load_le + bswap
\| \| \| \| \| * \|	Add XTEA decryption	lloyd	2009-10-26	1	-11/+47
\| \| \| \| \| \| \|
\| \| \| \| \| * \|	Add subtraction operators to SIMD_32 classes, needed for XTEA decrypt	lloyd	2009-10-26	2	-0/+26
\| \| \| \| \| \| \|
\| \| \| \| \| * \|	Add a wrapper for a set of SSE2 operations with convenient syntax for 4x32	lloyd	2009-10-26	11	-404/+862
\| \| \| \| \| \| \|	operations. Also add a pure scalar code version. Convert Serpent to use this new interface, and add an implementation of XTEA in SIMD. The wrappers plus the scalar version allow SIMD-ish code to work on all platforms. This is often a win due to better ILP being visible to the processor (as with the recent XTEA optimizations). Only real danger is register starvation, mostly an issue on x86 these days. So it may (or may not) be a win to consolidate the standard C++ versions and the SIMD versions together. Future work: - Add AltiVec/VMX version - Maybe also for ARM's NEON extension? Less pressing, I would think. - Convert SHA-1 code to use SIMD_32 - Add XTEA SIMD decryption (currently only encrypt) - Change SSE2 engine to SIMD_engine - Modify configure.py to set BOTAN_TARGET_CPU_HAS_[SSE2\|ALTIVEC\|NEON\|XXX] macros
\| * \| \| \| \| \|	Unroll SHA-1's expansion loop from x4 to x8; ~7% faster on Core2	lloyd	2009-10-29	1	-1/+5
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Unroll the expansion loop in both SHA-2 implementations by 8. On a Core2,	lloyd	2009-10-29	2	-13/+29
\| \|/ / / / /	SHA-256 gets ~7% faster, SHA-512 ~10%.
\| * / / / /	Kill straggling realnames	lloyd	2009-10-29	2	-4/+0
\| \|/ / / /
\| * \| \| \|	Hurd file was missing txt extension, must have missed it before?	lloyd	2009-10-29	1	-0/+0
\| \| \| \| \|
\| * \| \| \|	Remove the 'realname' attribute on all modules and cc/cpu/os info files.	lloyd	2009-10-29	234	-479/+5
\| \| \| \| \|	Pretty much useless and unused, except for listing the module names in build.h and the short versions totally suffice for that.
\| * \| \| \|	propagate from branch 'net.randombit.botan.1_8' (head ↵	lloyd	2009-10-28	393	-6047/+12111
\| \|\\| \|	3158f8272a3582dd44dfb771665eb71f7d005339) to branch 'net.randombit.botan' (head bf629b13dd132b263e76a72b7eca0f7e4ab19aac)
\| \| * \| \|	Add missing log note for 1.9.1 change notes on CTR/OFB change	lloyd	2009-10-28	1	-0/+1
\| \| \| \| \|
\| \| * \| \|	Indent fix	lloyd	2009-10-26	1	-1/+1
\| \| \|/ /
\| \| * \|	Tick version to 1.9.2-dev	lloyd	2009-10-26	3	-4/+6
\| \| \| \|
\| \| * \|	Small cleanups	lloyd	2009-10-26	1	-4/+3
\| \| \| \|
\| \| * \|	Add ; after call to VC++'s __cpuid, not a macro	lloyd	2009-10-25	2	-7/+14
\| \| \| \|
\| \| * \|	Cast the u32bit output array to an int* when calling the VC++ intrinsic,	lloyd	2009-10-25	1	-3/+6
\| \| \| \|	since it passes signed ints for whatever reason. Ensure CALL_CPUID is always defined (previously, it would not be if on an x86 but compiled with something other than GCC, ICC, VC++).
\| \| * \|	Update docs for 1.9.1 release 2009-10-23 1.9.1	lloyd	2009-10-23	3	-3/+4
\| \| \| \|
\| \| * \|	Kill stdio include	lloyd	2009-10-23	1	-2/+0
\| \| \| \|
\| \| * \|	Use new load/store ops in xtea x4 code	lloyd	2009-10-23	1	-12/+6
\| \| \| \|
\| \| * \|	Add new store_[l\|b]e variants taking 8 values.	lloyd	2009-10-23	1	-16/+108
\| \| \| \|	Add new load options that are passed a number of variables by reference, setting them all at once. Will allow for batching operations (eg using SIMD operations to do 128-bit wide bswaps) for future optimizations.
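
The batched load and store helpers described in the entry above could look roughly like the following C++ sketch. It is only an illustration: the names `load_be32`, `store_le32`, `load_be_4`, `store_le_4` and `bswap_4_words` are hypothetical, it handles four 32-bit values rather than the eight mentioned in the commit, and it is not taken from Botan's actual headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; not Botan's actual signatures.

// Read the i-th big-endian 32-bit word from a byte array.
inline std::uint32_t load_be32(const std::uint8_t in[], std::size_t i)
{
    const std::uint8_t* p = in + 4 * i;
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Write x as the i-th little-endian 32-bit word of a byte array.
inline void store_le32(std::uint32_t x, std::uint8_t out[], std::size_t i)
{
    std::uint8_t* p = out + 4 * i;
    p[0] = std::uint8_t(x);
    p[1] = std::uint8_t(x >> 8);
    p[2] = std::uint8_t(x >> 16);
    p[3] = std::uint8_t(x >> 24);
}

// Batched load: sets four variables, passed by reference, in one call.
inline void load_be_4(const std::uint8_t in[],
                      std::uint32_t& x0, std::uint32_t& x1,
                      std::uint32_t& x2, std::uint32_t& x3)
{
    x0 = load_be32(in, 0);
    x1 = load_be32(in, 1);
    x2 = load_be32(in, 2);
    x3 = load_be32(in, 3);
}

// Batched store: writes four values to consecutive output words in one call.
inline void store_le_4(std::uint8_t out[],
                       std::uint32_t x0, std::uint32_t x1,
                       std::uint32_t x2, std::uint32_t x3)
{
    store_le32(x0, out, 0);
    store_le32(x1, out, 1);
    store_le32(x2, out, 2);
    store_le32(x3, out, 3);
}

// Loading big-endian and storing little-endian byte-swaps each word.
inline void bswap_4_words(const std::uint8_t in[16], std::uint8_t out[16])
{
    assert(in != nullptr && out != nullptr);
    std::uint32_t a, b, c, d;
    load_be_4(in, a, b, c, d);
    store_le_4(out, a, b, c, d);
}
```

Keeping the batch behind a single call leaves room to swap the body for a wider implementation later (for example one 128-bit byte swap in place of four scalar ones) without touching callers, which is the motivation the commit message gives.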