path: root/src/utils
Each entry: commit message (author, date; files changed, lines -deleted/+added)
* Consolidate the non-canonical epoch timers, like cpuid and Win32's
  QueryPerformanceCounter, into an entropy source hres_timer. Its results,
  if any, do not count as contributing entropy to the poll. Convert the
  other (monotonic/fixed epoch) timers to a single function
  get_nanoseconds_clock(), living in time.h, which statically chooses the
  'best' timer type (clock_gettime, gettimeofday, std::clock, in that
  order, depending on what is available). Add feature test macros for
  clock_gettime and gettimeofday. Remove the Timer class and timer.h.
  Remove the Timer& argument to the algorithm benchmark function.
  (lloyd, 2009-12-01; 3 files, -1/+118)
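
  A minimal sketch of the static selection idea, assuming Botan's u64bit
  typedef; the feature-test macro names here are assumptions:

      #include <ctime>

      u64bit get_nanoseconds_clock()
         {
      #if defined(BOTAN_TARGET_OS_HAS_CLOCK_GETTIME)
         struct ::timespec ts;
         ::clock_gettime(CLOCK_REALTIME, &ts);
         return (static_cast<u64bit>(ts.tv_sec) * 1000000000) + ts.tv_nsec;
      #elif defined(BOTAN_TARGET_OS_HAS_GETTIMEOFDAY)
         struct ::timeval tv; // needs <sys/time.h>
         ::gettimeofday(&tv, 0);
         return (static_cast<u64bit>(tv.tv_sec) * 1000000000) + 1000 * tv.tv_usec;
      #else
         // fallback: scale std::clock() ticks (CPU time) up to nanoseconds
         return static_cast<u64bit>(std::clock()) * (1000000000 / CLOCKS_PER_SEC);
      #endif
         }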
* Make sure the SIMD_32 implementation we're using actually works on the
  system before returning a new instance. (lloyd, 2009-11-24; 3 files, -3/+3)
* Instead of having two asm_macr.h files being switched in based on module
  build magic, name them asm_macr_ARCH.h. Change all including files
  accordingly. (lloyd, 2009-11-14; 4 files, -8/+0)
* Use memcpy for bulk loads if the algorithm's endianness matches the CPU's
  endianness. (lloyd, 2009-11-10; 1 file, -0/+9)
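
  A minimal sketch of the fast path, assuming Botan's byte/u32bit typedefs;
  the endianness macro name is an assumption:

      template<typename T>
      inline void load_le(T out[], const byte in[], u32bit count)
         {
      #if defined(BOTAN_TARGET_CPU_IS_LITTLE_ENDIAN)
         // byte order already matches: one memcpy (needs <cstring>)
         // replaces per-word shifting and masking
         std::memcpy(out, in, sizeof(T)*count);
      #else
         for(u32bit i = 0; i != count; ++i)
            out[i] = load_le<T>(in, i); // per-element load with byteswap
      #endif
         }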
* Also #undef bool after including <altivec.h> (lloyd, 2009-11-10; 1 file, -0/+1)
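
  The context, sketched: GCC's <altivec.h> can define "vector" and "bool"
  as macros, which breaks std::vector and the bool keyword in any header
  included afterwards, hence:

      #include <altivec.h>
      #undef vector
      #undef bool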
* Rename CPUID::has_intel_aes to has_aes_intel, and add CPUID::has_aes_via,
  which is currently just a stub returning false.
  (lloyd, 2009-11-10; 1 file, -2/+11)
* Tick to 1.9.3-dev. Rename BOTAN_UNALIGNED_LOADSTOR_OK to
  BOTAN_UNALIGNED_MEMORY_ACCESS_OK, which is somewhat clearer as to the
  point. (lloyd, 2009-11-06; 3 files, -28/+15)
* Add an andc operation, in SSE2 and AltiVec; may be useful for Serpent
  sboxes. (lloyd, 2009-11-04; 4 files, -4/+22)
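
  A sketch of the two intrinsics, assuming wrapper classes holding the
  vector in a member "reg" and andc(x, y) meaning (~x) & y; note the
  reversed argument conventions of the two instruction sets:

      // SSE2: _mm_andnot_si128(a, b) computes (~a) & b
      SIMD_SSE2 andc(const SIMD_SSE2& other) const
         { return SIMD_SSE2(_mm_andnot_si128(reg, other.reg)); }

      // AltiVec: vec_andc(a, b) computes a & (~b)
      SIMD_Altivec andc(const SIMD_Altivec& other) const
         { return SIMD_Altivec(vec_andc(other.reg, reg)); }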
* Slight cleanups in the Altivec detection code for readability.
  (lloyd, 2009-10-29; 1 file, -5/+12)
* Add a new looping load_be / load_le for loading large arrays at once, and
  change some of the hash functions to use it as low-hanging fruit. It
  could probably use further optimization (it just unrolls x4 currently),
  but merely having it as syntax is good, as it allows optimizing many
  functions at once (eg using SSE2 to do 4-way byteswaps).
  (lloyd, 2009-10-29; 1 file, -0/+46)
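
  A minimal sketch of the looping loader with the x4 unroll, built on the
  existing single-element load_be:

      template<typename T>
      inline void load_be(T out[], const byte in[], u32bit count)
         {
         const u32bit blocks = count - (count % 4);

         for(u32bit i = 0; i != blocks; i += 4)
            {
            out[i  ] = load_be<T>(in, i);
            out[i+1] = load_be<T>(in, i+1);
            out[i+2] = load_be<T>(in, i+2);
            out[i+3] = load_be<T>(in, i+3);
            }

         for(u32bit i = blocks; i != count; ++i)
            out[i] = load_be<T>(in, i);
         }

  A hash function can then load a whole message block in one call, e.g.
  load_be(M, input, 16), giving a single point to optimize later.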
* Fix cpuid with icc (tested with 11.1). Document SHA optimizations, AltiVec
  runtime checking, and the cpuid fixes for both icc and msvc.
  (lloyd, 2009-10-29; 1 file, -2/+2)
* Give each version of SIMD_32 a public bswap()
  (lloyd, 2009-10-29; 3 files, -11/+29)
* Add a new function enabled() to each of the SIMD_32 instantiations which
  returns true if they might plausibly work. The AltiVec and SSE2 versions
  call into CPUID; the scalar version always works.
  (lloyd, 2009-10-29; 3 files, -1/+9)
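
  A sketch of the hook (the exact CPUID function names are assumptions):

      class SIMD_SSE2
         {
         public:
            static bool enabled() { return CPUID::has_sse2(); }
            // ...
         };

      class SIMD_Altivec
         {
         public:
            static bool enabled() { return CPUID::have_altivec(); }
            // ...
         };

      class SIMD_Scalar
         {
         public:
            static bool enabled() { return true; } // plain C++ always works
            // ...
         };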
* No ||= operator! (lloyd, 2009-10-29; 1 file, -7/+7)
* Add CPUID::have_altivec for AltiVec runtime detection. Relies on mfspr
  emulation/trapping by the kernel, which works on (at least) Linux and
  NetBSD. (lloyd, 2009-10-29; 2 files, -0/+61)
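
  A sketch of the trap-and-read approach: mfspr on the processor version
  register (SPR 287) is privileged, but Linux and NetBSD emulate it for
  user space, so the PVR can be matched against known AltiVec-capable
  models. pvr_is_altivec_capable() is a hypothetical stand-in for that
  table lookup:

      #include <csetjmp>
      #include <csignal>

      static sigjmp_buf g_sigill_env;

      extern "C" void sigill_handler(int)
         {
         siglongjmp(g_sigill_env, 1); // jump out if mfspr actually faults
         }

      bool have_altivec()
         {
         bool altivec = false;

         struct sigaction sa, old_sa;
         sa.sa_handler = sigill_handler;
         sigemptyset(&sa.sa_mask);
         sa.sa_flags = 0;
         sigaction(SIGILL, &sa, &old_sa);

         if(sigsetjmp(g_sigill_env, 1) == 0)
            {
            unsigned int pvr = 0;
            asm volatile("mfspr %0, 287" : "=r" (pvr)); // read the PVR
            altivec = pvr_is_altivec_capable(pvr >> 16); // hypothetical
            }

         sigaction(SIGILL, &old_sa, 0);
         return altivec;
         }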
* Use register writes in the Altivec code for stores, because Altivec's
  handling of unaligned writes is messy as hell. If writes are batched this
  is somewhat easier to deal with. (lloyd, 2009-10-29; 1 file, -7/+16)
* Kill realnames on new modules not in mainline
  (lloyd, 2009-10-29; 1 file, -2/+0)
* propagate from branch 'net.randombit.botan' (head
  8fb69dd1c599ada1008c4cab2a6d502cbcc468e0) to branch
  'net.randombit.botan.general-simd' (head
  c05c9a6d398659891fb8cca170ed514ea7e6476d)
  (lloyd, 2009-10-29; 5 files, -0/+575)
  * Add copyright + license on the new SIMD files
    (lloyd, 2009-10-28; 4 files, -2/+14)
  * Add an AltiVec SIMD_32 implementation. Tested and works for Serpent and
    XTEA on a PowerPC 970 running Gentoo with GCC 4.3.4. Uses the GCC
    syntax for creating literal values instead of the Motorola syntax
    [{1,2,3,4} instead of (1,2,3,4)].

    In tests so far, this is much, much slower than either the standard
    scalar code or the SIMD-in-scalar-registers code. It looks like for
    whatever reason GCC is refusing to inline the function:

        SIMD_Altivec(__vector unsigned int input) { reg = input; }

    and calls it with a branch hundreds of times in each function. I don't
    know if this is the entire reason it's slower, but it definitely can't
    be helping.

    The code handles unaligned loads OK but assumes stores are to an
    aligned address. This will fail drastically some day, and needs to be
    fixed to either use scalar stores, which (most?) PPCs will handle (if
    slowly), or batch the loads and stores so we can work across them.
    Considering the code so far loads 4 vectors of data in one go, this
    would probably be a big win (and also for loads, since instead of doing
    8 loads for 4 registers, only 5 are needed).
    (lloyd, 2009-10-28; 1 file, -0/+178)
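
    For reference, a sketch of the usual AltiVec unaligned load, which is
    presumably what the load path does: vec_lvsl builds a permute mask
    from the misalignment, and vec_perm splices the two aligned vectors
    that span the data:

        __vector unsigned int load_unaligned(const unsigned int* src)
           {
           const __vector unsigned char perm = vec_lvsl(0, src);
           const __vector unsigned int lo = vec_ld(0, src);
           const __vector unsigned int hi = vec_ld(15, src);
           return vec_perm(lo, hi, perm);
           }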
  * Define SSE rotate_right in terms of rotate_left, and load_be in terms
    of load_le + bswap (lloyd, 2009-10-28; 1 file, -3/+2)
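
    A sketch of the rotate identity, assuming a wrapper class holding an
    __m128i member "reg":

        SIMD_SSE2 rotate_left(u32bit rot) const
           {
           return SIMD_SSE2(_mm_or_si128(_mm_slli_epi32(reg, rot),
                                         _mm_srli_epi32(reg, 32 - rot)));
           }

        // a right rotate by r is just a left rotate by 32 - r
        SIMD_SSE2 rotate_right(u32bit rot) const
           { return rotate_left(32 - rot); }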
  * Add subtraction operators to SIMD_32 classes, needed for XTEA decrypt
    (lloyd, 2009-10-26; 2 files, -0/+26)
  * Add a wrapper for a set of SSE2 operations with convenient syntax for
    4x32 operations. Also add a pure scalar code version. Convert Serpent
    to use this new interface, and add an implementation of XTEA in SIMD.

    The wrappers plus the scalar version allow SIMD-ish code to work on all
    platforms. This is often a win due to better ILP being visible to the
    processor (as with the recent XTEA optimizations). The only real danger
    is register starvation, mostly an issue on x86 these days. So it may
    (or may not) be a win to consolidate the standard C++ versions and the
    SIMD versions together.

    Future work:
    - Add AltiVec/VMX version
    - Maybe also for ARM's NEON extension? Less pressing, I would think.
    - Convert SHA-1 code to use SIMD_32
    - Add XTEA SIMD decryption (currently only encrypt)
    - Change SSE2 engine to SIMD_engine
    - Modify configure.py to set BOTAN_TARGET_CPU_HAS_[SSE2|ALTIVEC|NEON|XXX]
      macros
    (lloyd, 2009-10-26; 4 files, -0/+360)
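
    A sketch of the pure scalar version's shape (the real class has more
    operations; u32bit is Botan's 32-bit unsigned typedef):

        class SIMD_Scalar
           {
           public:
              SIMD_Scalar(u32bit a, u32bit b, u32bit c, u32bit d)
                 { R0 = a; R1 = b; R2 = c; R3 = d; }

              SIMD_Scalar operator+(const SIMD_Scalar& other) const
                 {
                 return SIMD_Scalar(R0 + other.R0, R1 + other.R1,
                                    R2 + other.R2, R3 + other.R3);
                 }

              SIMD_Scalar operator^(const SIMD_Scalar& other) const
                 {
                 return SIMD_Scalar(R0 ^ other.R0, R1 ^ other.R1,
                                    R2 ^ other.R2, R3 ^ other.R3);
                 }

           private:
              u32bit R0, R1, R2, R3;
           };

    Four independent 32-bit lanes in ordinary registers give the compiler
    visible ILP even without SSE2, which is the point made above.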
* Remove the 'realname' attribute on all modules and cc/cpu/os info files.
  Pretty much useless and unused, except for listing the module names in
  build.h, and the short versions totally suffice for that.
  (lloyd, 2009-10-29; 5 files, -10/+0)
* Add ; after the call to VC++'s __cpuid, which is a function, not a macro
  (lloyd, 2009-10-25; 1 file, -1/+1)
* Cast the u32bit output array to an int* when calling the VC++ intrinsic,
  since it passes signed ints for whatever reason. Ensure CALL_CPUID is
  always defined (previously, it would not be if on an x86 but compiled
  with something other than GCC, ICC, or VC++).
  (lloyd, 2009-10-25; 1 file, -3/+6)
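
  A sketch of the shape of such a shim (the actual macro bodies differ, and
  the compiler-detection macro names are assumptions):

      #if defined(BOTAN_BUILD_COMPILER_IS_MSVC)
        #include <intrin.h>
        // VC++'s __cpuid takes int[4], hence the cast from u32bit*
        #define CALL_CPUID(type, out) \
           do { __cpuid(reinterpret_cast<int*>(out), (type)); } while(0)
      #elif defined(BOTAN_BUILD_COMPILER_IS_GCC)
        #include <cpuid.h>
        #define CALL_CPUID(type, out) \
           do { __get_cpuid((type), (out), (out)+1, (out)+2, (out)+3); } while(0)
      #else
        // unknown compiler on x86: still define it, as a safe no-op
        #define CALL_CPUID(type, out) \
           do { (out)[0] = (out)[1] = (out)[2] = (out)[3] = 0; } while(0)
      #endif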
* Add new store_[l|b]e variants taking 8 values. Add new load options that
  are passed a number of variables by reference, setting them all at once.
  This will allow batching operations (eg using SIMD operations to do
  128-bit wide bswaps) in future optimizations.
  (lloyd, 2009-10-23; 1 file, -16/+108)
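
  A sketch of the by-reference form, following the existing single-element
  loaders:

      template<typename T>
      inline void load_le(const byte in[], T& x0, T& x1, T& x2, T& x3)
         {
         x0 = load_le<T>(in, 0);
         x1 = load_le<T>(in, 1);
         x2 = load_le<T>(in, 2);
         x3 = load_le<T>(in, 3);
         }

  Setting four words in one call gives a single spot that can later be
  swapped for a SIMD implementation doing one 128-bit wide byteswap.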
* Enable CPUID on x86 (it was checking the wrong macro name)
  (lloyd, 2009-10-21; 1 file, -1/+1)
* In to_u32bit, ignore space characters in input
  (lloyd, 2009-10-06; 1 file, -0/+3)
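
  A sketch of a parsing loop with the skip added (error handling omitted;
  the real function's internals may differ):

      u32bit to_u32bit(const std::string& str)
         {
         u32bit n = 0;
         for(std::string::const_iterator i = str.begin(); i != str.end(); ++i)
            {
            if(*i == ' ') // ignore space characters in the input
               continue;
            n = 10*n + static_cast<u32bit>(*i - '0');
            }
         return n;
         }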
* Clean up cpuid calling (lloyd, 2009-10-06; 1 file, -32/+26)
* Disable prefetch in AES for now. Problem: with iterative modes like CBC,
  the prefetch is called for each block of input, and so a total of
  (4096+256)/64 = 68 prefetches are executed for each block. This reduces
  performance of iterative modes dramatically. I'm not sure what the right
  approach for dealing with this is. (lloyd, 2009-09-30; 1 file, -12/+0)
* Add cpuid check for Intel AES (lloyd, 2009-09-30; 1 file, -1/+8)
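
  A sketch of the check: AES-NI support is reported in CPUID leaf 1, ECX
  bit 25 (using the CALL_CPUID shim sketched earlier in this log):

      bool has_aes_intel()
         {
         u32bit regs[4] = { 0 };
         CALL_CPUID(1, regs);
         return (regs[2] & (1U << 25)) != 0; // regs[2] holds ECX
         }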
* Add vendor ID for AMD (lloyd, 2009-09-29; 1 file, -1/+1)
* Significantly rework CPUID support. Add cache line detection.
  (lloyd, 2009-09-29; 2 files, -87/+99)
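
  A sketch of the cache line query: CPUID leaf 1 reports the CLFLUSH line
  size in EBX bits 15:8, in units of 8 bytes:

      u32bit cache_line_size()
         {
         u32bit regs[4] = { 0 };
         CALL_CPUID(1, regs);
         return 8 * ((regs[1] >> 8) & 0xFF); // regs[1] holds EBX
         }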
* Change the prefetching interface; move to PREFETCH namespace, and add a
  helper function for fetching both the inputs and outputs of block
  ciphers. (lloyd, 2009-09-29; 1 file, -9/+25)
* Remove add block from utils/info.txt (lloyd, 2009-09-29; 1 file, -27/+0)
* Add some basic prefetching support (only supported with GNU C++, or
  things that claim to be it by defining __GNUG__, such as Intel C++) in a
  new utils header prefetch.h (lloyd, 2009-09-29; 4 files, -3/+44)
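
  A minimal sketch of what such a header can look like (the PREFETCH
  namespace is from the later commit above; the 64-byte line size is an
  assumption):

      namespace PREFETCH {

      template<typename T>
      inline void readonly(const T* addr, u32bit length)
         {
      #if defined(__GNUG__)
         const u32bit cache_line = 64;
         for(u32bit i = 0; i < sizeof(T)*length; i += cache_line)
            __builtin_prefetch(reinterpret_cast<const char*>(addr) + i, 0);
      #endif
         }

      template<typename T>
      inline void readwrite(T* addr, u32bit length)
         {
      #if defined(__GNUG__)
         const u32bit cache_line = 64;
         for(u32bit i = 0; i < sizeof(T)*length; i += cache_line)
            __builtin_prefetch(reinterpret_cast<char*>(addr) + i, 1);
      #endif
         }

      }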
* merge of '1efb42e84eca9e01edd7b7f1335af7011eab994c' and
  'bb55abb64b64ca63aeb361db40c6bc4692d4ce48'
  (lloyd, 2009-09-25; 3 files, -0/+161)
  * Add runtime cpuid support. Check in the SSE2 engine that SSE2 actually
    exists on the current CPU before returning an object.
    (lloyd, 2009-09-25; 3 files, -0/+161)
* Add engine deps on the asm_xxx modules so the engines get loaded
  (lloyd, 2009-09-24; 2 files, -0/+8)
* propagate from branch 'net.randombit.botan.1_8' (head
  1f4729658b70a340064bc9a33c923a44ecab84d8) to branch 'net.randombit.botan'
  (head b9ca6596a127964cb9795d22bc2a5642fab5de84)
  (lloyd, 2009-09-17; 6 files, -110/+70)
  * Split up util.h into 3 files:
    - rounding.h (round_up, round_down)
    - workfactor.h (dl_work_factor)
    - timer.h (system_time)
    And update all users of the previous util.h
    (lloyd, 2009-09-17; 3 files, -65/+4)
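
    A sketch of the rounding helpers listed above:

        template<typename T>
        inline T round_up(T n, T align_to)
           {
           if(n % align_to)
              n += align_to - (n % align_to);
           return n;
           }

        template<typename T>
        inline T round_down(T n, T align_to)
           {
           return n - (n % align_to);
           }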
  * Move memory locking function decls to mlock.h. Inline round_up and
    round_down. (lloyd, 2009-09-17; 5 files, -25/+46)
* Fix macro generation + checks in configure.py and bswap.h. The bug had
  the effect of preventing the bswap optimizations from being used. :(
  (lloyd, 2009-09-17; 1 file, -2/+2)
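
  For context, a sketch of the kind of bswap those macros gate (macro
  names are assumptions; the rotate/mask form is the portable fallback):

      inline u32bit reverse_bytes(u32bit x)
         {
      #if defined(BOTAN_USE_GCC_INLINE_ASM) && defined(BOTAN_TARGET_ARCH_IS_IA32)
         asm("bswapl %0" : "=r" (x) : "0" (x));
         return x;
      #else
         return (rotate_right(x, 8) & 0xFF00FF00) |
                (rotate_left (x, 8) & 0x00FF00FF);
      #endif
         }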
* Add a new option --no-autoload to configure.py. This will produce a
  minimal build (only libstate, utils, plus dependencies), which can be
  extended with use of --enable-modules. To add new modules to the set of
  always-loaded ones, use 'load_on always' in info.txt.

  Also fix a few small build problems that popped up when doing a minimal
  build. Requested by a user. (lloyd, 2009-09-04; 1 file, -1/+1)
* Fix variable name in 32-bit bswap for VC++ (lloyd, 2009-08-03; 1 file, -1/+1)
* A typo in a macro check in bswap.h meant the inline asm bswap was not
  used on Visual C++ (lloyd, 2009-07-31; 1 file, -1/+1)
* Add missing info.txt files (lloyd, 2009-07-16; 2 files, -0/+24)
* Correct source listings for moved files (lloyd, 2009-07-16; 1 file, -2/+0)
* Move some files around to break up dependencies between directories
  (lloyd, 2009-07-16; 9 files, -958/+0)