path: root/src/block/idea_sse2
Commit message (author, date, files changed, lines -removed/+added)
* Give everything that sets a feature test macro in build.h a version code,
  so application code can check for the specific API it expects without having to keep track of which versions APIs x, y, z changed in. Arbitrarily set all current API versions to 20131128.
  (lloyd, 2013-11-28, 1 file, -1/+1)
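A minimal sketch of how application code might use such a versioned macro: compare its value in the preprocessor instead of only testing `defined`. The macro name `BOTAN_HAS_IDEA_SSE2` is an assumption for illustration, and the snippet defines it locally so it compiles without build.h; `kUseIdeaSse2` is a hypothetical name.

```cpp
// Stand-in for the line build.h would provide. The macro name is assumed
// for illustration; the value follows the 20131128 convention above.
#ifndef BOTAN_HAS_IDEA_SSE2
#define BOTAN_HAS_IDEA_SSE2 20131128
#endif

// Application side: require at least the API version this code was written
// against, rather than merely checking that the module exists.
#if defined(BOTAN_HAS_IDEA_SSE2) && (BOTAN_HAS_IDEA_SSE2 >= 20131128)
constexpr bool kUseIdeaSse2 = true;   // expected API is available
#else
constexpr bool kUseIdeaSse2 = false;  // absent or too old: use a fallback
#endif
```

Under this scheme, a later API change would raise the value, so an `>=` test keeps older application code compiling while newer code can demand the newer version.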
* The new method of doing comparisons did not work all of the time: if
  the low bytes were equal, the low byte of the saturating subtraction result would be 0 even though the high byte held a non-zero value. To deal with this, shift the high byte down and OR the two bytes together into the low byte. Add some new tests which exercise the SIMD implementation more carefully, including values that trigger the problem in the earlier version.
  (lloyd, 2011-05-13, 1 file, -1/+3)
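The failure can be reproduced with a scalar model of a single 16-bit lane. This is an emulation written for illustration, not the library's SSE2 code: `subs_epu16` and `min_epu8` mimic the per-lane behaviour of `_mm_subs_epu16` and `_mm_min_epu8`, and it assumes the compare reduces the saturated difference with a byte-wise minimum against 1. `lt_buggy` and `lt_fixed` are hypothetical names; each returns 1 when `lo < hi` and 0 otherwise.

```cpp
#include <cstdint>

// Per-lane model of _mm_subs_epu16: unsigned subtraction clamped at 0.
constexpr std::uint16_t subs_epu16(std::uint16_t a, std::uint16_t b) {
    return a > b ? static_cast<std::uint16_t>(a - b) : std::uint16_t{0};
}

// Per-lane model of _mm_min_epu8: the two bytes are compared independently.
constexpr std::uint16_t min_epu8(std::uint16_t a, std::uint16_t b) {
    const unsigned a_lo = a & 0xFFu, a_hi = a >> 8;
    const unsigned b_lo = b & 0xFFu, b_hi = b >> 8;
    const unsigned lo = a_lo < b_lo ? a_lo : b_lo;
    const unsigned hi = a_hi < b_hi ? a_hi : b_hi;
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

// Earlier form: min(hi -sat lo, 0x0001) byte-wise. The constant's high byte
// is 0, so only the low byte of the difference can produce a 1.
constexpr std::uint16_t lt_buggy(std::uint16_t lo, std::uint16_t hi) {
    return min_epu8(subs_epu16(hi, lo), 0x0001);
}

// Fixed form: OR the high byte into the low byte first (a shift right by 8
// plus an OR), so any non-zero difference leaves a non-zero low byte.
constexpr std::uint16_t lt_fixed(std::uint16_t lo, std::uint16_t hi) {
    const std::uint16_t d = subs_epu16(hi, lo);
    return min_epu8(static_cast<std::uint16_t>(d | (d >> 8)), 0x0001);
}
```

With `lo = 0x0012` and `hi = 0x0112` the low bytes are equal, so the difference is `0x0100`: the earlier form sees only its zero low byte and returns 0, while the folded form returns 1.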
* Fix the problem that prevented the SSE2 IDEA implementation from
  working correctly under Clang: the technique for emulating unsigned compare relied on signed overflow. The new method does not, and works under GCC, ICC, and Clang. Even better, the compare takes only 2 instructions instead of 4. Prevent using any of the asm implementations under Clang on x86-32; all of them crash under Clang 2.9, for unclear reasons.
  (lloyd, 2011-05-12, 1 file, -5/+1)
* Maintainer-mode warning cleanups, mostly for C-style casts, a warning I
  added to the flags here.
  (lloyd, 2011-04-18, 1 file, -8/+12)
* Avoid VC cast warning
  (lloyd, 2010-11-29, 1 file, -1/+1)
* Add a new subclass of BlockCipher, BlockCipher_Fixed_Block_Size, which
  sets the block size statically and also creates an enum with the size. Use the enum instead of calling block_size() where possible, since that uses two virtual function calls per block, which is quite unfortunate.
  The real advantages here, compared to the previous version which kept the block size as a per-object u32bit:
  - The compiler can inline the constant as an immediate operand (previously it would load the value via an indirection on this)
  - Removes 32 bits of per-object overhead (except in cases with actually variable block sizes, which are very few and rarely used)
  (lloyd, 2010-10-13, 1 file, -4/+4)
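A minimal sketch of the pattern, assuming a heavily simplified `BlockCipher` interface (the real class has more members). `Toy_Cipher` is a hypothetical XOR-with-a-constant placeholder, not a cipher; it only shows `BLOCK_SIZE` used as a compile-time constant in the per-block loop.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the abstract block cipher interface.
class BlockCipher {
  public:
    virtual ~BlockCipher() = default;
    virtual std::size_t block_size() const = 0;
    virtual void encrypt_n(const std::uint8_t in[], std::uint8_t out[],
                           std::size_t blocks) const = 0;
};

// Fixes the block size at compile time and exposes it as an enum constant;
// no per-object size field is stored.
template <std::size_t N>
class BlockCipher_Fixed_Block_Size : public BlockCipher {
  public:
    enum { BLOCK_SIZE = N };
    std::size_t block_size() const override { return N; }
};

// Hypothetical placeholder (XOR with 0x5A), used only to show the pattern.
class Toy_Cipher final : public BlockCipher_Fixed_Block_Size<8> {
  public:
    void encrypt_n(const std::uint8_t in[], std::uint8_t out[],
                   std::size_t blocks) const override {
        for (std::size_t i = 0; i != blocks; ++i) {
            // BLOCK_SIZE is an immediate: no virtual block_size() call here.
            for (std::size_t j = 0; j != BLOCK_SIZE; ++j)
                out[j] = static_cast<std::uint8_t>(in[j] ^ 0x5A);
            in += BLOCK_SIZE;
            out += BLOCK_SIZE;
        }
    }
};
```

Callers holding a `BlockCipher&` still go through the virtual `block_size()`; only code that knows the concrete type benefits from the constant.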
* s/BLOCK_SIZE/block_size()/
  (lloyd, 2010-10-13, 1 file, -4/+4)
* Use size_t rather than u32bit for the blocks argument of encrypt_n
  (lloyd, 2010-10-12, 2 files, -5/+5)
* s/u32bit/size_t/ for block cipher parallelism queries
  (lloyd, 2010-10-12, 1 file, -1/+1)
* First set of changes for avoiding the use of implicit vector->pointer conversions
  (lloyd, 2010-09-13, 1 file, -2/+6)
* Only call the scalar versions if we actually have leftover blocks to process
  (lloyd, 2010-06-22, 1 file, -2/+4)
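A sketch of that dispatch logic, with hypothetical names throughout: `encrypt_8_simd` stands in for the SSE2 kernel (assumed here to handle 8 blocks per call) and `encrypt_scalar` for the scalar fallback. Both only record what they were asked to do in a `Call_Log`, which is enough to show that the scalar path runs only when blocks remain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t BLOCK_SIZE = 8;   // IDEA uses 64-bit blocks
constexpr std::size_t SIMD_BLOCKS = 8;  // assumed blocks per SIMD call

// Records what each path was asked to process (illustration only).
struct Call_Log {
    std::size_t simd_blocks = 0;
    std::size_t scalar_calls = 0;
    std::size_t scalar_blocks = 0;
};

// Hypothetical SIMD kernel: handles exactly SIMD_BLOCKS blocks.
void encrypt_8_simd(const std::uint8_t*, std::uint8_t*, Call_Log& log) {
    log.simd_blocks += SIMD_BLOCKS;
}

// Hypothetical scalar fallback: only ever sees a non-empty remainder.
void encrypt_scalar(const std::uint8_t*, std::uint8_t*, std::size_t blocks,
                    Call_Log& log) {
    assert(blocks > 0 && blocks < SIMD_BLOCKS);
    ++log.scalar_calls;
    log.scalar_blocks += blocks;
}

void encrypt_n(const std::uint8_t in[], std::uint8_t out[], std::size_t blocks,
               Call_Log& log) {
    // Full groups go through the SIMD path.
    while (blocks >= SIMD_BLOCKS) {
        encrypt_8_simd(in, out, log);
        in += SIMD_BLOCKS * BLOCK_SIZE;
        out += SIMD_BLOCKS * BLOCK_SIZE;
        blocks -= SIMD_BLOCKS;
    }
    // The scalar code is called only if something is left over.
    if (blocks > 0)
        encrypt_scalar(in, out, blocks, log);
}
```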
* In IDEA, Noekeon, Serpent, and XTEA, provide and use read-only accessor functions
  for the key schedule instead of giving the key schedule protected status, which is much harder to audit.
  (lloyd, 2010-06-21, 1 file, -2/+2)
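A minimal sketch of the accessor approach, using hypothetical class names (`IDEA_Base`, `IDEA_SIMD`) and an accessor name `get_EK` chosen for illustration: the key schedule stays private to the base class, and a subclass such as a SIMD variant can only read it through a const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class IDEA_Base {
  public:
    // Only the base class writes the key schedule.
    void set_key_schedule(std::vector<std::uint16_t> ek) { EK = std::move(ek); }

  protected:
    // Read-only view for subclasses, so auditing writes means auditing this class.
    const std::vector<std::uint16_t>& get_EK() const { return EK; }

  private:
    std::vector<std::uint16_t> EK;
};

class IDEA_SIMD : public IDEA_Base {
  public:
    std::uint16_t subkey(std::size_t i) const { return get_EK().at(i); }
    // get_EK()[0] = 1;  // would not compile: the reference is const
};
```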
* More Doxygen fixes
  (lloyd, 2010-06-15, 1 file, -1/+1)
* Change BlockCipher::parallelism() to return the native parallelism of
  the implementation rather than the preferred one. Update all implementations. Add a new function parallel_bytes() which returns parallelism() * BLOCK_SIZE * BUILD_TIME_CONSTANT.
  This is because I noticed all current calls of parallelism() already multiplied the result by the block size, so this simplifies that code. The build-time constant is set to 4, which was the previous default return value of parallelism(). However, the SIMD versions returned 2x their native parallelism rather than 4x, so this increases the buffer sizes used for those algorithms. The constant multiple lives in buildh.in and build.h, and is named BOTAN_BLOCK_CIPHER_PAR_MULT.
  (lloyd, 2010-05-25, 1 file, -1/+1)
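A worked sketch of the new helper under stated assumptions: the `BlockCipher` interface is simplified, the default `parallelism()` of 1 for scalar code is an assumption, and `IDEA_SSE2_Sketch` is a hypothetical class claiming 8-byte blocks and a native parallelism of 8 blocks.

```cpp
#include <cassert>
#include <cstddef>

// Build-time multiplier; per the commit it lives in build.h and is set to 4.
#ifndef BOTAN_BLOCK_CIPHER_PAR_MULT
#define BOTAN_BLOCK_CIPHER_PAR_MULT 4
#endif

class BlockCipher {
  public:
    virtual ~BlockCipher() = default;
    virtual std::size_t block_size() const = 0;
    // Native number of blocks processed at once (1 for scalar code is assumed).
    virtual std::size_t parallelism() const { return 1; }

    // parallelism() * block size * build-time multiplier, in bytes.
    std::size_t parallel_bytes() const {
        return parallelism() * block_size() * BOTAN_BLOCK_CIPHER_PAR_MULT;
    }
};

// Hypothetical SIMD cipher: 8-byte blocks, 8 blocks per SIMD pass (assumed).
class IDEA_SSE2_Sketch final : public BlockCipher {
  public:
    std::size_t block_size() const override { return 8; }
    std::size_t parallelism() const override { return 8; }
};
```

With those assumed numbers, parallel_bytes() gives 8 * 8 * 4 = 256 bytes, whereas the old SIMD value of 2x native parallelism (16 blocks) would have led callers to 16 * 8 = 128 bytes, consistent with the note that SIMD buffer sizes grow.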
* Set parallelism defaults.
  The default unless specified is now 4. For SIMD code, use 2x the number of blocks that the cipher processes in parallel using SIMD. It may make sense to increase this to 4x or even more; further experimentation is necessary.
  (lloyd, 2010-02-25, 1 file, -0/+2)
* Kill unneeded include
  (lloyd, 2010-01-12, 1 file, -1/+0)
* Add last night's project, an SSE2 implementation of IDEA. Roughly 4x
  faster than the scalar version on a Core2.
  (lloyd, 2009-12-23, 3 files, -0/+263)
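The commit does not describe the SIMD code itself. As background only: the IDEA operation that is most awkward to vectorize is multiplication modulo 2^16 + 1, where the 16-bit value 0 stands for 2^16. The scalar sketch below uses the common low/high-half reduction, which needs an unsigned `lo < hi` test; that is presumably the compare the 2011 commits above revisit. `idea_mul`, `idea_mul_ref`, and `cross_check` are hypothetical names.

```cpp
#include <cstdint>

// Multiplication modulo 2^16 + 1 as used by IDEA; 0 represents 2^16.
constexpr std::uint16_t idea_mul(std::uint16_t x, std::uint16_t y) {
    // 2^16 is congruent to -1, so 2^16 * y is congruent to 1 - y.
    if (x == 0) return static_cast<std::uint16_t>(1 - y);
    if (y == 0) return static_cast<std::uint16_t>(1 - x);
    const std::uint32_t p = static_cast<std::uint32_t>(x) * y;
    const std::uint16_t lo = static_cast<std::uint16_t>(p);
    const std::uint16_t hi = static_cast<std::uint16_t>(p >> 16);
    // p = hi * 2^16 + lo is congruent to lo - hi; when that is negative,
    // adding 2^16 + 1 equals wrapping mod 2^16 and adding 1.
    return static_cast<std::uint16_t>(lo - hi + (lo < hi ? 1 : 0));
}

// Reference version using plain modular arithmetic, for cross-checking.
constexpr std::uint16_t idea_mul_ref(std::uint16_t x, std::uint16_t y) {
    const std::uint64_t a = (x == 0) ? 65536u : x;
    const std::uint64_t b = (y == 0) ? 65536u : y;
    // The result lies in [1, 65536]; the cast maps 65536 back to 0.
    return static_cast<std::uint16_t>((a * b) % 65537u);
}

// Compares both versions on a grid of inputs spaced `step` apart.
constexpr bool cross_check(std::uint32_t step) {
    for (std::uint32_t x = 0; x < 65536u; x += step)
        for (std::uint32_t y = 0; y < 65536u; y += step)
            if (idea_mul(static_cast<std::uint16_t>(x), static_cast<std::uint16_t>(y)) !=
                idea_mul_ref(static_cast<std::uint16_t>(x), static_cast<std::uint16_t>(y)))
                return false;
    return true;
}
```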