path: root/src/block
Commit message | Author | Age | Files | Lines
* Anywhere we use MemoryRegion::begin to get access to the raw pointer representation (rather than in an iterator context), instead use &buf[0], which works for both MemoryRegion and std::vector. | lloyd | 2010-09-13 | 2 | -6/+6
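A minimal sketch of the pattern this entry describes, assuming a buffer type with contiguous storage, operator[] and size(). The names sum_bytes and sum_buffer are hypothetical and only stand in for code that needs a raw pointer; std::vector is used here because MemoryRegion itself is not shown in the log.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper standing in for any C-style API that wants a raw pointer.
inline std::uint32_t sum_bytes(const std::uint8_t* p, std::size_t n)
   {
   std::uint32_t total = 0;
   for(std::size_t i = 0; i != n; ++i)
      total += p[i];
   return total;
   }

// Works for any container with operator[] and size(), e.g. std::vector.
// Assumes contiguous storage; an empty buffer is handled without indexing.
template<typename Buffer>
std::uint32_t sum_buffer(const Buffer& buf)
   {
   if(buf.size() == 0)
      return 0;
   return sum_bytes(&buf[0], buf.size());
   }
```

Because the template only relies on operator[] and size(), the same call site should keep compiling if the underlying buffer type is later swapped for std::vector.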
* Big, invasive but mostly automated change, with a further attempt at harmonising MemoryRegion with std::vector: The MemoryRegion::clear() function would zeroise the buffer, but keep the memory allocated and the size unchanged. This is very different from STL's clear(), which is basically the equivalent of what is called destroy() in MemoryRegion. So to be able to replace MemoryRegion with a std::vector, we have to rename destroy() to clear() and expose the current functionality of clear() in some other way, since vector doesn't support this operation. Do so by adding a global function named zeroise() which takes a MemoryRegion and zeroes it. Remove clear() to ensure all callers are updated. | lloyd | 2010-09-07 | 26 | -47/+47
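The difference between the two operations can be sketched as follows. This is an illustration written against std::vector, not the library's actual zeroise() (which, per the entry above, takes a MemoryRegion); the template signature here is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Overwrite every element with zero; size and allocation are left unchanged.
// Sketch only: code handling secrets would also need to make sure the
// compiler cannot optimise these writes away.
template<typename T>
void zeroise(std::vector<T>& buf)
   {
   std::fill(buf.begin(), buf.end(), T());
   }
```

After zeroise(v) the vector still has its original size, whereas v.clear() leaves it empty, which is the STL behaviour the renamed destroy() is meant to match.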
* Prevent shadowing of one loop param with another | lloyd | 2010-09-07 | 1 | -1/+1
* The SSSE3 intrinsics apparently work under Sun Studio as well | lloyd | 2010-09-07 | 1 | -0/+1
* Fix paper ref URL, remove unused prefetch include | lloyd | 2010-08-20 | 1 | -5/+9
* Also use a smaller table in the first round of AES in the decrypt direction | lloyd | 2010-08-19 | 1 | -9/+19
* In the first round of AES, use a 256 element table and do the rotations in the code. This reduces the number of cache lines potentially accessed in the first round from 64 to 16 (assuming 64 byte cache lines). On average, about 10 cache lines will actually be accessed, assuming a uniform distribution of the inputs, so there definitely is still a timing channel here, just a somewhat smaller one. I experimented with using the 256 element table for all rounds but it reduced performance significantly and I'm not sure if the benefit is worth the cost or not. | lloyd | 2010-08-18 | 1 | -9/+28
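The cache-line figures in this entry can be reproduced with a short calculation. The sketch below assumes the usual layout of four 256-entry tables of 32-bit words, 16 independent uniformly distributed lookups in the first round, and tables aligned to cache-line boundaries; none of these details are stated in the log, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Number of cache lines spanned by `tables` lookup tables of 256 32-bit words.
inline std::size_t table_cache_lines(std::size_t tables, std::size_t line_bytes)
   {
   return tables * 256 * 4 / line_bytes;
   }

// Expected number of distinct lines hit by `lookups` independent, uniformly
// distributed lookups into a table spanning `lines` cache lines:
// lines * (1 - (1 - 1/lines)^lookups)
inline double expected_lines_touched(std::size_t lines, std::size_t lookups)
   {
   const double miss = 1.0 - 1.0 / static_cast<double>(lines);
   return static_cast<double>(lines) *
          (1.0 - std::pow(miss, static_cast<double>(lookups)));
   }
```

Under these assumptions, table_cache_lines(4, 64) gives 64 and table_cache_lines(1, 64) gives 16, and expected_lines_touched(16, 16) comes out slightly above 10, in line with the "about 10" quoted above.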
* Also add AES-192 using SSSE3 | lloyd | 2010-08-12 | 2 | -23/+149
* Support AES-256 in the SSSE3 implementation | lloyd | 2010-08-12 | 2 | -5/+93
* Use _mm_set_epi32 instead of _mm_set_epi64x; VC++ obnoxiously only supports epi64x in 64-bit mode. | lloyd | 2010-08-11 | 2 | -79/+79
* Only enable aes_ssse3 when compiling with GCC or Clang. For some dumbass reasons, Intel C++ rejects const __m128i foo = _mm_set_epi64x(...) though it will accept if you use one of the _mm_set1 variants. And Visual C++ doesn't know about _mm_set_epi64x() in 32-bit mode for similarly dumb reasons: it works fine compiling for 64 bit but for whatever reason they don't offer this function when compiling as 32 bit. Unfortunately there isn't a good way to specify it's OK with a particular compiler with one arch but not another, so just disable it globally for the time being. The workaround for VC++ is probably to use _mm_set_epi32 and break up the input values into 32 bit chunks. ICC is a lost cause I fear. | lloyd | 2010-08-09 | 1 | -0/+7
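The workaround mentioned here (splitting each 64-bit value into 32-bit chunks for _mm_set_epi32) might look roughly like this. The helper names hi32, lo32 and set_u64x2 are hypothetical, and the SSE2 part is guarded by __SSE2__, which GCC and Clang define but other compilers may not.

```cpp
#include <cstdint>

#if defined(__SSE2__)
  #include <emmintrin.h>
#endif

// Upper and lower 32-bit halves of a 64-bit constant.
inline std::uint32_t hi32(std::uint64_t x) { return static_cast<std::uint32_t>(x >> 32); }
inline std::uint32_t lo32(std::uint64_t x) { return static_cast<std::uint32_t>(x); }

#if defined(__SSE2__)
// Hypothetical stand-in for _mm_set_epi64x(high, low) built only from
// _mm_set_epi32, whose arguments run from the most significant lane down.
inline __m128i set_u64x2(std::uint64_t high, std::uint64_t low)
   {
   return _mm_set_epi32(static_cast<int>(hi32(high)), static_cast<int>(lo32(high)),
                        static_cast<int>(hi32(low)),  static_cast<int>(lo32(low)));
   }
#endif
```

The argument order is the part that is easy to get wrong: both intrinsics take the highest lane first, so each 64-bit value contributes its high half before its low half.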
* Add an implementation of AES-128 using SSSE3 instructions. It runs in constant time and on a Nehalem is significantly faster than the table based version. This implementation technique was invented by Mike Hamburg and described in a paper in CHES 2009 "Accelerating AES with Vector Permute Instructions". This code is basically a translation of his public domain x86-64 assembly code into intrinsics. Todo: Adding support for AES-192 and AES-256; this just requires implementing the key schedules. Currently only tested on an i7 with GCC (32 and 64 bit code); testing/optimization on 32-bit processors with SSSE3 like the Atom, and with Visual C++ and other compilers, are also todos. | lloyd | 2010-08-09 | 3 | -0/+454
* Also allow clang with 32-bit assembly code, everything seems to work fine with latest SVN. | lloyd | 2010-08-08 | 1 | -19/+0
* Only call the scalar versions if we actually have leftover blocks to process | lloyd | 2010-06-22 | 4 | -8/+16
* Doxygen | lloyd | 2010-06-21 | 1 | -3/+26
* In IDEA, Noekeon, Serpent, XTEA, provide and use read-only accessor functions for getting access to the key schedule, instead of giving the key schedule protected status, which is much harder to audit. | lloyd | 2010-06-21 | 8 | -15/+43
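A rough sketch of the accessor pattern described here, using hypothetical class names (Toy_Cipher, Toy_Cipher_SIMD) rather than the library's real ciphers. The key schedule is private, and derived implementations get a const reference through a protected accessor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cipher base class; names are illustrative, not the library's API.
class Toy_Cipher
   {
   public:
      explicit Toy_Cipher(const std::vector<std::uint32_t>& key) : m_round_keys(key) {}
      virtual ~Toy_Cipher() {}

   protected:
      // Subclasses can read the key schedule but cannot modify it.
      const std::vector<std::uint32_t>& round_keys() const { return m_round_keys; }

   private:
      std::vector<std::uint32_t> m_round_keys;
   };

// A derived (e.g. SIMD) implementation only needs read access.
class Toy_Cipher_SIMD : public Toy_Cipher
   {
   public:
      explicit Toy_Cipher_SIMD(const std::vector<std::uint32_t>& key) : Toy_Cipher(key) {}

      // at() is used so an empty schedule throws instead of reading out of bounds.
      std::uint32_t first_round_key() const { return round_keys().at(0); }
   };
```

With this shape, every write to the schedule has to happen inside the base class, which is what makes the code easier to audit than a protected data member.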
* Make Serpent's key_schedule and actual round keys private. Add protected accessor functions for get and set. Set is needed by the x86 version since it implements the key schedule directly. | lloyd | 2010-06-21 | 1 | -1/+15
* Replace "@return a blah" and "@return the blah" with just "@return blah" | lloyd | 2010-06-16 | 2 | -3/+3
* Yet more Doxygen comments | lloyd | 2010-06-16 | 4 | -19/+28
* More Doxygen | lloyd | 2010-06-15 | 4 | -3/+27
* Don't allow access to key_schedule, just the members | lloyd | 2010-06-15 | 2 | -3/+5
* More Doxygen updates | lloyd | 2010-06-15 | 6 | -13/+42
* More Doxygen fixes | lloyd | 2010-06-15 | 28 | -40/+48
* Fix a few hundred Doxygen warnings | lloyd | 2010-06-15 | 1 | -2/+2
* Fix build | lloyd | 2010-06-07 | 1 | -0/+1
* Use "/*" instead of "/**" in starting comments at the beginning of a file. This caused Doxygen to think this was markup meant for it, which really caused some clutter in the namespace page. | lloyd | 2010-06-07 | 7 | -7/+6
* Hide --enable-isa and instead expose --enable-{sse2,ssse3,aes-ni,altivec} in the help. Unfortunately we can't just remove --enable-isa, because for the callback to work the target list has to already exist, and it only does by virtue of the default=[] param to the enable-isa setup. We could just use append_const, except then we can't run on Python 2.4, and the latest release of RHEL only has 2.4 :( Rename aes_ni to aes-ni in configuration-speak. | lloyd | 2010-05-26 | 1 | -1/+1
* Change BlockCipher::parallelism() to return the native parallelism of the implementation rather than the preferred one. Update all implementations. Add a new function parallel_bytes() which returns parallelism() * BLOCK_SIZE * BUILD_TIME_CONSTANT. This is because I noticed all current calls of parallelism() just multiplied the result by the block size already, so this simplified that code. The build time constant is set to 4, which was the previous default return value of parallelism(). However the SIMD versions returned 2*native parallelism rather than 4*, so this increases the buffer sizes used for those algorithms. The constant multiple lives in buildh.in and build.h, and is named BOTAN_BLOCK_CIPHER_PAR_MULT. | lloyd | 2010-05-25 | 6 | -9/+17
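The relationship between parallelism() and parallel_bytes() can be sketched as below. The class names and the block_size() accessor are assumptions made for illustration; only the formula and the multiple of 4 come from the entry above.

```cpp
#include <cstddef>

// Stand-in for the build-time multiple; the log says it is set to 4.
const std::size_t BLOCK_CIPHER_PAR_MULT = 4;

// Hypothetical interface, not the library's actual BlockCipher class.
class Block_Cipher_Sketch
   {
   public:
      virtual ~Block_Cipher_Sketch() {}

      virtual std::size_t block_size() const = 0;

      // Native parallelism: blocks the implementation processes at once.
      virtual std::size_t parallelism() const { return 1; }

      // parallelism() * block size * build-time constant, in bytes.
      std::size_t parallel_bytes() const
         {
         return parallelism() * block_size() * BLOCK_CIPHER_PAR_MULT;
         }
   };

// Scalar cipher with 16-byte blocks and the default parallelism of 1.
class Scalar_Cipher_Sketch : public Block_Cipher_Sketch
   {
   public:
      std::size_t block_size() const { return 16; }
   };

// SIMD cipher assumed to process 4 blocks of 16 bytes at once.
class Simd_Cipher_Sketch : public Block_Cipher_Sketch
   {
   public:
      std::size_t block_size() const { return 16; }
      std::size_t parallelism() const { return 4; }
   };
```

With these assumed numbers, the scalar cipher reports 64 bytes and the SIMD cipher 256 bytes, so callers no longer need to multiply by the block size themselves.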
* Modify the implementation of multiplication mod 65537 used in IDEA to be branch-free. This reduces performance noticeably on my Core2 (from 32 MiB/s to a bit over 27 MiB/s), but so it goes. The IDEA implementation using SSE2 is already branch-free here, and runs at about 135 MiB/s on my machine. Also add more IDEA tests, generated by OpenSSL. | lloyd | 2010-04-30 | 1 | -10/+13
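One way to write IDEA's multiplication mod 65537 without data-dependent branches is sketched below. It is an illustration of the technique, not necessarily the code from this commit. As usual for IDEA, a 16-bit value of 0 represents 65536. The function idea_mul_reference is a hypothetical, straightforward version included only for comparison.

```cpp
#include <cstdint>

// Branch-free multiplication mod 65537, with 0 standing for 65536.
inline std::uint16_t idea_mul(std::uint16_t x, std::uint16_t y)
   {
   const std::uint32_t P = static_cast<std::uint32_t>(x) * y;

   // 0xFFFF if P != 0, otherwise 0; selects between the two cases below.
   const std::uint16_t P_mask = static_cast<std::uint16_t>(!P - 1);

   const std::uint32_t P_hi = P >> 16;
   const std::uint32_t P_lo = P & 0xFFFF;

   // Case x != 0 and y != 0: 2^16 == -1 (mod 65537), so P == P_lo - P_hi,
   // plus 65537 (i.e. +1 in 16-bit arithmetic) if that difference is negative.
   const std::uint16_t r_1 =
      static_cast<std::uint16_t>((P_lo - P_hi) + (P_lo < P_hi));

   // Case x == 0 or y == 0: the zero operand is -1, so the result is
   // minus the other operand, which is 1 - x - y in 16-bit arithmetic.
   const std::uint16_t r_2 = static_cast<std::uint16_t>(1 - x - y);

   return static_cast<std::uint16_t>((r_1 & P_mask) | (r_2 & ~P_mask));
   }

// Straightforward (branching) reference for comparison.
inline std::uint16_t idea_mul_reference(std::uint16_t x, std::uint16_t y)
   {
   const std::uint64_t a = (x == 0) ? 65536 : x;
   const std::uint64_t b = (y == 0) ? 65536 : y;
   return static_cast<std::uint16_t>((a * b) % 65537);
   }
```

Both results are always computed and a mask picks one, which trades a little extra arithmetic for the absence of a secret-dependent branch; that is consistent with the slowdown reported in the entry.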
* Remove SecureBuffer, which is the fixed-size variant of SecureVector. Add a second template param to SecureVector which specifies the initial length. Change all callers to be SecureVector instead of SecureBuffer. This can go away in C++0x, once compilers implement N2712 ("Non-static data member initializers"), and we can just write code as SecureVector<byte> P{18}; instead. | lloyd | 2010-03-23 | 38 | -51/+51
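The idea of carrying the initial length in a second template parameter can be sketched as follows. Sized_Vector and Toy_State are hypothetical names, and std::vector is used as the backing store purely for illustration; the real SecureVector is not shown in the log.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical buffer type: the second template parameter is the initial length.
template<typename T, std::size_t INITIAL_LEN = 0>
class Sized_Vector
   {
   public:
      // Elements are value-initialised, so arithmetic types start at zero.
      Sized_Vector() : m_data(INITIAL_LEN) {}

      std::size_t size() const { return m_data.size(); }

      T& operator[](std::size_t i) { return m_data[i]; }
      const T& operator[](std::size_t i) const { return m_data[i]; }

   private:
      std::vector<T> m_data;
   };

// A class can then size its state in the member declaration itself,
// without needing a constructor initializer list.
struct Toy_State
   {
   Sized_Vector<unsigned char, 18> P;
   };
```

This gives the same effect as a fixed-size buffer type while keeping a single class template, which appears to be the motivation for removing SecureBuffer.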
* Set parallelism defaults. Default unless specified is now 4. For SIMD code, use 2x the number of blocks which are processed in parallel using SIMD by that cipher. It may make sense to increase this to 4x or even more, further experimentation is necessary. | lloyd | 2010-02-25 | 6 | -1/+15
* Instead of the mode parallelism being specified via macros, have it depend on the particular implementation. Add a new virtual function to BlockCipher named parallelism that returns the number of blocks the cipher object could or might want to process in parallel. Currently set to 1 by default but may make sense to increase this for even scalar implementations since it seems like better caching behavior makes it a win. | lloyd | 2010-02-25 | 1 | -0/+5
* Add SIMD version of Noekeon. On a Core2, about 2.7x faster using SIMD_SSE2 and 1.6x faster using SIMD_Scalar. | lloyd | 2010-01-12 | 4 | -1/+198
* Kill unneeded include | lloyd | 2010-01-12 | 1 | -1/+0
* Add block cipher cascade | lloyd | 2010-01-11 | 3 | -0/+148
* Clean up exceptions. Remove some unused ones like Config_Error. Make Invalid_Argument just a typedef for std::invalid_argument. Make Botan::Exception a typedef for std::runtime_error. Make Memory_Exhaustion a public exception, and use it in other places where memory allocations can fail. | lloyd | 2010-01-05 | 1 | -1/+2
* Add doxygen comments | lloyd | 2009-12-29 | 1 | -0/+13
* Add last night's project, an SSE2 implementation of IDEA. Right about 4x faster than the scalar version on a Core2. | lloyd | 2009-12-23 | 5 | -53/+290
* Un-internal loadstor.h (and its header deps, rotate.h and bswap.h); too many external apps rely on loadstor.h existing. Define 64-bit generic bswap in terms of 32-bit bswap, since it's not much slower if 32-bit is also generic, and much faster if it's not. This may be quite helpful on 32-bit x86 in particular. Change formulation of generic 32-bit bswap. It may be faster or slower depending on the CPU, especially the latency and throughput of rotate instructions, but should be faster on an ideally superscalar processor with rotate instructions (i.e., what I expect future CPUs to look more like). | lloyd | 2009-12-21 | 26 | -38/+41
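A sketch of what a rotate-based generic 32-bit byte swap and a 64-bit swap built on top of it could look like. The function names are hypothetical and this is one common formulation, not necessarily the exact one committed.

```cpp
#include <cstdint>

// 32-bit rotates; r must be in the range 1..31.
inline std::uint32_t rotl32(std::uint32_t x, unsigned r) { return (x << r) | (x >> (32 - r)); }
inline std::uint32_t rotr32(std::uint32_t x, unsigned r) { return (x >> r) | (x << (32 - r)); }

// Generic 32-bit byte swap using two independent rotates and masks.
inline std::uint32_t bswap32(std::uint32_t x)
   {
   return (rotr32(x, 8) & 0xFF00FF00u) | (rotl32(x, 8) & 0x00FF00FFu);
   }

// 64-bit byte swap in terms of the 32-bit one: swap each half,
// then exchange the halves.
inline std::uint64_t bswap64(std::uint64_t x)
   {
   const std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
   const std::uint32_t lo = static_cast<std::uint32_t>(x);
   return (static_cast<std::uint64_t>(bswap32(lo)) << 32) | bswap32(hi);
   }
```

The two rotates in bswap32 do not depend on each other, so a superscalar CPU can issue them in parallel. If bswap32 is replaced by a native instruction, bswap64 benefits automatically.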
* Add missing BOTAN_DLL exports. Move most of the engine headers to internal. | lloyd | 2009-12-16 | 1 | -3/+1
* Make many more headers internal-only. Fixes for the amalgamation generator for internal headers. Remove BOTAN_DLL exporting macros from all internal-only headers; the classes/functions there don't need to be exported, and avoiding the PIC/GOT indirection can be a big win. Add missing BOTAN_DLLs where necessary, mostly gfpmath and cvc. For GCC, use -fvisibility=hidden and set BOTAN_DLL to the visibility __attribute__ to export those classes/functions. | lloyd | 2009-12-16 | 29 | -41/+41
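The export scheme described here can be sketched with a stand-in macro. SKETCH_DLL, Public_Api and internal_helper are hypothetical names; the library's own macro is BOTAN_DLL and its exact definition is not shown in the log.

```cpp
// Hypothetical export macro. With GCC or Clang and -fvisibility=hidden,
// only symbols marked with default visibility are exported from the
// shared library.
#if defined(__GNUC__) || defined(__clang__)
  #define SKETCH_DLL __attribute__((visibility("default")))
#else
  #define SKETCH_DLL
#endif

// Internal-only function: no export macro, so it stays hidden when the
// library is built with -fvisibility=hidden.
inline int internal_helper() { return 41; }

// Public class: explicitly exported.
class SKETCH_DLL Public_Api
   {
   public:
      int answer() const { return internal_helper() + 1; }
   };
```

Calls to hidden functions can be bound within the library at link time, which is the indirection saving the entry refers to.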
* Full working amalgamation build, plus internal-only headers concept. | lloyd | 2009-12-16 | 6 | -11/+24
* Add missing header guards to package.h and botan.h. Change serp_simd_sbox.h's header guard to use the leading BOTAN_ prefix for proper macro namespacing. | lloyd | 2009-12-02 | 1 | -2/+2
* Remove obsolete comment | lloyd | 2009-11-17 | 1 | -15/+0
* Rename/remove some secmem member functions for better matching with STL containers (specifically vector): rename is_empty to empty, remove has_items, rename create to resize. | lloyd | 2009-11-17 | 2 | -3/+3
* Instead of having two asm_macr.h files being switched in based on module build magic, name them asm_macr_ARCH.h. Change all including files accordingly. | lloyd | 2009-11-14 | 1 | -1/+1
* Cleanups in the Square implementation | lloyd | 2009-11-11 | 1 | -30/+38
* Double the speed of Skipjack on my Core2, mostly due to better inlining. | lloyd | 2009-11-11 | 2 | -82/+99
* Inline all of the AES tables into an anon namespace in aes.cpp. Turns out to give a 3-7% speed improvement on Core2 with GCC. | lloyd | 2009-11-11 | 3 | -411/+399
* Almost double the speed of MARS; from 55 MiB/s to 102 MiB/s on my Core2. | lloyd | 2009-11-11 | 3 | -231/+216