path: root/src/block/aes
Commit message | Author | Age | Files | Lines
* Only allocate as much working space as needed in key schedule | lloyd | 2010-10-14 | 1 | -4/+4
* Remove standalone S function | lloyd | 2010-10-14 | 1 | -12/+13
* In all cases where the block size of the cipher is fixed, the key | lloyd | 2010-10-14 | 2 | -65/+115
  parameters are as well. So make them template parameters. The sole
  exception was AES, because you could either initialize AES with a fixed
  key length, in which case it would only support that specific key
  length, or not, in which case it would support any valid AES key size.
  This is removed in this checkin; you have to specifically ask for
  AES-128, AES-192, or AES-256, depending on which one you want. This is
  probably actually a good thing, because every implementation other than
  the base one (SSSE3, AES-NI, OpenSSL) did not support "AES", only the
  versions with specific fixed key sizes. So forcing the user to ask for
  the one they want ensures they get the ones that are faster and/or
  safer.

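  To make the idea concrete, here is a minimal sketch of a key length
  fixed as a template parameter; the class names are hypothetical, not
  Botan's actual declarations:

     #include <cstddef>

     // Sketch only: the key length is a compile-time parameter, so "AES"
     // is no longer one class with a variable key size but three
     // concrete classes, one per key size.
     template<std::size_t KEY_LENGTH>
     class Fixed_Key_Cipher_Sketch
        {
        public:
           enum { KEY_LENGTH_BYTES = KEY_LENGTH };
           std::size_t key_length() const { return KEY_LENGTH; }
        };

     class AES_128_Sketch : public Fixed_Key_Cipher_Sketch<16> { /* ... */ };
     class AES_192_Sketch : public Fixed_Key_Cipher_Sketch<24> { /* ... */ };
     class AES_256_Sketch : public Fixed_Key_Cipher_Sketch<32> { /* ... */ };
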
* Make the rounds implicit with the size of the key values | lloyd | 2010-10-13 | 2 | -55/+42
* More size_t. Document changes | lloyd | 2010-10-13 | 2 | -2/+2
* Add a new subclass for BlockCipher BlockCipher_Fixed_Block_Size, which | lloyd | 2010-10-13 | 2 | -8/+10
  sets the block size statically and also creates an enum with the size.
  Use the enum instead of calling block_size() where possible, since that
  costs two virtual function calls per block, which is quite unfortunate.
  The real advantages here, as compared to the previous version which
  kept the block size as a per-object u32bit:

  - The compiler can inline the constant as an immediate operand
    (previously it would load the value via an indirection on this)
  - Removes 32 bits of per-object overhead (except in cases with actually
    variable block sizes, which are very few and rarely used)

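  A rough sketch of the pattern described above (hypothetical names, not
  the exact Botan declarations):

     #include <cstddef>

     class Cipher_Base_Sketch
        {
        public:
           virtual std::size_t block_size() const = 0;
           virtual ~Cipher_Base_Sketch() {}
        };

     // The block size becomes an enum constant in a template base class,
     // so member code can use BLOCK_SIZE as an immediate operand instead
     // of paying for a virtual block_size() call on every block.
     template<std::size_t BS>
     class Fixed_Block_Size_Sketch : public Cipher_Base_Sketch
        {
        public:
           enum { BLOCK_SIZE = BS };
           std::size_t block_size() const { return BS; }
        };

     class AES_Like_Sketch : public Fixed_Block_Size_Sketch<16>
        {
        // e.g. for(std::size_t i = 0; i != BLOCK_SIZE; ++i) ...
        };
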
* s/BLOCK_SIZE/block_size()/ | lloyd | 2010-10-13 | 1 | -4/+4
* Use size_t rather than u32bit in SymmetricAlgorithm | lloyd | 2010-10-13 | 2 | -26/+27
* Use size_t rather than u32bit for the blocks argument of encrypt_n | lloyd | 2010-10-12 | 2 | -6/+6
* Completely remove the second parameter to SecureVector which specifies | lloyd | 2010-09-14 | 2 | -7/+8
  the initial/default length of the array; update all users to instead
  pass the value to the constructor. This is an old vestigial thing from
  a class (SecureBuffer) that used this compile-time constant in order to
  store the values in an array. However this was changed way back in 2002
  to use the same allocator hooks as the rest of the containers, so the
  only advantage of using the length field was that the initial length
  was set automatically and didn't have to be set in the constructor,
  which was mildly convenient. However this directly conflicts with the
  desire to be able to (eventually) use std::vector with a custom
  allocator, since of course vector doesn't support this. Fortunately
  almost all of the uses are in classes which have only a single
  constructor, so there is little to no duplication by instead
  initializing the size in the constructor.

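  A toy illustration of the resulting usage (SecureVector itself is not
  reproduced here; the stand-in class below only shows the shape of the
  change):

     #include <cstddef>
     #include <vector>

     typedef unsigned char byte;

     // Stand-in container: the initial length is a constructor argument
     // rather than a template parameter, matching the std::vector API.
     template<typename T>
     class secure_vector_sketch
        {
        public:
           explicit secure_vector_sketch(std::size_t n = 0) : buf(n) {}
           std::size_t size() const { return buf.size(); }
        private:
           std::vector<T> buf;
        };

     int main()
        {
        // Before: SecureVector<byte, 18> P;  (length baked into the type)
        // After:  SecureVector<byte> P(18);  (length passed at construction)
        secure_vector_sketch<byte> P(18);
        return P.size() == 18 ? 0 : 1;
        }
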
* First set of changes for avoiding use of implicit vector->pointer conversions | lloyd | 2010-09-13 | 1 | -4/+4
* Big, invasive but mostly automated change, with a further attempt at | lloyd | 2010-09-07 | 1 | -4/+4
  harmonising MemoryRegion with std::vector: The MemoryRegion::clear()
  function would zeroise the buffer, but keep the memory allocated and
  the size unchanged. This is very different from STL's clear(), which is
  basically equivalent to what is called destroy() in MemoryRegion. So to
  be able to replace MemoryRegion with a std::vector, we have to rename
  destroy() to clear(), and we have to expose the current functionality
  of clear() in some other way, since vector doesn't support this
  operation. Do so by adding a global function named zeroise() which
  takes a MemoryRegion and zeroes it. Remove clear() to ensure all
  callers are updated.

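  A minimal sketch of the zeroise() idea, using std::vector as a stand-in
  for MemoryRegion (this toy version is not the library's code):

     #include <cstring>
     #include <vector>

     // Wipe the contents but leave the size and allocation alone; the
     // container's own clear() is then free to mean "make it empty", as
     // std::vector::clear() does.
     template<typename T>
     void zeroise_sketch(std::vector<T>& buf)
        {
        if(!buf.empty())
           std::memset(&buf[0], 0, buf.size() * sizeof(T));
        }
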
* Fix paper ref URL, remove unused prefetch include | lloyd | 2010-08-20 | 1 | -5/+9
* Also use a smaller table in the first round of AES in the decrypt direction | lloyd | 2010-08-19 | 1 | -9/+19
* In the first round of AES, use a 256 element table and do the | lloyd | 2010-08-18 | 1 | -9/+28
  rotations in the code. This reduces the number of cache lines
  potentially accessed in the first round from 64 to 16 (assuming 64-byte
  cache lines). On average, about 10 cache lines will actually be
  accessed, assuming a uniform distribution of the inputs, so there
  definitely is still a timing channel here, just a somewhat smaller one.
  I experimented with using the 256-element table for all rounds, but it
  reduced performance significantly and I'm not sure whether the benefit
  is worth the cost.

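  The technique, roughly (an illustration of the idea, not Botan's
  aes.cpp; the table name and the rotation amounts/direction depend on
  how the T-table is laid out):

     #include <stdint.h>

     // With four 1 KiB T-tables the first round can touch up to 4 KiB of
     // table data (64 cache lines of 64 bytes). Using a single 256-entry
     // 32-bit table (1 KiB, 16 cache lines) and deriving the other three
     // words by rotation shrinks the first round's cache footprint.
     inline uint32_t rotr32(uint32_t x, int n)
        {
        return (x >> n) | (x << (32 - n));
        }

     inline uint32_t first_round_word(const uint32_t TE[256],
                                      uint8_t x0, uint8_t x1,
                                      uint8_t x2, uint8_t x3,
                                      uint32_t round_key)
        {
        return TE[x0] ^
               rotr32(TE[x1], 8) ^
               rotr32(TE[x2], 16) ^
               rotr32(TE[x3], 24) ^
               round_key;
        }
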
* Yet more Doxygen comments | lloyd | 2010-06-16 | 2 | -7/+12
* Use "/*" instead of "/**" in starting comments at the begining of a file.lloyd2010-06-072-2/+2
| | | | | This caused Doxygen to think this was markup meant for it, which really caused some clutter in the namespace page.
* Remove SecureBuffer, which is the fixed-size variant of SecureVector. | lloyd | 2010-03-23 | 2 | -5/+5
  Add a second template param to SecureVector which specifies the initial
  length. Change all callers to be SecureVector instead of SecureBuffer.

  This can go away in C++0x, once compilers implement N2712 ("Non-static
  data member initializers"), and we can just write code as

     SecureVector<byte> P{18};

  instead.

* Un-internal loadstor.h (and its header deps, rotate.h and | lloyd | 2009-12-21 | 1 | -1/+2
  bswap.h); too many external apps rely on loadstor.h existing.

  Define 64-bit generic bswap in terms of 32-bit bswap, since it's not
  much slower if 32-bit is also generic, and much faster if it's not.
  This may be quite helpful on 32-bit x86 in particular.

  Change formulation of generic 32-bit bswap. It may be faster or slower
  depending on the CPU, especially the latency and throughput of rotate
  instructions, but should be faster on an ideally superscalar processor
  with rotate instructions (i.e., what I expect future CPUs to look more
  like).

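  The generic byte swaps described above look roughly like this (a
  sketch, not the exact Botan code):

     #include <stdint.h>

     // One generic formulation of a 32-bit byte swap: swap the halves,
     // then swap the bytes within each half.
     inline uint32_t reverse_bytes32(uint32_t x)
        {
        x = (x << 16) | (x >> 16);
        return ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
        }

     // The 64-bit swap is built from two 32-bit swaps, so it is fast
     // whenever the 32-bit version maps onto a native bswap instruction.
     inline uint64_t reverse_bytes64(uint64_t x)
        {
        const uint32_t hi = static_cast<uint32_t>(x >> 32);
        const uint32_t lo = static_cast<uint32_t>(x);
        return (static_cast<uint64_t>(reverse_bytes32(lo)) << 32) |
               reverse_bytes32(hi);
        }
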
* Make many more headers internal-only. | lloyd | 2009-12-16 | 1 | -1/+1
  Fixes for the amalgamation generator for internal headers.

  Remove BOTAN_DLL exporting macros from all internal-only headers; the
  classes/functions there don't need to be exported, and avoiding the
  PIC/GOT indirection can be a big win.

  Add missing BOTAN_DLLs where necessary, mostly gfpmath and cvc.

  For GCC, use -fvisibility=hidden and set BOTAN_DLL to the visibility
  __attribute__ to export those classes/functions.

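  The visibility scheme, sketched (the macro spelling here is
  illustrative, not Botan's actual BOTAN_DLL definition):

     // With GCC and -fvisibility=hidden, only names explicitly marked
     // with default visibility are exported from the shared library;
     // everything else stays internal and avoids PIC/GOT indirection.
     #if defined(__GNUC__)
       #define EXAMPLE_DLL __attribute__((visibility("default")))
     #else
       #define EXAMPLE_DLL
     #endif

     class EXAMPLE_DLL Exported_Sketch { /* part of the public ABI */ };
     class Internal_Sketch { /* internal-only, not exported */ };
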
* Inline all of the AES tables into an anon namespace in aes.cpp. Turns out | lloyd | 2009-11-11 | 3 | -411/+399
  to give a 3-7% speed improvement on Core2 with GCC.

* Remove the 'realname' attribute on all modules and cc/cpu/os info files. | lloyd | 2009-10-29 | 1 | -2/+0
  Pretty much useless and unused, except for listing the module names in
  build.h, and the short versions totally suffice for that.

* Remove all exception specifications. The way these are designed in C++ is | lloyd | 2009-10-22 | 2 | -2/+2
  just too fragile and not that useful. Something like Java's checked
  exceptions might be nice, but simply killing the process entirely if an
  unexpected exception is thrown is not exactly useful for something
  trying to be robust.

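  A small example of why these specifications were fragile (pre-C++11
  semantics; illustrative only):

     #include <new>
     #include <stdexcept>

     // If f() throws anything not listed in its specification,
     // std::unexpected() is called, which by default terminates the
     // process rather than letting the caller handle the error.
     void f() throw(std::bad_alloc)
        {
        throw std::runtime_error("not in the throw() list");
        }
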
* Disable prefetch in AES for now. Problem: with iterative modes like CBC, | lloyd | 2009-09-30 | 1 | -8/+0
  the prefetch is called for each block of input, and so a total of
  (4096+256)/64 = 68 prefetches are executed for each block. This reduces
  performance of iterative modes dramatically. I'm not sure what the
  right approach for dealing with this is.

* Use prefetching in AES. Nominally, this will help somewhat with preventing | lloyd | 2009-09-29 | 1 | -0/+8
  timing attacks, since once all the TE/SE tables are entirely in cache,
  timing attacks against the implementation become somewhat harder.
  However, for this to be a full defense it would be necessary to ensure
  the tables were entirely loaded into cache, which is not guaranteed by
  the normal SSE prefetch instructions (or prefetch instructions for
  other CPUs, AFAIK). Much more importantly, it provides a 10% speedup.

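  The prefetch in question is along these lines (a sketch using the GCC
  builtin; the helper name is made up, and the sizes match the arithmetic
  in the "Disable prefetch in AES" entry above):

     #include <stddef.h>

     // Touch a table one 64-byte cache line at a time so it is likely to
     // be resident before the first round runs. For 4 KiB of T-tables
     // plus a 256-byte S-box this is (4096+256)/64 = 68 prefetches.
     template<typename T, size_t N>
     void prefetch_table_sketch(const T (&table)[N])
        {
        const char* p = reinterpret_cast<const char*>(table);
        for(size_t i = 0; i < sizeof(T) * N; i += 64)
           __builtin_prefetch(p + i, 0 /* for read */, 3 /* keep cached */);
        }
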
* Remove add blocks from block cipher info files | lloyd | 2009-09-29 | 1 | -8/+0
* Make encrypt_n public for all BlockCipher implementations - unlike the | lloyd | 2009-08-11 | 1 | -2/+4
  enc/dec functions it replaces, these are public interfaces. Add the
  first bits of an SSE2 implementation of Serpent. Currently incomplete.

* Change the BlockCipher interface to support multi-block encryption and | lloyd | 2009-08-11 | 2 | -130/+142
  decryption. Currently only used for counter mode. Doesn't offer much
  advantage as-is (though it might help slightly, in terms of cache
  effects), but it allows SIMD implementations to process multiple blocks
  in parallel when possible. Particularly thinking here of Serpent;
  TEA/XTEA also seem promising in this sense, as is Threefish once that
  is implemented as a standalone block cipher.

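  The reshaped interface is roughly as follows (a simplified sketch, not
  the real BlockCipher header):

     #include <stddef.h>

     typedef unsigned char byte;

     class Block_Cipher_Iface_Sketch
        {
        public:
           virtual size_t block_size() const = 0;

           // Process 'blocks' consecutive blocks from in[] into out[] in
           // one call, so a SIMD implementation can handle several
           // blocks in parallel instead of being invoked once per block.
           virtual void encrypt_n(const byte in[], byte out[],
                                  size_t blocks) const = 0;
           virtual void decrypt_n(const byte in[], byte out[],
                                  size_t blocks) const = 0;

           virtual ~Block_Cipher_Iface_Sketch() {}
        };
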
* Thomas Moschny passed along a request from the Fedora packagers which came | lloyd | 2009-03-30 | 3 | -6/+12
  up during the Fedora submission review, that each source file include
  some text about the license. One handy Perl script later and each file
  now has the line

     Distributed under the terms of the Botan license

  after the copyright notices.

  While I was in there modifying every file anyway, I also stripped out
  the remainder of the block comments (lots of asterisks before and after
  the text); this is a stylistic thing I picked up when I was first
  learning C++, but in retrospect it is not a good style, as the
  structure makes it harder to modify comments (with the result that
  comments become fewer, shorter, and are less likely to be updated,
  which are not good things).

* Add a comment WRT timing attacks on the AES implementation | lloyd | 2008-11-19 | 1 | -0/+14
* Optimize AES decryption in the same manner as the last changes to AES encryption. | lloyd | 2008-11-17 | 2 | -41/+44
* Optimize the first round of AES, currently in the encryption direction only. | lloyd | 2008-11-17 | 2 | -37/+47
  This seems to have a significant impact on overall speed; now measuring
  on my Core2 Q6600:

     AES-128: 123.41 MiB/sec
     AES-192: 108.28 MiB/sec
     AES-256: 95.72 MiB/sec

  which is roughly 8-10% faster than before.

* Optimize AES decryption in the same way. | lloyd | 2008-11-17 | 1 | -27/+34
* Fix indexing in EK_[4-7] | lloyd | 2008-11-17 | 1 | -4/+4
* Move the loads of AES::EK to the top of the loop. | lloyd | 2008-11-17 | 1 | -8/+18
  Before:

     $ ./check --bench-algo=AES-128,AES-256 --seconds=10
     AES-128: 101.99 MiB/sec
     AES-256: 78.30 MiB/sec

  After:

     $ ./check --bench-algo=AES-128,AES-256 --seconds=10
     AES-128: 106.51 MiB/sec
     AES-256: 84.26 MiB/sec

* Format block comments for Doxygen | lloyd | 2008-11-17 | 2 | -56/+64
* Rename SymmetricAlgorithm::key to key_schedule to avoid many name | lloyd | 2008-11-09 | 2 | -2/+2
  conflicts/collisions.

* Split ciphers into block and stream ciphers. Move base class headers | lloyd | 2008-11-08 | 4 | -0/+697