path: root/src/block
Commit message  Author  Age  Files  Lines
* Set parallelism defaults.lloyd2010-02-256-1/+15
| | | | | | | | Default unless specified is now 4. For SIMD code, use 2x the number of blocks which are processed in parallel using SIMD by that cipher. It may make sense to increase this to 4x or even more, further experimentation is necessary.
* Instead of the mode parallelism being specified via macros, have itlloyd2010-02-251-0/+5
| | | | | | | | | depend on the particular implementation. Add a new virtual function to BlockCipher named parallelism that returns the number of blocks the cipher object could or might want to process in parallel. Currently set to 1 by default but may make sense to increase this for even scalar implementations since it seems like better caching behavior makes it a win.
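A minimal sketch of the idea in the commit above, assuming a simplified base class. The names `BlockCipherSketch`, `ScalarCipherSketch`, `SimdCipherSketch` and `mode_buffer_bytes` are hypothetical and are not Botan's real API; the value 4 for the SIMD case is likewise only an assumption for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for a block cipher base class; not Botan's real API.
class BlockCipherSketch
   {
   public:
      virtual ~BlockCipherSketch() {}

      // Block size in bytes.
      virtual std::size_t block_size() const = 0;

      // Number of blocks this implementation would like to process at once.
      // Defaults to 1, matching the default described in the commit message.
      virtual std::size_t parallelism() const { return 1; }
   };

// A scalar cipher that keeps the default parallelism.
class ScalarCipherSketch : public BlockCipherSketch
   {
   public:
      std::size_t block_size() const { return 8; }
   };

// A SIMD-style cipher that asks for several blocks per call (4 is assumed).
class SimdCipherSketch : public BlockCipherSketch
   {
   public:
      std::size_t block_size() const { return 16; }
      std::size_t parallelism() const { return 4; }
   };

// A mode can size its working buffer from the cipher object it is given,
// instead of from a compile-time macro.
std::size_t mode_buffer_bytes(const BlockCipherSketch& cipher)
   {
   return cipher.parallelism() * cipher.block_size();
   }
```

With this shape, a mode asks the cipher object at run time how much data to batch, so different implementations of the same algorithm can request different batch sizes.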
* Add SIMD version of Noekeon. On a Core2, about 2.7x faster using SIMD_SSE2lloyd2010-01-124-1/+198
| | | | and 1.6x faster using SIMD_Scalar.
* Kill unneeded includelloyd2010-01-121-1/+0
|
* Add block cipher cascadelloyd2010-01-113-0/+148
|
* Clean up exceptions. Remove some unused ones like Config_Error. Makelloyd2010-01-051-1/+2
| | | | | | | Invalid_Argument just a typedef for std::invalid_argument. Make Botan::Exception a typedef for std::runtime_error. Make Memory_Exhaustion a public exception, and use it in other places where memory allocations can fail.
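A small sketch of what the typedef approach described above implies, assuming a stand-in namespace `sketch` and a hypothetical helper `check_key_length`; neither is taken from Botan. Because the names are plain typedefs, callers can catch the standard types directly. Note that `std::invalid_argument` derives from `std::logic_error`, so a handler for `std::runtime_error` would not catch it.

```cpp
#include <cstddef>
#include <stdexcept>

namespace sketch {

// Typedefs in the style the commit describes (namespace name is hypothetical).
typedef std::runtime_error Exception;
typedef std::invalid_argument Invalid_Argument;

// Hypothetical helper: rejects anything but a 16-byte key.
void check_key_length(std::size_t length)
   {
   if(length != 16)
      throw Invalid_Argument("unsupported key length");
   }

}
```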
* Add doxygen commentslloyd2009-12-291-0/+13
|
* Add last night's project, an SSE2 implementation of IDEA. Right about 4xlloyd2009-12-235-53/+290
| | | | faster than the scalar version on a Core2.
* Un-internal loadstor.h (and its header deps, rotate.h andlloyd2009-12-2126-38/+41
| bswap.h); too many external apps rely on loadstor.h existing.
| Define 64-bit generic bswap in terms of 32-bit bswap, since it's not much slower if 32-bit is also generic, and much faster if it's not. This may be quite helpful on 32-bit x86 in particular.
| Change formulation of generic 32-bit bswap. It may be faster or slower depending on the CPU, especially the latency and throughput of rotate instructions, but should be faster on an ideally superscalar processor with rotate instructions (i.e., what I expect future CPUs to look more like).
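A sketch of the two ideas in the commit above: a generic 32-bit byte swap built from rotates, and a 64-bit byte swap defined in terms of the 32-bit one. This is one common rotate-based formulation and may differ from the exact code in the commit; the function names are assumptions.

```cpp
#include <cstdint>

// Rotate a 32-bit word left by n bits (0 < n < 32).
inline std::uint32_t rotl32(std::uint32_t x, unsigned n)
   {
   return (x << n) | (x >> (32 - n));
   }

// Generic 32-bit byte swap using two rotates and two masks.
// The two rotates are independent, so a superscalar CPU can overlap them.
inline std::uint32_t bswap32(std::uint32_t x)
   {
   return (rotl32(x, 8) & 0x00FF00FF) | (rotl32(x, 24) & 0xFF00FF00);
   }

// 64-bit byte swap in terms of the 32-bit one: swap each half,
// then exchange the halves.
inline std::uint64_t bswap64(std::uint64_t x)
   {
   const std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
   const std::uint32_t lo = static_cast<std::uint32_t>(x);
   return (static_cast<std::uint64_t>(bswap32(lo)) << 32) | bswap32(hi);
   }
```

If `bswap32` is replaced by a native instruction on some platform, `bswap64` picks up the speedup without needing its own platform-specific version.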
* Add missing BOTAN_DLL exports.lloyd2009-12-161-3/+1
| | | | Move most of the engine headers to internal
* Make many more headers internal-only.lloyd2009-12-1629-41/+41
| Fixes for the amalgamation generator for internal headers.
| Remove BOTAN_DLL exporting macros from all internal-only headers; the classes/functions there don't need to be exported, and avoiding the PIC/GOT indirection can be a big win.
| Add missing BOTAN_DLLs where necessary, mostly gfpmath and cvc.
| For GCC, use -fvisibility=hidden and set BOTAN_DLL to the visibility __attribute__ to export those classes/functions.
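A sketch of the export-macro pattern described above, assuming GCC-style visibility attributes. `SKETCH_DLL`, `Public_Thing` and `internal_helper` are hypothetical names standing in for BOTAN_DLL and real library classes; how the actual macro is defined in the build is not shown in this log.

```cpp
// When building with GCC and -fvisibility=hidden, only symbols marked
// with default visibility are exported from the shared library.
#if defined(__GNUC__)
  #define SKETCH_DLL __attribute__((visibility("default")))
#else
  #define SKETCH_DLL
#endif

// Public class: explicitly exported.
class SKETCH_DLL Public_Thing
   {
   public:
      int value() const;
   };

int Public_Thing::value() const { return 42; }

// Internal helper: no export macro, so it stays hidden under
// -fvisibility=hidden and calls to it can avoid PIC/GOT indirection.
int internal_helper() { return 1; }
```

The code compiles and behaves the same with or without `-fvisibility=hidden`; the flag only changes which symbols the shared object exposes.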
* Full working amalgamation build, plus internal-only headers concept.lloyd2009-12-166-11/+24
|
* Add missing header guards to package.h and botan.hlloyd2009-12-021-2/+2
| | | | | Change serp_simd_sbox.h's header guard to use the leading BOTAN_ prefix for proper macro namespacing.
* Remove obsolete commentlloyd2009-11-171-15/+0
|
* Rename/remove some secmem member variables for better matching with STLlloyd2009-11-172-3/+3
| containers (specifically vector).
| Rename is_empty to empty
| Remove has_items
| Rename create to resize
* Instead of having two asm_macr.h files being switched in based on modulelloyd2009-11-141-1/+1
| | | | build magic, name them asm_macr_ARCH.h. Change all including files accordingly.
* Cleanups in the Square implementationlloyd2009-11-111-30/+38
|
* Double the speed of Skipjack on my Core2, mostly due to better inlining.lloyd2009-11-112-82/+99
|
* Inline all of the AES tables into an anon namespace in aes.cpp. Turns outlloyd2009-11-113-411/+399
| | | | to give a 3-7% speed improvement on Core2 with GCC.
* Almost double the speed of MARS; from 55 MiB/s to 102 on my Core2. lloyd2009-11-113-231/+216
|
* Remove SSE4 dependency in AES-192 key schedule, and also avoid requiringlloyd2009-11-102-26/+25
| | | | an extra 4 words at the end of EK for writing (unused) values.
* Add AES-192 using AES-NI. Tested OK with Intel's simulator.lloyd2009-11-102-7/+276
| | | | | | | Currently requires SSE4.1 for _mm_extract_epi32 for the key schedule, it would be nice to remove this dependency, though all currently known/scheduled chips with AES-NI (Intel Westmere and Sandy Bridge, and AMD Bulldozer) are supposed to include SSE 4.1 so this is not a huge problem.
* Add unrolled versions of AES-NI code that will handle 4 blocks in parallel.lloyd2009-11-101-12/+176
| No noticeable change under the simulator (no surprises there), but should help a lot with pipelining on real hardware.
* Fix errors in the AES-256 key schedule for the AES-NI version. Now passeslloyd2009-11-102-196/+169
| tests under Intel's emulator. Document and enable in the engine.
| Merge both versions to aes_intel.cpp - some shared code and much similar structure which might be sharable via macros.
* Add AES-256 using AES-NIlloyd2009-11-103-3/+243
|
* Make the AES implementation using Intel's AES instruction extension official;lloyd2009-11-102-7/+7
| | | | testing with Intel's emulator shows all green.
* Cleanups - remove emails from source files, they should only live inlloyd2009-11-101-1/+1
| | | | credits.txt and thanks.txt. Remove some various bits of formatting weirdness.
* Clean up aes_128_key_expansionlloyd2009-11-061-24/+18
|
* Dename unused length fieldlloyd2009-11-061-1/+1
|
* Add a new need_isa marker for info.txt that lets a module dependlloyd2009-11-061-1/+1
| on a particular ISA extension rather than a list of CPUs. Much easier to edit and audit, too.
| Add markers on the AES-NI code and SHA-1/SSE2. Serpent and XTEA don't need it because they are generic and only depend on simd_32 which will silently swap out a scalar version if SSE2/AltiVec isn't enabled (since it turns out on superscalar processors just doing 4 blocks in parallel can be a win even in GPRs).
| Add pentium3 to the list of CPUs with rdtsc, was missing. Odd!
* Add a complete but untested AES-128 using the AES-NI intrinsics.lloyd2009-11-062-58/+139
| | | | | | | | | | | | | | | | | | From looking at how key gen works in particular, it seems easiest to provide only AES-128, AES-192, and AES-256 and not a general AES class that can accept any key length. This also has the bonus of allowing full loop unrolling which may be a win (how much so will depend on the latency/throughput of the AES instructions which is currently unknown). No block interleaving, though of course it works very nicely here, simply due to the desire to keep things simple until what is currently here can actually be tested. (Intel has an emulator that is supposed to work but just crashes on my machine...) I'm not entirely sure if byte swapping is required. Intel has a white paper out that suggests it isn't (and really it would have been stupid of them to not build this into the aes instructions), but who knows. If it turns out to be necessary there is a pretty fast bswap instruction for SSE anyway.
* Stub for AES class using Intel's AES-NI instructions and an engine forlloyd2009-11-063-0/+145
| | | | | providing it. Also stubs in the engine for VIA's AES instructions, but needs CPUID checking also.
* Indent and avoid one extra assignmentlloyd2009-11-041-3/+2
|
* Kill realnames on new modules not in mainlinelloyd2009-10-292-3/+0
|
* propagate from branch 'net.randombit.botan' (head lloyd2009-10-2910-540/+668
|\ | | | | | | | | | | 8fb69dd1c599ada1008c4cab2a6d502cbcc468e0) to branch 'net.randombit.botan.general-simd' (head c05c9a6d398659891fb8cca170ed514ea7e6476d)
| * Rename SSE2 stuff to be generally SIMD since it supports at least SSE2lloyd2009-10-299-43/+45
| | | | | | | | and Altivec (though Altivec is seemingly slower ATM...)
| * Add XTEA decryptionlloyd2009-10-261-11/+47
| |
| * Add a wrapper for a set of SSE2 operations with convenient syntax for 4x32lloyd2009-10-266-404/+493
| | operations. Also add a pure scalar code version. Convert Serpent to use this new interface, and add an implementation of XTEA in SIMD.
| | The wrappers plus the scalar version allow SIMD-ish code to work on all platforms. This is often a win due to better ILP being visible to the processor (as with the recent XTEA optimizations). Only real danger is register starvation, mostly an issue on x86 these days. So it may (or may not) be a win to consolidate the standard C++ versions and the SIMD versions together.
| | Future work:
| | - Add AltiVec/VMX version
| | - Maybe also for ARM's NEON extension? Less pressing, I would think.
| | - Convert SHA-1 code to use SIMD_32
| | - Add XTEA SIMD decryption (currently only encrypt)
| | - Change SSE2 engine to SIMD_engine
| | - Modify configure.py to set BOTAN_TARGET_CPU_HAS_[SSE2|ALTIVEC|NEON|XXX] macros
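A sketch of what a pure scalar version of a 4x32 wrapper could look like, assuming a small value class with operator overloads. The class name `Simd4x32Scalar` and its members are hypothetical; the real wrapper's interface is not shown in this log. An SSE2 or AltiVec version would expose the same operations over a vector register, so cipher code written against the interface compiles either way.

```cpp
#include <cstdint>

// Hypothetical scalar fallback: four 32-bit lanes held in ordinary integers.
class Simd4x32Scalar
   {
   public:
      Simd4x32Scalar(std::uint32_t a, std::uint32_t b,
                     std::uint32_t c, std::uint32_t d)
         {
         v[0] = a; v[1] = b; v[2] = c; v[3] = d;
         }

      // Lane-wise addition modulo 2^32.
      Simd4x32Scalar operator+(const Simd4x32Scalar& o) const
         {
         return Simd4x32Scalar(v[0] + o.v[0], v[1] + o.v[1],
                               v[2] + o.v[2], v[3] + o.v[3]);
         }

      // Lane-wise XOR.
      Simd4x32Scalar operator^(const Simd4x32Scalar& o) const
         {
         return Simd4x32Scalar(v[0] ^ o.v[0], v[1] ^ o.v[1],
                               v[2] ^ o.v[2], v[3] ^ o.v[3]);
         }

      // Rotate every lane left by n bits (0 < n < 32).
      Simd4x32Scalar rotate_left(unsigned n) const
         {
         return Simd4x32Scalar(rot(v[0], n), rot(v[1], n),
                               rot(v[2], n), rot(v[3], n));
         }

      // Read one lane (i in 0..3).
      std::uint32_t lane(unsigned i) const { return v[i]; }

   private:
      static std::uint32_t rot(std::uint32_t x, unsigned n)
         {
         return (x << n) | (x >> (32 - n));
         }

      std::uint32_t v[4];
   };
```

The four lanes are independent, which is what exposes instruction-level parallelism to the processor even when no vector unit is used.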
* | Remove the 'realname' attribute on all modules and cc/cpu/os info files.lloyd2009-10-2926-52/+0
|/ | | | | Pretty much useless and unused, except for listing the module names in build.h and the short versions totally suffice for that.
* Kill stdio includelloyd2009-10-231-2/+0
|
* Use new load/store ops in xtea x4 codelloyd2009-10-231-12/+6
|
* Simply unrolling the loop in XTEA and processing 4 blocks worth of data atlloyd2009-10-231-0/+70
| | | | | | | | a time more than doubles performance (from 38 MB/s to 90 MB/s on Core2 Q6600). Could do even better with SIMD, I'm sure, but this is fast and easy, and works everywhere. Probably will hurt on 32-bit x86 from the register pressure.
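A sketch of the unrolling technique described above, using the standard XTEA round structure. The function names are assumptions and byte loading/storing is omitted; this is not the code from the commit. The four-block version applies each pair of round keys to four independent blocks before moving to the next round, so the processor has four dependency chains to overlap instead of one.

```cpp
#include <cstddef>
#include <cstdint>

typedef std::uint32_t u32;

// Expand a 128-bit key into 64 round subkeys (standard XTEA schedule).
void xtea_key_schedule(const u32 key[4], u32 EK[64])
   {
   u32 sum = 0;
   for(std::size_t i = 0; i != 32; ++i)
      {
      EK[2*i] = sum + key[sum & 3];
      sum += 0x9E3779B9;
      EK[2*i+1] = sum + key[(sum >> 11) & 3];
      }
   }

// Reference: encrypt one block held as two 32-bit halves.
void xtea_encrypt_1(u32& L, u32& R, const u32 EK[64])
   {
   for(std::size_t i = 0; i != 32; ++i)
      {
      L += (((R << 4) ^ (R >> 5)) + R) ^ EK[2*i];
      R += (((L << 4) ^ (L >> 5)) + L) ^ EK[2*i+1];
      }
   }

// Encrypt four blocks at once; the four chains are independent.
void xtea_encrypt_4(u32 L[4], u32 R[4], const u32 EK[64])
   {
   u32 L0 = L[0], L1 = L[1], L2 = L[2], L3 = L[3];
   u32 R0 = R[0], R1 = R[1], R2 = R[2], R3 = R[3];

   for(std::size_t i = 0; i != 32; ++i)
      {
      const u32 K0 = EK[2*i];
      const u32 K1 = EK[2*i+1];

      L0 += (((R0 << 4) ^ (R0 >> 5)) + R0) ^ K0;
      L1 += (((R1 << 4) ^ (R1 >> 5)) + R1) ^ K0;
      L2 += (((R2 << 4) ^ (R2 >> 5)) + R2) ^ K0;
      L3 += (((R3 << 4) ^ (R3 >> 5)) + R3) ^ K0;

      R0 += (((L0 << 4) ^ (L0 >> 5)) + L0) ^ K1;
      R1 += (((L1 << 4) ^ (L1 >> 5)) + L1) ^ K1;
      R2 += (((L2 << 4) ^ (L2 >> 5)) + L2) ^ K1;
      R3 += (((L3 << 4) ^ (L3 >> 5)) + L3) ^ K1;
      }

   L[0] = L0; L[1] = L1; L[2] = L2; L[3] = L3;
   R[0] = R0; R[1] = R1; R[2] = R2; R[3] = R3;
   }
```

The sixteen live values in the inner loop are also why the commit message expects register pressure to hurt on 32-bit x86.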
* Remove all exception specifications. The way these are designed in C++ islloyd2009-10-2234-35/+35
| | | | | | just too fragile and not that useful. Something like Java's checked exceptions might be nice, but simply killing the process entirely if an unexpected exception is thrown is not exactly useful for something trying to be robust.
* Cleanups/random changes in the stream cipher code:lloyd2009-10-141-4/+4
| Remove encrypt, decrypt - replace by cipher() and cipher1()
| Remove seek() - not well supported/tested, I want to redo with a new interface once CTR and OFB modes become stream ciphers.
| Rename resync to set_iv()
| Remove StreamCipher::IV_LENGTH and add StreamCipher::valid_iv_length() to allow multiple IV lengths (as for instance Turing allows, as would Salsa20 if XSalsa20 were supported).
* Disable prefetch in AES for now. Problem: with iterative modes like CBC,lloyd2009-09-301-8/+0
| | | | | | | | the prefetch is called for each block of input, and so a total of (4096+256)/64 = 68 prefetches are executed for each block. This reduces performance of iterative modes dramatically. I'm not sure what the right approach for dealing with this is.
* Use prefetching in AES. Nominally, this will help somewhat with preventinglloyd2009-09-291-0/+8
| | | | | | | | | | timing attacks, since once all the TE/SE tables are entirely in cache then timing attacks against it become somewhat harder. However for this to be a full defense it would be necessary to ensure the tables were entirely loaded into cache, which is not guaranteed by the normal SSE prefetch instructions. (Or prefetch instructions for other CPUs, AFAIK). Much more importantly, it provides a 10% speedup.
* Remove add block from block/info.txtlloyd2009-09-291-6/+0
|
* Remove add blocks from block cipher info fileslloyd2009-09-2925-188/+0
|
* Use load_le instead of make_u32bit in Serpent x86 key schedule codelloyd2009-09-291-1/+1
|
* Indentation fixlloyd2009-09-211-13/+12
|