aboutsummaryrefslogtreecommitdiffstats
path: root/src/hash/sha1_sse2/sha1_sse2.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Shuffle things around. Add NIST X.509 test to build.lloyd2014-01-011-335/+0
|
* Fix building with --via-amalgamation; it wouldn't generate thelloyd2011-06-031-1/+10
| | | | | | | | amalgamation properly, but would happen to work if a previously written amalgamation was around. Also make changes allowing using the SIMD optimized versions of SHA-1 and Serpent to be used in the amalgamation.
* Remove most uses of HASH_BLOCK_SIZElloyd2010-10-131-1/+1
|
* Use size_t for BufferedComputation::add_datalloyd2010-10-121-2/+2
|
* Do the prep/unroll phase 4 rounds before it is needed instead of 3;lloyd2010-09-211-97/+92
| | | | tests on Nehalem indicate a small but measurable win there (about 3%).
* Clean up, hide union accesses with a macro to make it easier to testlloyd2010-09-211-40/+92
| | | | alternative methods of getting pieces of the expanded message.
* Remove some C-style castslloyd2010-04-231-1/+1
|
* Un-internal loadstor.h (and its header deps, rotate.h andlloyd2009-12-211-1/+1
| | | | | | | | | | | | | | bswap.h); too many external apps rely on loadstor.h existing. Define 64-bit generic bswap in terms of 32-bit bswap, since it's not much slower if 32-bit is also generic, and much faster if it's not. This may be quite helpful on 32-bit x86 in particular. Change formulation of generic 32-bit bswap. It may be faster or slower depending on the CPU, especially the latency and throuput of rotate instructions, but should be faster on an ideally superscalar processor with rotate instructions (ie, what I expect future CPUs to look more like).
* Make many more headers internal-only.lloyd2009-12-161-1/+1
| | | | | | | | | | | | | Fixes for the amalgamation generator for internal headers. Remove BOTAN_DLL exporting macros from all internal-only headers; the classes/functions there don't need to be exported, and avoiding the PIC/GOT indirection can be a big win. Add missing BOTAN_DLLs where necessary, mostly gfpmath and cvc For GCC, use -fvisibility=hidden and set BOTAN_DLL to the visibility __attribute__ to export those classes/functions.
* Cleanups - remove emails from source files, they should only live inlloyd2009-11-101-2/+2
| | | | credits.txt and thanks.txt. Remove some various bits of formatting weirdness.
* Clean up prep00_15 - same speed on Core2lloyd2009-10-291-16/+10
|
* Clean up the SSE2 SHA-1 code quite a bit, make better use of C++ featureslloyd2009-10-291-6/+267
| | | | and also make it stylistically much closer to the standard SHA-1 code.
* Thomas Moschny passed along a request from the Fedora packagers which camelloyd2009-03-301-7/+9
| | | | | | | | | | | | | | | up during the Fedora submission review, that each source file include some text about the license. One handy Perl script later and each file now has the line Distributed under the terms of the Botan license after the copyright notices. While I was in there modifying every file anyway, I also stripped out the remainder of the block comments (lots of astericks before and after the text); this is stylistic thing I picked up when I was first learning C++ but in retrospect it is not a good style as the structure makes it harder to modify comments (with the result that comments become fewer, shorter and are less likely to be updated, which are not good things).
* Revert change that added multiblock support to SSE2 SHA-1. Was causinglloyd2008-11-231-1/+5
| | | | | a random segfault (always inside an SSE2 intrinsic). Did not investigate much beyond that. Worth looking into since it seemed worth another 1% or so.
* Dean Gaudet's original version of the SHA-1 SSE2 code supported multiplelloyd2008-11-231-5/+1
| | | | | blocks as input (and can overlap computations from one block to another - very nice). Reimport that original version and use it.
* I had not anticipated this being really worthwhile, but it turns outlloyd2008-11-231-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to have been so! Change MDx_HashFunction::hash to a new compress_n which hashes an arbitrary number of blocks. I had a thought this might reduce a bit of loop overhead but the results were far better than I anticipated. Speedup across the board of about 2%, and very noticable (+10%) increases for MD4 and Tiger (probably b/c both of those have so few instructions in each iteration of the compression function). Before: SHA-1: amd64: 211.9 MiB/s core: 210.0 MiB/s sse2: 295.2 MiB/s MD4: 476.2 MiB/s MD5: 355.2 MiB/s SHA-256: 99.8 MiB/s SHA-512: 151.4 MiB/s RIPEMD-128: 326.9 MiB/s RIPEMD-160: 225.1 MiB/s Tiger: 214.8 MiB/s Whirlpool: 38.4 MiB/s After: SHA-1: amd64: 215.6 MiB/s core: 213.8 MiB/s sse2: 299.9 MiB/s MD4: 528.4 MiB/s MD5: 368.8 MiB/s SHA-256: 103.9 MiB/s SHA-512: 156.8 MiB/s RIPEMD-128: 334.8 MiB/s RIPEMD-160: 229.7 MiB/s Tiger: 240.7 MiB/s Whirlpool: 38.6 MiB/s
* Fix prototype confusion (harmless but incorrect)lloyd2008-09-301-3/+1
|
* Derive x86, x86-64, and SSE2 implementations of SHA-1 directly from SHA_160lloyd2008-09-291-24/+0
|
* Make asm implementations distinctly named objects, for instance MD5_IA32,lloyd2008-09-291-0/+44
rather than silently replacing the C++ versions. Instead they are silently replaced (currently, at least) at the lookup level: we switch off the set of feature macros set to choose the best implementation in the current build configuration. So you can have (and benchmark) MD5 and MD5_IA32 directly against each other in the same program with no hassles, but if you ask for "MD5", you'll get maybe an MD5 or maybe MD5_IA32. Also make the canonical asm names (which aren't guarded by C++ namespaces) of the form botan_<algo>_<arch>_<func> as in botan_sha160_ia32_compress, to avoid namespace collisions. This change has another bonus that it should in many cases be possible to derive the asm specializations directly from the original implementation, saving some code (and of course logically SHA_160_IA32 is a SHA_160, just one with a faster implementation of the compression function, so this seems reasonable anyway).