path: root/src
Commit message | Author | Age | Files | Lines
* Use template version of xor_into_buf in es_unix | lloyd | 2008-11-23 | 1 | -1/+1
|
* Move xor_into_buf to xor_buf.cpp. Also add a new template wrapper for | lloyd | 2008-11-23 | 3 | -10/+39
| xoring integer values in.
* Change unix_procs entropy source to be a plain EntropySource instead of | lloyd | 2008-11-23 | 3 | -24/+47
| a Buffered_EntropySource. Data used in the poll is directly accumulated into the output buffer using XOR, wrapping around as needed. The implementation uses xor_into_buf from xor_buf.h.
|
| This is simpler and more convincingly secure than the method used by Buffered_EntropySource. In particular, the collected data is persisted in the buffer there much longer than needed. It is also much harder for entropy sources to signal errors or a failure to collect data using Buffered_EntropySource. And, with the simple xor_into_buf function, it is actually quite easy to remove without major changes.
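The wrapping XOR accumulation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in xor_buf.h: the name `xor_into_buf_sketch`, the `std::vector` buffer, and the returned position are all assumptions made for the example. The second overload loosely mirrors the idea of a template wrapper for xoring integer values in.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: XOR `in` into `buf` starting at `buf_pos`,
// wrapping to the start of `buf` whenever the end is reached.
// Returns the position at which the next call should continue.
std::size_t xor_into_buf_sketch(std::vector<std::uint8_t>& buf,
                                std::size_t buf_pos,
                                const std::uint8_t* in,
                                std::size_t in_len)
{
    if(buf.empty())
        return 0;

    for(std::size_t i = 0; i != in_len; ++i)
    {
        buf[buf_pos] ^= in[i];
        buf_pos = (buf_pos + 1) % buf.size();
    }
    return buf_pos;
}

// Hypothetical template wrapper: XOR the raw bytes of an integer value in.
template<typename T>
std::size_t xor_into_buf_sketch(std::vector<std::uint8_t>& buf,
                                std::size_t buf_pos,
                                T value)
{
    return xor_into_buf_sketch(buf, buf_pos,
                               reinterpret_cast<const std::uint8_t*>(&value),
                               sizeof(value));
}
```

Because input longer than the buffer simply wraps, every polled byte influences the output without any intermediate copy of the collected data being kept around.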
* In Randpool and HMAC_RNG, zeroize the I/O buffer used for holding polled | lloyd | 2008-11-23 | 2 | -0/+5
| randomness data after the contents have been fed into the MAC.
* Add xor_into_buf. Add Doxygen comments for xor_buf | lloyd | 2008-11-23 | 1 | -10/+32
|
* Revert change that added multiblock support to SSE2 SHA-1. Was causing | lloyd | 2008-11-23 | 3 | -206/+183
| a random segfault (always inside an SSE2 intrinsic). Did not investigate much beyond that. Worth looking into since it seemed worth another 1% or so.
* Dean Gaudet's original version of the SHA-1 SSE2 code supported multiple | lloyd | 2008-11-23 | 3 | -183/+206
| blocks as input (and can overlap computations from one block to another - very nice). Reimport that original version and use it.
* Do a minor optimization in some of the compression functions, loading | lloyd | 2008-11-23 | 7 | -121/+145
| the registers only once and carrying the values over between loop iterations.
* Update SHA1_IA32 to use compress_n | lloyd | 2008-11-23 | 1 | -2/+6
|
* I had not anticipated this being really worthwhile, but it turns out | lloyd | 2008-11-23 | 33 | -746/+876
| to have been so!
|
| Change MDx_HashFunction::hash to a new compress_n which hashes an arbitrary number of blocks. I had a thought this might reduce a bit of loop overhead but the results were far better than I anticipated. Speedup across the board of about 2%, and very noticeable (+10%) increases for MD4 and Tiger (probably b/c both of those have so few instructions in each iteration of the compression function).
|
| Before:
|   SHA-1: amd64: 211.9 MiB/s  core: 210.0 MiB/s  sse2: 295.2 MiB/s
|   MD4: 476.2 MiB/s
|   MD5: 355.2 MiB/s
|   SHA-256: 99.8 MiB/s
|   SHA-512: 151.4 MiB/s
|   RIPEMD-128: 326.9 MiB/s
|   RIPEMD-160: 225.1 MiB/s
|   Tiger: 214.8 MiB/s
|   Whirlpool: 38.4 MiB/s
|
| After:
|   SHA-1: amd64: 215.6 MiB/s  core: 213.8 MiB/s  sse2: 299.9 MiB/s
|   MD4: 528.4 MiB/s
|   MD5: 368.8 MiB/s
|   SHA-256: 103.9 MiB/s
|   SHA-512: 156.8 MiB/s
|   RIPEMD-128: 334.8 MiB/s
|   RIPEMD-160: 229.7 MiB/s
|   Tiger: 240.7 MiB/s
|   Whirlpool: 38.6 MiB/s
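The idea behind compress_n (one virtual call for a whole run of blocks instead of one call per block) might look something like the sketch below. It is only an illustration under assumptions: `BlockHashSketch`, `CountingHash`, and their members are hypothetical names, and the real MDx_HashFunction interface may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an MDx-style hash base class.
class BlockHashSketch
{
public:
    explicit BlockHashSketch(std::size_t block_size) : m_block_size(block_size) {}
    virtual ~BlockHashSketch() {}

    void add_data(const std::uint8_t* input, std::size_t length)
    {
        // Top up a partially filled buffer first
        if(!m_buffer.empty())
        {
            while(length > 0 && m_buffer.size() < m_block_size)
            {
                m_buffer.push_back(*input++);
                --length;
            }
            if(m_buffer.size() == m_block_size)
            {
                compress_n(m_buffer.data(), 1);
                m_buffer.clear();
            }
        }

        // One virtual call covers every remaining full block
        const std::size_t full_blocks = length / m_block_size;
        if(full_blocks > 0)
            compress_n(input, full_blocks);

        // Keep the tail for the next call
        const std::size_t consumed = full_blocks * m_block_size;
        m_buffer.insert(m_buffer.end(), input + consumed, input + length);
    }

protected:
    // Process `count` consecutive blocks starting at `blocks`
    virtual void compress_n(const std::uint8_t* blocks, std::size_t count) = 0;

private:
    std::size_t m_block_size;
    std::vector<std::uint8_t> m_buffer;
};

// Toy subclass that only records how compress_n was called
class CountingHash : public BlockHashSketch
{
public:
    CountingHash() : BlockHashSketch(64), calls(0), blocks_seen(0) {}
    std::size_t calls;
    std::size_t blocks_seen;

protected:
    void compress_n(const std::uint8_t*, std::size_t count) override
    {
        ++calls;
        blocks_seen += count;
    }
};
```

With this shape, the per-block loop lives inside each compression function, so state can stay in registers across blocks and the call overhead is paid once per update rather than once per block.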
* Fix integer overflow in benchmarks | lloyd | 2008-11-23 | 1 | -4/+4
|
* Remove dep on buf_es in proc_walk info.txt | lloyd | 2008-11-21 | 1 | -4/+0
|
* Fix poorly named function | lloyd | 2008-11-21 | 1 | -6/+6
|
* Last minute es_ftw optimizations / logic changes. Performance of seeding | lloyd | 2008-11-21 | 2 | -35/+27
| was too slow; it was noticeably slowing down AutoSeeded_RNG. Reduce the amount of output gathered to 32 times the size of the output buffer, and instead of using Buffered_EntropySource, just xor the read file data directly into the output buffer. Read up to 4096 bytes per file, but only count the first 128 towards the total goal (/proc/config.gz being a major culprit - large, random looking, and entirely or almost static).
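The accounting rule in this commit (read up to 4096 bytes per file, credit at most 128 of them, stop at 32 times the output size) can be sketched as below. To stay self-contained the sketch takes file contents as in-memory vectors instead of walking a directory; `poll_files_sketch` and the constant names are hypothetical, and only the three numbers come from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Limits taken from the numbers in the commit message; the names are made up
const std::size_t MAX_READ_PER_FILE = 4096;
const std::size_t MAX_COUNTED_PER_FILE = 128;

// XOR file data into `out` until the gathering goal is met.
// Returns the number of files consumed.
std::size_t poll_files_sketch(const std::vector<std::vector<std::uint8_t> >& files,
                              std::vector<std::uint8_t>& out)
{
    if(out.empty())
        return 0;

    const std::size_t goal = 32 * out.size();
    std::size_t counted = 0, out_pos = 0, files_used = 0;

    for(std::size_t f = 0; f != files.size() && counted < goal; ++f)
    {
        // Read (and mix in) at most 4096 bytes of this file
        const std::size_t got = std::min(files[f].size(), MAX_READ_PER_FILE);
        for(std::size_t i = 0; i != got; ++i)
        {
            out[out_pos] ^= files[f][i];
            out_pos = (out_pos + 1) % out.size();
        }

        // ...but credit no more than 128 bytes towards the goal
        counted += std::min(got, MAX_COUNTED_PER_FILE);
        ++files_used;
    }
    return files_used;
}
```

Capping the credited amount means a single large but static file cannot satisfy the goal on its own, while its contents are still mixed into the output.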
* Remove debug printf | lloyd | 2008-11-21 | 1 | -1/+0
|
* Cache socket descriptors in EGD entropy source, instead of creating each poll | lloyd | 2008-11-21 | 2 | -50/+97
|
* Avoid a potential 32-bit overflow in Timer::combine_timers by promoting | lloyd | 2008-11-21 | 1 | -2/+4
| to 64 bit values before doing multiplication.
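The kind of fix described here, widening the operands before the multiplication instead of after it, can be illustrated as follows. The function name, parameter names, and nanosecond return unit are assumptions for the example and are not taken from the Timer source.

```cpp
#include <cstdint>

// Hypothetical helper: combine whole seconds and a fractional part
// (given as `parts` out of `parts_hz` per second) into nanoseconds.
// `parts_hz` must be nonzero.
std::uint64_t combine_timers_sketch(std::uint32_t seconds,
                                    std::uint32_t parts,
                                    std::uint32_t parts_hz)
{
    const std::uint64_t NANOS_PER_SEC = 1000000000;

    // Cast first: a 32-bit * 32-bit product would wrap before being widened
    const std::uint64_t whole = static_cast<std::uint64_t>(seconds) * NANOS_PER_SEC;
    const std::uint64_t frac =
        (static_cast<std::uint64_t>(parts) * NANOS_PER_SEC) / parts_hz;

    return whole + frac;
}
```

For instance, 5 seconds plus 500000 microseconds (`parts_hz` of 1000000) gives 5500000000 ns, a value that does not fit in 32 bits.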
* Mention ANSI clock seems pretty bogus for benchmarking | lloyd | 2008-11-21 | 1 | -0/+3
|
* Add comment showing likely future API for multi-block encryption in BlockCipher | lloyd | 2008-11-21 | 1 | -0/+6
|
* Move MISTY1 tables from mist_tab.cpp to misty1.cpp - pretty small | lloyd | 2008-11-21 | 4 | -118/+106
|
* Make Timer a pure virtual interface and add a new subclass ANSI_Clock_Timer | lloyd | 2008-11-21 | 2 | -31/+40
| which uses the ANSI/ISO clock function (previously this had been the Timer::clock default implementation).
* Add a typedef in benchmark.h Default_Benchmark_Timer, which checks available | lloyd | 2008-11-21 | 1 | -8/+26
| timer alternatives. I realized otherwise each application would be forced to do the exact same thing, and no reason for that.
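A rough sketch of the arrangement these two commits describe: an abstract timer interface, a fallback built on the ANSI/ISO `clock` function, and a typedef that names the default. All class names here are hypothetical, and in a real build the typedef would presumably be selected by preprocessor checks for better timers; this sketch only has the fallback.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Hypothetical pure virtual timer interface
class TimerSketch
{
public:
    virtual ~TimerSketch() {}
    virtual std::uint64_t clock() const = 0;   // nanoseconds
    virtual std::string name() const = 0;
};

// Fallback implementation using the ANSI/ISO clock function
class AnsiClockTimerSketch : public TimerSketch
{
public:
    std::uint64_t clock() const override
    {
        // std::clock() reports processor time in units of 1/CLOCKS_PER_SEC
        const std::clock_t ticks = std::clock();
        if(ticks == static_cast<std::clock_t>(-1))
            return 0;   // processor time not available
        return static_cast<std::uint64_t>(
            (static_cast<double>(ticks) / CLOCKS_PER_SEC) * 1e9);
    }

    std::string name() const override { return "ANSI clock"; }
};

// Only the fallback exists in this sketch, so it is the default
typedef AnsiClockTimerSketch DefaultBenchmarkTimerSketch;
```

An application can then declare a `DefaultBenchmarkTimerSketch` and pass it by reference wherever a `TimerSketch&` is expected, without repeating the selection logic itself.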
* Add a comment WRT timing attacks on the AES implementation | lloyd | 2008-11-19 | 1 | -0/+14
|
* Add a comment to BlockCipher mentioning the usefulness of extending it | lloyd | 2008-11-18 | 1 | -0/+9
| to support multiple blocks.
* Add some Doxygen comments for BlockCipherModePaddingMethod | lloyd | 2008-11-18 | 1 | -23/+62
|
* Use TR1 by default with GNU C++ and Intel C++, since all recent versions of | lloyd | 2008-11-17 | 2 | -0/+4
| both support TR1 fine AFAICT.
|
| Add ability to explicitly disable using TR1 with --with-tr1=none
|
| Add a marker in the cc info files specifying if TR1 should be chosen by default. Yes, autoconf would be better for this than a static per-compiler setting. Yes, I totally hate autoconf. Yes, I would still consider autoconf patches. No, I'm not going to do it myself. :)
|
| I am looking forward to being able to safely adopt C++0x and TR2 throughout the library and make the need for a lot of this special-casing stuff go away. Until then, it seems better to default to using tr1 (and thus, ECC) than not.
* Remove print statements in PointGFp::check_invariants which were triggered | lloyd | 2008-11-17 | 1 | -19/+0
| when the test failed. I had added them for debugging something long ago. What I thought was an InSiTo ECC test failure was actually a successful test; it was making sure an Illegal_Point would be thrown in the conditions tested. So, all seems OK.
* Enable SSE2 SHA-1 on Intel Prescott CPUs | lloyd | 2008-11-17 | 1 | -0/+1
|
* Optimize AES decryption in the same manner as the last changes to AES | lloyd | 2008-11-17 | 2 | -41/+44
| encryption.
* Optimize the first round of AES, currently in the encryption direction only. | lloyd | 2008-11-17 | 2 | -37/+47
| This seems to have a significant impact on overall speed, now measuring on my Core2 Q6600:
|   AES-128: 123.41 MiB/sec
|   AES-192: 108.28 MiB/sec
|   AES-256: 95.72 MiB/sec
| which is roughly 8-10% faster than before.
* Optimize AES decryption in the same way. | lloyd | 2008-11-17 | 1 | -27/+34
|
* Fix indexing in EK_[4-7] | lloyd | 2008-11-17 | 1 | -4/+4
|
* Move the loads of AES::EK to the top of the loop. | lloyd | 2008-11-17 | 1 | -8/+18
| Before:
|   $ ./check --bench-algo=AES-128,AES-256 --seconds=10
|   AES-128: 101.99 MiB/sec
|   AES-256: 78.30 MiB/sec
|
| After:
|   $ ./check --bench-algo=AES-128,AES-256 --seconds=10
|   AES-128: 106.51 MiB/sec
|   AES-256: 84.26 MiB/sec
* Format block comments for Doxygen | lloyd | 2008-11-17 | 2 | -56/+64
|
* Don't link against explicit version in botan-config (breaks with static libs) | lloyd | 2008-11-13 | 1 | -1/+1
|
* Make installation a little noisier | lloyd | 2008-11-13 | 2 | -4/+4
|
* In Algorithm_Factory, create the Algorithm_Cache<> objects dynamically | lloyd | 2008-11-12 | 2 | -30/+40
| so that algo_cache.h does not have to be visible in the source of all callers who include libstate.h/algo_factory.h
* Add comment about non-obvious but vital side effect | lloyd | 2008-11-12 | 1 | -0/+5
|
* Implement the guts of Algorithm_Factory::prototype_X using a function | lloyd | 2008-11-12 | 1 | -65/+80
| template to share the search and cache management logic across all types
* Remove Library_State::add_engine and Algorithm_Factory::add_engine, | lloyd | 2008-11-12 | 3 | -35/+37
| replacing with an updated constructor to Algorithm_Factory taking a vector of Engine*. The semantics of adding engines at runtime were not defined nor very clear; it seems best to prohibit this unless and until it is explicitly thought through (and until a need for it presents itself).
* Add missing prov_weight.cpp | lloyd | 2008-11-12 | 1 | -0/+31
|
* Library_State had two functions that did the same thing, | lloyd | 2008-11-12 | 7 | -65/+69
| algo_factory and algorithm_factory. This is confusing so for consistency/simplicity, remove algo_factory, making algorithm_factory the function to call.
|
| In 1.7.14, several functions in lookup.h, including retrieve_block_cipher, retrieve_hash, etc were changed to accept a Library_State& reference. However it turns out with the modified design I've settled upon for 1.8 that it is not necessary to change those interfaces; instead they always refer to the global_state algorithm factory which is exactly the semantics one would expect/desire 99% of the time (and is source compatible with code written for 1.6, also a plus)
* Fix memory leaks in PBE_PKCS5v20 and get_pbe | lloyd | 2008-11-12 | 3 | -4/+14
|
* Revert 2707eb68cb91e0633815a6d6c68d22b9f41227a4 - I had forgotten that | lloyd | 2008-11-12 | 1 | -1/+1
| Engine_Iterator (and thus the public key engine code) still processes in order of first engine to last in the list. Benchmarking confirmed that GNU MP is still faster than both OpenSSL and Botan for public key operations (at least on my machine).
* Oops, 2^32 nanoseconds < 4.3 seconds, which is pretty small. Use 64 bit | lloyd | 2008-11-12 | 1 | -7/+7
| integers where we manipulate values denominated in nanoseconds to avoid overflow (2^64 nanoseconds = 584.55 years, aka long enough)
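The two figures quoted in this commit can be checked with a few lines of arithmetic. The helper names below are made up for the example, and the year length of 365.2425 days (the mean Gregorian year) is an assumption; it is the value that reproduces the 584.55 figure.

```cpp
// Seconds representable by an unsigned counter of `bits` bits
// that counts nanoseconds.
double nanosecond_range_seconds(unsigned bits)
{
    double range = 1.0;
    for(unsigned i = 0; i != bits; ++i)
        range *= 2.0;            // builds 2^bits; powers of two are exact in double
    return range / 1e9;
}

// Convert seconds to years, assuming a mean Gregorian year
double seconds_to_years(double seconds)
{
    return seconds / (365.2425 * 86400.0);
}
```

`nanosecond_range_seconds(32)` comes to about 4.29 seconds, and `seconds_to_years(nanosecond_range_seconds(64))` to about 584.55 years, in line with the commit message.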
* Add an implementation of name() to Timer. This allows it to be | lloyd | 2008-11-12 | 1 | -0/+2
| instantiated directly, which seemed a useful thing since a Timer ref must be passed to the benchmark system. However I think due to the low resolution of clock() the results were highly variable. Using gettimeofday or clock_gettime (Unix_Timer, POSIX_Timer) proved more consistent and in sync with what the existing benchmarks show, so use one of those if possible.
* Remove support for provider identifiers from SCAN_Name - it turns out this | lloyd | 2008-11-12 | 12 | -211/+270
| was not the right place to keep track of this information.
|
| Also modify all Algorithm_Factory constructor functions to take instead of a SCAN_Name a pair of std::strings - the SCAN name and an optional provider name. If a provider is specified, either that provider will be used or the request will fail. Otherwise, the library will attempt best effort, based on user-set algorithm implementation settings (combine with benchmark.h for choosing the fastest implementation at runtime) or if not set, a static ordering (preset in static_provider_weight in prov_weight.cpp, though it would be nice to make this easier to toggle).
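The lookup rule described here (an explicit provider either matches or the request fails, otherwise fall back to a static ordering) could be sketched like this. Everything in the block is hypothetical: the function names, the map-based registry, and especially the weight values, which are placeholders rather than the contents of prov_weight.cpp. User-set preferences are left out for brevity.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Placeholder static ordering; higher weight is preferred.
// These numbers are invented for the example.
int static_provider_weight_sketch(const std::string& provider)
{
    if(provider == "sse2") return 3;
    if(provider == "core") return 2;
    return 1;   // any other provider
}

// provider name -> implementation label
typedef std::map<std::string, std::string> ProviderMap;

std::string choose_implementation_sketch(const ProviderMap& impls,
                                         const std::string& requested = "")
{
    // An explicitly requested provider is used or the request fails
    if(!requested.empty())
    {
        ProviderMap::const_iterator i = impls.find(requested);
        if(i == impls.end())
            throw std::runtime_error("provider not available: " + requested);
        return i->second;
    }

    if(impls.empty())
        throw std::runtime_error("no implementations registered");

    // Otherwise pick the provider with the highest static weight
    ProviderMap::const_iterator best = impls.begin();
    for(ProviderMap::const_iterator i = impls.begin(); i != impls.end(); ++i)
    {
        if(static_provider_weight_sketch(i->first) >
           static_provider_weight_sketch(best->first))
            best = i;
    }
    return best->second;
}
```

Keeping the provider as a separate string argument, rather than embedding it in the algorithm name, lets callers who do not care about providers pass only the SCAN name.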
* Remove last uses of lookup.h in CMS code | lloyd | 2008-11-11 | 2 | -18/+31
|
* Remove some uses of lookup.h from CMS code | lloyd | 2008-11-11 | 2 | -4/+9
|
* Mention an idea for wall clock sync in Hardware_Timer | lloyd | 2008-11-11 | 1 | -0/+6
|