| Commit message (Collapse) | Author | Age | Files | Lines |
| |
XORing integer values in.
| |
a Buffered_EntropySource. Data used in the poll is directly accumulated
into the output buffer using XOR, wrapping around as needed. The
implementation uses xor_into_buf from xor_buf.h
This is simpler and more convincingly secure than the method used
by Buffered_EntropySource; in particular, that design kept the collected
data in its buffer much longer than needed. It is also much harder for
entropy sources to signal errors or a failure to collect data using
Buffered_EntropySource. And, with the simple xor_into_buf function, it
is actually quite easy to remove without major changes.
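A minimal sketch of the accumulate-by-XOR idea, assuming a helper along the lines of xor_into_buf from xor_buf.h (the real Botan signature may differ): polled data is folded into a fixed-size output buffer, wrapping around as needed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the xor_into_buf idea: XOR `in` into the
// fixed-size accumulator `out`, advancing a running position and
// wrapping around so any amount of polled data can be folded in.
void xor_into_buf(uint8_t* out, size_t out_len, size_t& out_pos,
                  const uint8_t* in, size_t in_len)
   {
   for(size_t i = 0; i != in_len; ++i)
      {
      out[out_pos] ^= in[i];
      out_pos = (out_pos + 1) % out_len; // wrap around as needed
      }
   }
```

Because XOR never discards what is already in the buffer, a source that returns garbage (or nothing) cannot reduce the entropy already accumulated.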
| |
randomness data after the contents have been fed into the MAC.
| |
a random segfault (always inside an SSE2 intrinsic). Did not investigate
much beyond that. Worth looking into since it seemed worth another 1% or so.
| |
blocks as input (and can overlap computations from one block to another -
very nice). Reimport that original version and use it.
| |
the registers only once and carrying the values over between loop
iterations.
| |
to have been so! Change MDx_HashFunction::hash to a new compress_n
which hashes an arbitrary number of blocks. I had a thought this might
reduce a bit of loop overhead but the results were far better than I
anticipated: a speedup across the board of about 2%, and very
noticeable (+10%) increases for MD4 and Tiger (probably because both
of those have so few instructions in each iteration of the
compression function).
Before:
SHA-1:
amd64: 211.9 MiB/s
core: 210.0 MiB/s
sse2: 295.2 MiB/s
MD4: 476.2 MiB/s
MD5: 355.2 MiB/s
SHA-256: 99.8 MiB/s
SHA-512: 151.4 MiB/s
RIPEMD-128: 326.9 MiB/s
RIPEMD-160: 225.1 MiB/s
Tiger: 214.8 MiB/s
Whirlpool: 38.4 MiB/s
After:
SHA-1:
amd64: 215.6 MiB/s
core: 213.8 MiB/s
sse2: 299.9 MiB/s
MD4: 528.4 MiB/s
MD5: 368.8 MiB/s
SHA-256: 103.9 MiB/s
SHA-512: 156.8 MiB/s
RIPEMD-128: 334.8 MiB/s
RIPEMD-160: 229.7 MiB/s
Tiger: 240.7 MiB/s
Whirlpool: 38.6 MiB/s
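The shape of the change can be sketched as follows. Toy_MD and its round function are stand-ins, not Botan's actual MDx code; the point is the call structure, where a single compress_n call processes any number of blocks so the per-call overhead is paid once rather than once per block.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for an MDx-style hash: compress_n() handles an arbitrary
// number of 16-word blocks per call instead of one block per call.
struct Toy_MD
   {
   uint32_t digest[4] = { 0, 1, 2, 3 };

   void compress_n(const uint32_t input[], size_t blocks)
      {
      for(size_t b = 0; b != blocks; ++b)
         {
         // Load the chaining state into locals for this block
         uint32_t A = digest[0], B = digest[1], C = digest[2], D = digest[3];
         const uint32_t* M = input + 16*b;

         // Stand-in round function; a real MDx round schedule goes here
         for(size_t i = 0; i != 16; ++i)
            {
            const uint32_t T = A + (B ^ C ^ D) + M[i];
            A = D; D = C; C = B; B = T;
            }

         // Feed-forward into the chaining state, as MDx designs require
         digest[0] += A; digest[1] += B; digest[2] += C; digest[3] += D;
         }
      }
   };
```

Processing n blocks in one call is bit-for-bit identical to n single-block calls, so buffering callers need no changes.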
| |
was too slow; it was noticeably slowing down AutoSeeded_RNG. Reduce the
amount of output gathered to 32 times the size of the output buffer,
and instead of using Buffered_EntropySource, just xor the read file
data directly into the output buffer. Read up to 4096 bytes per file, but
only count the first 128 towards the total goal (/proc/config.gz being
a major culprit - large, random looking, and entirely or almost static).
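The per-file accounting described above might look like this; READ_MAX and COUNT_MAX use the numbers from the commit, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Numbers from the commit message; names are illustrative only.
const size_t READ_MAX  = 4096; // bytes actually read and XORed into the poll
const size_t COUNT_MAX = 128;  // bytes credited toward the collection goal

// Credit at most COUNT_MAX bytes per file, so one large, mostly static
// file (e.g. /proc/config.gz) cannot satisfy the goal by itself.
size_t credit_for_file(size_t bytes_read)
   {
   return std::min(std::min(bytes_read, READ_MAX), COUNT_MAX);
   }
```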
| |
to 64 bit values before doing multiplication.
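The fix above guards against the classic truncation pitfall: a 32x32 multiply performed in 32 bits before any widening, which silently drops the high half of the product. A minimal illustration:

```cpp
#include <cstdint>

// On typical platforms where int is 32 bits, the product of two
// uint32_t values is computed in 32 bits; the high bits are lost
// before the conversion to uint64_t happens.
uint64_t mul_truncated(uint32_t a, uint32_t b)
   {
   return a * b; // 32x32 -> 32-bit multiply, then widened too late
   }

// Widening one operand first forces a full 64-bit multiplication.
uint64_t mul_widened(uint32_t a, uint32_t b)
   {
   return static_cast<uint64_t>(a) * b;
   }
```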
| |
which uses the ANSI/ISO clock function (previously this had been the
Timer::clock default implementation).
| |
timer alternatives. I realized otherwise each application would be forced
to do the exact same thing, and there was no reason for that.
| |
to support multiple blocks.
| |
both support TR1 fine AFAICT.
Add ability to explicitly disable using TR1 with --with-tr1=none
Add a marker in the cc info files specifying if TR1 should be chosen
by default. Yes, autoconf would be better for this than a static
per-compiler setting. Yes, I totally hate autoconf. Yes, I would still
consider autoconf patches. No, I'm not going to do it myself. :)
I am looking forward to being able to safely adopt C++0x and TR2
throughout the library and make the need for a lot of this special-casing
stuff go away.
Until then, it seems better to default to using TR1 (and thus, ECC) than
not.
| |
when the test failed. I had added them for debugging something long ago.
What I thought was an InSiTo ECC test failure was actually a successful test;
it was making sure an Illegal_Point would be thrown under the conditions tested.
So, all seems OK.
| |
encryption.
| |
This seems to have a significant impact on overall speed, now measuring
on my Core2 Q6600:
AES-128: 123.41 MiB/sec
AES-192: 108.28 MiB/sec
AES-256: 95.72 MiB/sec
which is roughly 8-10% faster than before.
| |
Before:
$ ./check --bench-algo=AES-128,AES-256 --seconds=10
AES-128: 101.99 MiB/sec
AES-256: 78.30 MiB/sec
After:
$ ./check --bench-algo=AES-128,AES-256 --seconds=10
AES-128: 106.51 MiB/sec
AES-256: 84.26 MiB/sec
| |
so that algo_cache.h does not have to be visible in the source of all
callers who include libstate.h/algo_factory.h
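One hedged sketch of how such header-hiding typically works (member names hypothetical): the public header only forward declares the cache template and holds a pointer, so the full definition is needed only where the factory is implemented.

```cpp
#include <map>
#include <string>

// --- what algo_factory.h would expose ---
template<typename T> class Algorithm_Cache; // forward declaration suffices
class HashFunction { };

class Algorithm_Factory
   {
   public:
      Algorithm_Factory();
      ~Algorithm_Factory();

      // Comparing an incomplete-type pointer against null is fine
      bool has_cache() const { return hash_cache != nullptr; }
   private:
      Algorithm_Cache<HashFunction>* hash_cache; // pointer to incomplete type
   };

// --- what only algo_factory.cpp would see (the real algo_cache.h) ---
template<typename T>
class Algorithm_Cache
   {
   public:
      std::map<std::string, T*> cache;
   };

Algorithm_Factory::Algorithm_Factory() :
   hash_cache(new Algorithm_Cache<HashFunction>) {}

Algorithm_Factory::~Algorithm_Factory() { delete hash_cache; }
```

New and delete of the cache live in the .cpp, where the type is complete; callers never see algo_cache.h at all.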
| |
template to share the search and cache management logic across all types
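A sketch of what such a shared cache template might look like (an illustration only, not Botan's actual algo_cache.h): the name-to-object lookup and ownership logic is written once and instantiated per algorithm type.

```cpp
#include <map>
#include <string>

// One template reused for ciphers, hashes, MACs, etc., so the search
// and cache-management logic is not duplicated per type.
template<typename T>
class Algorithm_Cache
   {
   public:
      // Returns the cached object for `name`, or null if not present
      T* get(const std::string& name) const
         {
         auto i = cache.find(name);
         return (i != cache.end()) ? i->second : nullptr;
         }

      // Takes ownership of `algo`, replacing any previous entry
      void add(const std::string& name, T* algo)
         {
         delete cache[name]; // deleting null (absent entry) is a no-op
         cache[name] = algo;
         }

      ~Algorithm_Cache()
         {
         for(auto& kv : cache)
            delete kv.second;
         }

   private:
      std::map<std::string, T*> cache;
   };
```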
| |
replacing with an updated constructor to Algorithm_Factory taking a
vector of Engine*. The semantics of adding engines at runtime were
neither defined nor very clear; it seems best to prohibit this unless and
until it is explicitly thought through (and until a need for it
presents itself).
| |
algo_factory and algorithm_factory. This is confusing,
so for consistency/simplicity, remove algo_factory, making
algorithm_factory the function to call.
In 1.7.14, several functions in lookup.h, including
retrieve_block_cipher, retrieve_hash, etc, were changed to accept
a Library_State& reference. However, it turns out that with the
modified design I've settled upon for 1.8 it is not
necessary to change those interfaces; instead they always refer
to the global_state algorithm factory, which is exactly the
semantics one would expect/desire 99% of the time (and is source
compatible with code written for 1.6, also a plus).
| |
Engine_Iterator (and thus the public key engine code) still processes
in order of first engine to last in the list.
Benchmarking confirmed that GNU MP is still faster than both OpenSSL
and Botan for public key operations (at least on my machine).
| |
integers where we manipulate values denominated in nanoseconds to avoid
overflow (2^64 nanoseconds = 584.55 years, aka long enough)
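The headroom figure is easy to verify:

```cpp
#include <cstdint>

const uint64_t NS_PER_SEC = 1000000000ULL;
const double SECONDS_PER_YEAR = 365.25 * 24 * 3600; // Julian year

// How many years fit in an unsigned 64-bit nanosecond counter.
// 2^64 as a double is 18446744073709551616.0 exactly.
double max_years()
   {
   return 18446744073709551616.0 / NS_PER_SEC / SECONDS_PER_YEAR;
   }
```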
| |
instantiated directly, which seemed a useful thing since a Timer ref
must be passed to the benchmark system. However I think due to the
low resolution of clock() the results were highly variable. Using
gettimeofday or clock_gettime (Unix_Timer, POSIX_Timer) proved more
consistent and in sync with what the existing benchmarks show, so
use one of those if possible.
| |
was not the right place to keep track of this information. Also modify
all Algorithm_Factory constructor functions to take, instead of a SCAN_Name,
a pair of std::strings: the SCAN name and an optional provider name. If
a provider is specified, either that provider will be used or the request
will fail. Otherwise, the library will attempt best effort, based on
user-set algorithm implementation settings (combine with benchmark.h for
choosing the fastest implementation at runtime) or if not set, a static
ordering (preset in static_provider_weight in prov_weight.cpp, though it
would be nice to make this easier to toggle).
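The selection rule described above can be sketched like this (choose_provider and the weight map are hypothetical names for illustration, not Botan's API): a named provider must match exactly or the request fails; otherwise the highest-weighted candidate wins.

```cpp
#include <map>
#include <string>

using ProviderMap = std::map<std::string, int>; // provider name -> weight

// If `requested` is non-empty, only that provider may satisfy the
// request (empty result = failure). Otherwise pick the candidate
// with the highest static weight.
std::string choose_provider(const ProviderMap& candidates,
                            const std::string& requested)
   {
   if(!requested.empty())
      return candidates.count(requested) ? requested : "";

   std::string best;
   int best_weight = -1;
   for(const auto& kv : candidates)
      if(kv.second > best_weight)
         { best = kv.first; best_weight = kv.second; }
   return best;
   }
```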