| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
change some of the hash functions to use it as low-hanging fruit.
This could probably be optimized further (it currently just unrolls x4), but
merely having it as syntax is useful, since it allows optimizing many functions
at once (e.g. using SSE2 to do 4-way byteswaps).
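As a rough illustration of the "just unrolls x4" approach described above, a 4-way byte swap might look like the sketch below. The names reverse_bytes and bswap_4 are assumptions made for this example and are not confirmed by the commit text.

```cpp
#include <cstdint>

// Hypothetical helper: byte-swap one 32-bit word using shifts and masks.
inline std::uint32_t reverse_bytes(std::uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Hypothetical 4-way swap, implemented as plain x4 unrolling.
// Because callers only see this one entry point, a SIMD build could
// swap in a vectorized body without touching any of the hash functions.
inline void bswap_4(std::uint32_t x[4])
{
    x[0] = reverse_bytes(x[0]);
    x[1] = reverse_bytes(x[1]);
    x[2] = reverse_bytes(x[2]);
    x[3] = reverse_bytes(x[3]);
}
```

The point of the single entry point is the one made in the message: optimizing bswap_4 once would speed up every caller.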
|
| |
Document SHA optimizations, AltiVec runtime checking, fixes for cpuid
for both icc and msvc.
|
|\
| |
| |
| |
| |
| | |
4fd7eb9630271d3c1dfed21987ef864680d4ce7b)
to branch 'net.randombit.botan.general-simd' (head 91df868149cdc4754d340e6103028acc82182609)
|
| | |
|
| |
| |
| |
| | |
and also make it stylistically much closer to the standard SHA-1 code.
|
| | |
|
| |
| |
| |
| | |
the code stylistically, etc)
|
| | |
|
| |
| |
| |
| |
| | |
returns true if they might plausibly work. AltiVec and SSE2 versions call
into CPUID, scalar version always works.
|
| | |
|
| |
| |
| |
| |
| | |
Relies on mfspr emulation/trapping by the kernel, which works on (at least)
Linux and NetBSD.
|
| | |
|
| |
| | |
for unaligned writes is messy as hell.
If writes are batched this is somewhat easier to deal with (somewhat).
|
| | |
|
| |\
| | |
| | |
| | |
| | |
| | | |
54d2cc7b00ecd5f41295e147d23ab6d294309f61)
to branch 'net.randombit.botan.general-simd' (head 9cb1b5f00bfefd05cd9555489db34e6d86867aca)
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | | |
8fb69dd1c599ada1008c4cab2a6d502cbcc468e0)
to branch 'net.randombit.botan.general-simd' (head c05c9a6d398659891fb8cca170ed514ea7e6476d)
|
| | | |
| | | |
| | | |
| | | | |
and AltiVec (though AltiVec is seemingly slower at the moment)
|
| | | | |
|
| | | |\
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
bf629b13dd132b263e76a72b7eca0f7e4ab19aac)
to branch 'net.randombit.botan.general-simd' (head f731cff08ff0d04c062742c0c6cfcc18856400ea)
|
| | | | |
| | | | | |
on a PowerPC 970 running Gentoo with GCC 4.3.4
Uses a GCC syntax for creating literal values instead of the Motorola
syntax [{1,2,3,4} instead of (1,2,3,4)].
In tests so far, this is much, much slower than either the standard scalar code,
or using the SIMD-in-scalar-registers code. It looks like for whatever reason
GCC is refusing to inline the function:
SIMD_Altivec(__vector unsigned int input) { reg = input; }
and calls it with a branch hundreds of times in each function. I don't know
if this is the entire reason it's slower, but it definitely can't be helping.
The code handles unaligned loads OK but assumes stores are to an aligned address.
This will fail drastically some day, and needs to be fixed to either use scalar
stores, which (most?) PPCs will handle (if slowly), or batch the loads and
stores so we can work across the loads. Considering the code so far loads 4
vectors of data in one go, batching would probably be a big win (and also for
loads: an unaligned vector load needs two aligned loads, so instead of doing
8 loads for 4 registers only 5 are needed).
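The 8-versus-5 load count above can be checked with a small portable model. This is a sketch only: it does not use AltiVec intrinsics, and AlignedMemory, combine, load4_naive and load4_batched are hypothetical names invented here. The model assumes an aligned load ignores the low 4 address bits and that an unaligned vector is assembled from the two aligned blocks it straddles.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Block = std::array<std::uint8_t, 16>;

// Models memory that can only be read in aligned 16-byte blocks,
// and counts how many such reads are issued.
struct AlignedMemory
{
    explicit AlignedMemory(const std::uint8_t* b) : base(b), loads(0) {}

    Block load(std::size_t offset)
    {
        Block out;
        // Low 4 bits of the offset are dropped, as with an aligned vector load.
        std::memcpy(out.data(), base + (offset & ~std::size_t(15)), 16);
        ++loads;
        return out;
    }

    const std::uint8_t* base;
    std::size_t loads;
};

// Take 16 bytes starting `shift` bytes into `lo`, continuing into `hi`.
inline Block combine(const Block& lo, const Block& hi, std::size_t shift)
{
    Block out;
    for (std::size_t i = 0; i != 16; ++i)
        out[i] = (i + shift < 16) ? lo[i + shift] : hi[i + shift - 16];
    return out;
}

// Naive: every vector issues its own two aligned loads (8 in total).
inline void load4_naive(AlignedMemory& mem, std::size_t off, Block out[4])
{
    for (std::size_t i = 0; i != 4; ++i)
    {
        const std::size_t p = off + 16 * i;
        const Block lo = mem.load(p);
        const Block hi = mem.load(p + 15);
        out[i] = combine(lo, hi, p % 16);
    }
}

// Batched: neighbouring vectors share a block, so 5 loads cover 4 vectors.
// Note this always reads 5 blocks, so the buffer must extend that far.
inline void load4_batched(AlignedMemory& mem, std::size_t off, Block out[4])
{
    Block prev = mem.load(off);
    for (std::size_t i = 0; i != 4; ++i)
    {
        const Block next = mem.load(off + 16 * (i + 1));
        out[i] = combine(prev, next, off % 16);
        prev = next;
    }
}
```

The same sharing argument would apply to batched stores, which is the case the message flags as unsolved.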
|
| | | | |
| | | | |
| | | | |
| | | | | |
of load_le + bswap
|
| | | | | |
|
| | | | | |
|
| | | | |
| | | | | |
operations.
Also add a pure scalar code version.
Convert Serpent to use this new interface, and add an implementation of
XTEA in SIMD.
The wrappers plus the scalar version allow SIMD-ish code to work on all
platforms. This is often a win due to better ILP being visible to the
processor (as with the recent XTEA optimizations). Only real danger is
register starvation, mostly an issue on x86 these days. So it may (or may
not) be a win to consolidate the standard C++ versions and the SIMD versions
together.
Future work:
- Add AltiVec/VMX version
- Maybe also for ARM's NEON extension? Less pressing, I would think.
- Convert SHA-1 code to use SIMD_32
- Add XTEA SIMD decryption (currently only encrypt)
- Change SSE2 engine to SIMD_engine
- Modify configure.py to set BOTAN_TARGET_CPU_HAS_[SSE2|ALTIVEC|NEON|XXX] macros
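To make the "pure scalar code version" idea concrete, here is a minimal sketch of a 4 x 32-bit wrapper backed by ordinary integers. The class name SIMD_Scalar_Sketch and its method set are assumptions for illustration; the actual interface in the commit may differ.

```cpp
#include <cstdint>

// Hypothetical scalar stand-in for a 4 x 32-bit SIMD register.
// Code written against this interface could be retargeted to an
// SSE2 or AltiVec wrapper exposing the same operations.
class SIMD_Scalar_Sketch
{
public:
    SIMD_Scalar_Sketch(std::uint32_t a, std::uint32_t b,
                       std::uint32_t c, std::uint32_t d)
    {
        R[0] = a; R[1] = b; R[2] = c; R[3] = d;
    }

    // Lane-wise addition, wrapping modulo 2^32.
    SIMD_Scalar_Sketch operator+(const SIMD_Scalar_Sketch& o) const
    {
        return SIMD_Scalar_Sketch(R[0] + o.R[0], R[1] + o.R[1],
                                  R[2] + o.R[2], R[3] + o.R[3]);
    }

    // Lane-wise XOR.
    SIMD_Scalar_Sketch operator^(const SIMD_Scalar_Sketch& o) const
    {
        return SIMD_Scalar_Sketch(R[0] ^ o.R[0], R[1] ^ o.R[1],
                                  R[2] ^ o.R[2], R[3] ^ o.R[3]);
    }

    // Rotate every lane left by n bits; requires 0 < n < 32.
    SIMD_Scalar_Sketch rotate_left(unsigned n) const
    {
        return SIMD_Scalar_Sketch(rotl(R[0], n), rotl(R[1], n),
                                  rotl(R[2], n), rotl(R[3], n));
    }

    // Read back one lane (i in 0..3).
    std::uint32_t lane(unsigned i) const { return R[i]; }

private:
    static std::uint32_t rotl(std::uint32_t x, unsigned n)
    {
        return (x << n) | (x >> (32 - n));
    }

    std::uint32_t R[4];
};
```

Four independent lanes per operation is what exposes the extra ILP mentioned above, and four live 32-bit values per variable is also where the register pressure on x86 comes from.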
|
| | | | | |
|
|/ / / /
| | | |
| | | |
| | | | |
SHA-256 gets ~7% faster, SHA-512 ~10%.
|
|/ / / |
|
| | | |
|
| | |
| | |
| | |
| | |
| | | |
Pretty much useless and unused, except for listing the module names in
build.h and the short versions totally suffice for that.
|
|\| |
| | |
| | |
| | |
| | |
| | | |
3158f8272a3582dd44dfb771665eb71f7d005339)
to branch 'net.randombit.botan' (head bf629b13dd132b263e76a72b7eca0f7e4ab19aac)
|
| |/ |
|
| | |
|
| |
| | |
since it passes signed ints for whatever reason.
Ensure CALL_CPUID is always defined (previously, it would not be if on an x86
but compiled with something other than GCC, ICC, VC++).
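A sketch of what "always defined" could look like follows. The macro bodies are assumptions, not the committed code: the MSVC branch uses the __cpuid intrinsic (which takes an int array, hence the cast), the GCC-compatible branch uses __get_cpuid from <cpuid.h>, and the final branch is a do-nothing fallback that reports no features. cpu_has_sse2_sketch is a hypothetical name.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  // MSVC's intrinsic wants int[4], so cast the unsigned output array.
  #define CALL_CPUID(type, out) \
     do { __cpuid(reinterpret_cast<int*>(out), (type)); } while(0)
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>
  // Covers GCC and compilers that define __GNUC__ (Clang, ICC on Linux).
  #define CALL_CPUID(type, out) \
     do { __get_cpuid((type), (out), (out) + 1, (out) + 2, (out) + 3); } while(0)
#else
  // Fallback so the macro exists everywhere: report all-zero registers.
  #define CALL_CPUID(type, out) \
     do { (out)[0] = (out)[1] = (out)[2] = (out)[3] = 0; } while(0)
#endif

// Hypothetical feature check: CPUID leaf 1, EDX bit 26 indicates SSE2.
inline bool cpu_has_sse2_sketch()
{
    std::uint32_t regs[4] = {0, 0, 0, 0}; // EAX, EBX, ECX, EDX
    CALL_CPUID(1, regs);
    return ((regs[3] >> 26) & 1) != 0;
}
```

With the fallback in place, callers need no compiler checks of their own; on an unrecognized toolchain the feature test simply returns false.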
|
| | |
|
| | |
|
| |
| | |
Add new load options that are passed a number of variables by reference,
setting them all at once. This will allow batching operations (e.g. using
SIMD operations to do 128-bit wide bswaps) in future optimizations.
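A minimal sketch of such a multi-variable load is shown below, assuming 32-bit big-endian words. The function names and signatures (load_be32, and a four-reference load_be) are hypothetical; only the idea of setting several variables by reference in one call comes from the message.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical single-word load: read the off'th big-endian 32-bit word.
inline std::uint32_t load_be32(const std::uint8_t in[], std::size_t off)
{
    const std::uint8_t* p = in + 4 * off;
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) <<  8) |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical batched load: fills four outputs from 16 input bytes.
// Since all four words are requested together, an optimized build
// could replace this body with one 128-bit load plus a vector bswap.
inline void load_be(const std::uint8_t in[],
                    std::uint32_t& x0, std::uint32_t& x1,
                    std::uint32_t& x2, std::uint32_t& x3)
{
    x0 = load_be32(in, 0);
    x1 = load_be32(in, 1);
    x2 = load_be32(in, 2);
    x3 = load_be32(in, 3);
}
```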
|
| |
| | |
a time more than doubles performance (from 38 MB/s to 90 MB/s on Core2 Q6600).
Could do even better with SIMD, I'm sure, but this is fast and easy, and
works everywhere.
Probably will hurt on 32-bit x86 from the register pressure.
|
| |
| | |
a named constant instead of being magic. Move from 64 bytes to 256.
This was necessary to allow
Pipe(new Hex_Decoder, filter, ...)
to give filter a sufficiently large input block.
It would be nicer if the filter itself (in this case, ECB_Decryption, but
others apply as well) were smart enough to buffer on its own.
It might also be useful if code could query what parallelism a block cipher
provides and modify its actions accordingly.
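One way the "query what parallelism a block cipher provides" idea could work is sketched here. Everything named below is hypothetical (DEFAULT_BUFFER_SIZE, Block_Cipher_Sketch, preferred_buffer_size); only the 256-byte default comes from the message.

```cpp
#include <cstddef>

// Hypothetical name for the named constant; 256 is the value given above.
const std::size_t DEFAULT_BUFFER_SIZE = 256;

// Hypothetical description of a cipher, as a caller might query it.
struct Block_Cipher_Sketch
{
    std::size_t block_size;   // bytes per block
    std::size_t parallelism;  // blocks the cipher prefers to process at once
};

// Pick a buffer size that is at least the default and also a whole
// number of (block_size * parallelism) chunks, so a filter can always
// hand the cipher a full parallel batch.
inline std::size_t preferred_buffer_size(const Block_Cipher_Sketch& c)
{
    const std::size_t chunk = c.block_size * c.parallelism;
    if (chunk == 0)
        return DEFAULT_BUFFER_SIZE;
    return ((DEFAULT_BUFFER_SIZE + chunk - 1) / chunk) * chunk;
}
```

For example, a cipher with 16-byte blocks and 8-way parallelism wants 128-byte chunks, so the 256-byte default already fits, while an 8-byte block cipher with a 48-block batch would need 384 bytes.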
|
| |
| | |
just too fragile and not that useful. Something like Java's checked exceptions
might be nice, but simply killing the process entirely if an unexpected
exception is thrown is not exactly useful for something trying to be robust.
|
| | |
|
| | |
|
| |
| |
| |
| | |
seem to work with C++ at all so untested.
|
| | |
|
| | |
|
| |\
| | |
| | |
| | | |
and '8cc9c08544c0f1f1dba7c7a8da51d1657b1c7df8'
|
| | | |
|
| | |
| | | |
StreamCipher_Filter
to pass it directly to a Pipe now.
|
| | |
| | | |
Remove encrypt, decrypt - replace by cipher() and cipher1()
Remove seek() - not well supported/tested, I want to redo with a new interface
once CTR and OFB modes become stream ciphers.
Rename resync to set_iv()
Remove StreamCipher::IV_LENGTH and add StreamCipher::valid_iv_length() to
allow multiple IV lengths (as for instance Turing allows, as would Salsa20
if XSalsa20 were supported).
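The move from a fixed IV_LENGTH to valid_iv_length() can be sketched as follows. The class names and the exception type are assumptions for illustration, not the library's actual declarations; the 8- and 24-byte lengths correspond to Salsa20 and XSalsa20 nonces.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical base class: by default a stream cipher takes no IV.
class Stream_Cipher_Sketch
{
public:
    virtual ~Stream_Cipher_Sketch() {}

    // A predicate rather than a single constant, so that a cipher
    // can accept several IV lengths.
    virtual bool valid_iv_length(std::size_t iv_len) const
    {
        return iv_len == 0;
    }

    // set_iv() (formerly resync) rejects lengths the cipher does not accept.
    void set_iv(const std::uint8_t iv[], std::size_t iv_len)
    {
        if (!valid_iv_length(iv_len))
            throw std::invalid_argument("invalid IV length");
        m_iv.assign(iv, iv + iv_len);
    }

    std::size_t iv_length() const { return m_iv.size(); }

private:
    std::vector<std::uint8_t> m_iv;
};

// Hypothetical cipher accepting two IV lengths, in the way a combined
// Salsa20/XSalsa20 implementation would (8 or 24 bytes).
class Multi_IV_Cipher_Sketch : public Stream_Cipher_Sketch
{
public:
    bool valid_iv_length(std::size_t iv_len) const override
    {
        return iv_len == 8 || iv_len == 24;
    }
};
```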
|
| |/
| |
| |
| | |
and was tickling a bug in the asm versions because of the constant 0.
|