| Commit message | Author | Age | Files | Lines |
|
|
|
| |
testing with Intel's emulator shows all green.
|
|
|
|
|
|
| |
AES-256 blocks, plus a handful remaining in a general AES block.
This is necessary for any implementation which only supports a particular
key size, since otherwise no tests at all will run on that implementation.
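For illustration, a per-key-size block might look something like this (the
section and field names here are hypothetical, but the vector shown is the
AES-128 example from FIPS-197):

    [AES-128]
    Key = 000102030405060708090A0B0C0D0E0F
    In  = 00112233445566778899AABBCCDDEEFF
    Out = 69C4E0D86A7B0430D8CDB78070B4C55A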
|
| |
|
|
|
|
|
|
|
| |
virtual-ness not needed, and it was trying to override by changing the
argument list, which doesn't actually work in C++ (it just declares a new
overload that hides the base version) and only happened to work because it
was only ever called through the class that implemented it. ICC was warning
about it, too. Make it non-virtual.
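A minimal sketch of the C++ pitfall in question (names hypothetical): a
derived-class function with a different argument list does not override a
base-class virtual, it declares a new function that hides the base name.

    #include <cstddef>

    class Base
       {
       public:
          virtual void process(const unsigned char in[], std::size_t len);
          virtual ~Base() {}
       };

    class Derived : public Base
       {
       public:
          // Different argument list: this does NOT override Base::process,
          // it merely hides it. Calls through a Base* still reach
          // Base::process, which is why it only seemed to work when used
          // directly on the implementing class.
          void process(const unsigned char in[], std::size_t len, bool last);
       };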
|
|
|
|
| |
credits.txt and thanks.txt. Remove various bits of formatting weirdness.
|
|
|
|
|
|
| |
included elsewhere, and my preference is for emails to appear only in
credits.txt, since emails change more often than names and I'd prefer they
not be constantly either wrong or in need of updates.
|
|
|
|
|
|
|
| |
the user to specify the hash function to use, instead of always using SHA-1.
This was a sensible default a few years ago, when there wasn't a ~2^60 attack
on SHA-1 and support for SHA-2 was pretty much nil, but using something else
makes a lot more sense these days.
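A rough sketch of the shape such an interface might take (the names and the
iterated construction here are hypothetical, not any particular standard):

    #include <cstddef>
    #include <string>
    #include <vector>

    typedef unsigned char byte;

    // Minimal stand-in for a hash interface
    class HashFunction
       {
       public:
          virtual void update(const byte in[], std::size_t len) = 0;
          virtual void final(byte out[]) = 0;
          virtual std::size_t output_length() const = 0;
          virtual ~HashFunction() {}
       };

    // The caller chooses the hash (SHA-256, etc) instead of getting SHA-1
    std::vector<byte> derive_key(HashFunction& hash,
                                 const std::string& passphrase,
                                 std::size_t iterations)
       {
       std::vector<byte> key(hash.output_length());
       for(std::size_t i = 0; i != iterations; ++i)
          {
          hash.update(reinterpret_cast<const byte*>(passphrase.data()),
                      passphrase.size());
          if(i > 0) // chain in the previous digest after the first pass
             hash.update(&key[0], key.size());
          hash.final(&key[0]);
          }
       return key;
       }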
|
| |
|
| |
|
|
|
|
|
| |
the AES and PCLMUL instructions. Oddness. For the time being, compile
Nehalem and Westmere as Core2 + extras, probably close enough.
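Presumably (exact spelling depends on the GCC version) 'Core2 + extras'
amounts to flags along these lines:

    g++ -march=core2 -maes -mpclmul -c aes_128.cpp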
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
on a particular ISA extension rather than a list of CPUs. Much
easier to edit and audit, too. Add markers on the AES-NI code and
SHA-1/SSE2. Serpent and XTEA don't need it because they are
generic and only depend on simd_32, which will silently swap in a
scalar version if SSE2/AltiVec isn't enabled (since it turns out that
on superscalar processors just doing 4 blocks in parallel can be a
win even in GPRs).
Add pentium3 to the list of CPUs with rdtsc, was missing. Odd!
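The silent fallback presumably looks something along these lines (the
macro and class names here are assumed rather than verified):

    // simd_32.h sketch: choose a 4x32-bit SIMD type at compile time so
    // generic code silently gets a scalar fallback
    class SIMD_SSE2;      // 4x32 via SSE2 intrinsics
    class SIMD_Altivec;   // 4x32 via AltiVec
    class SIMD_Scalar;    // 4x32 in plain GPRs

    #if defined(BOTAN_TARGET_CPU_HAS_SSE2)
       typedef SIMD_SSE2 SIMD_32;
    #elif defined(BOTAN_TARGET_CPU_HAS_ALTIVEC)
       typedef SIMD_Altivec SIMD_32;
    #else
       typedef SIMD_Scalar SIMD_32;
    #endif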
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From looking at how key gen works in particular, it seems easiest to provide
only AES-128, AES-192, and AES-256 and not a general AES class that can
accept any key length. This also has the bonus of allowing full loop unrolling
which may be a win (how much so will depend on the latency/throughput of
the AES instructions which is currently unknown).
No block interleaving, though of course it works very nicely here, simply
due to the desire to keep things simple until what is currently here can
actually be tested. (Intel has an emulator that is supposed to work but
just crashes on my machine...)
I'm not entirely sure if byte swapping is required. Intel has a white paper
out that suggests it isn't (and really it would have been stupid of them to
not build this into the AES instructions), but who knows. If it turns
out to be necessary there is a pretty fast bswap instruction for SSE anyway.
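For reference, a fully unrolled AES-128 block encryption is short enough
with AES-NI to sketch here (assumes the round keys EK[0..10] are already
expanded; untested, given the emulator trouble mentioned above):

    #include <wmmintrin.h>   // AES-NI intrinsics; compile with -maes

    __m128i aes128_encrypt_block(__m128i block, const __m128i EK[11])
       {
       block = _mm_xor_si128(block, EK[0]);        // initial AddRoundKey
       block = _mm_aesenc_si128(block, EK[1]);
       block = _mm_aesenc_si128(block, EK[2]);
       block = _mm_aesenc_si128(block, EK[3]);
       block = _mm_aesenc_si128(block, EK[4]);
       block = _mm_aesenc_si128(block, EK[5]);
       block = _mm_aesenc_si128(block, EK[6]);
       block = _mm_aesenc_si128(block, EK[7]);
       block = _mm_aesenc_si128(block, EK[8]);
       block = _mm_aesenc_si128(block, EK[9]);
       return _mm_aesenclast_si128(block, EK[10]); // last round skips MixColumns
       }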
|
|
|
|
|
| |
providing it. Also stubs in the engine for VIA's AES instructions, but
needs CPUID checking also.
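The Intel-side check is a CPUID query; a sketch using GCC's cpuid.h
(AES-NI is reported in leaf 1, ECX bit 25; VIA's PadLock units are instead
reported via the Centaur extended leaves, not shown here):

    #include <cpuid.h>

    bool has_aes_ni()
       {
       unsigned int eax, ebx, ecx, edx;
       if(!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
          return false;
       return (ecx & (1u << 25)) != 0; // CPUID.1:ECX bit 25 = AES-NI
       }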
|
|
|
|
|
|
| |
ignores this unless it can detect (or is asked to use) a specific model;
otherwise it compiles for the baseline ISA. Remove the default_submodel
entries in the arch files.
|
|
|
|
|
|
|
|
| |
ISA extensions (Intel's AES-NI, for instance), so change everything
to reflect that.
Also rename some of the amd64 models, and add entries for k10, nehalem,
and westmere processors.
|
|
|
|
|
|
|
|
|
| |
There is no point, as far as I can see, in being able to explicitly disable
a SIMD or other ISA extension, because if you are compiling for that particular
CPU the compiler might well choose to insert CPU-specific instructions anyway.
For instance if one is compiling on a P4 but wants to disable SSE2, the
right thing to do is compile for (say) an i686 which ensures that no P4
instructions will be emitted.
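In GCC flag terms, the argument is that the first of these is the sensible
request and the second isn't worth supporting:

    g++ -march=i686 ...                  # baseline: no P4-only instructions
    g++ -march=pentium4 -mno-sse2 ...    # half-disabled P4: little point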
|
|
|
|
|
| |
Rename BOTAN_UNALIGNED_LOADSTOR_OK to BOTAN_UNALIGNED_MEMORY_ACCESS_OK,
which is somewhat clearer about what it means.
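Typical use of the renamed macro, sketched (the helper is hypothetical):

    #include <cstring>
    #include <stdint.h>

    uint32_t load_word(const uint8_t in[])
       {
    #if defined(BOTAN_UNALIGNED_MEMORY_ACCESS_OK)
       // a single possibly-unaligned load is fine on this target
       return *reinterpret_cast<const uint32_t*>(in);
    #else
       // portable path for strict-alignment targets
       uint32_t out;
       std::memcpy(&out, in, sizeof(out));
       return out;
    #endif
       }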
|
|
|
|
|
|
| |
SSE2, SSSE3, NEON, and AltiVec.
Add entries for Intel Atom, POWER6 and POWER7, and the Cortex A8 and A9.
|
| |
|
|
|
|
|
| |
x86 currently. This should be fixed. But it's an improvement over having
to always set it manually, at least.
|
| |
|
|\
| |
| |
| |
| |
| | |
6e8c18515725a70923b34118951252723dd4c29a)
to branch 'net.randombit.botan' (head 77ba4ea5a4be36d6d029bcc852b2271edff0d679)
|
| |\
| | |
| | |
| | |
| | |
| | | |
a101c8c86b755a666c72baf03154230e09e0667e)
to branch 'net.randombit.botan' (head 948905e3872b6f5904686533c6aa87d38ff90a71)
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
I'm not totally happy with this - in particular, in all cases the size is a
compile-time constant - it would be nice to make use of that via template
metaprogramming. Also, for matching-endian loads, a straight memcpy would
do the job, which would probably be even faster.
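A sketch of the memcpy point (helper name hypothetical): since the size is
sizeof(T), a compile-time constant, the compiler can collapse the memcpy
into a single load when host and data endianness match; the mismatched case
would tack a byteswap onto the result.

    #include <cstring>
    #include <stdint.h>

    // Matching-endian load: just a memcpy the compiler can collapse
    template<typename T>
    T load_le(const uint8_t in[])
       {
       T out;
       std::memcpy(&out, in, sizeof(T)); // assumes a little-endian host
       return out;
       }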
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
change some of the hash functions to use it as low-hanging fruit.
Probably could use further optimization (just unrolls x4 currently), but
merely having the syntax in place is good, as it allows optimizing many
functions at once (e.g. using SSE2 to do 4-way byteswaps).
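The SSE2 4-way byteswap itself comes down to two 16-bit shuffles plus a
shift-and-or, along these lines:

    #include <emmintrin.h>

    // Byteswap each of the four 32-bit words held in x
    __m128i bswap_4(__m128i x)
       {
       // swap the 16-bit halves of each 32-bit word...
       x = _mm_shufflehi_epi16(x, _MM_SHUFFLE(2, 3, 0, 1));
       x = _mm_shufflelo_epi16(x, _MM_SHUFFLE(2, 3, 0, 1));
       // ...then swap the bytes within each 16-bit word
       return _mm_or_si128(_mm_srli_epi16(x, 8), _mm_slli_epi16(x, 8));
       }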
|
| | |
| | |
| | |
| | |
| | | |
Document SHA optimizations, AltiVec runtime checking, fixes for cpuid
for both icc and msvc.
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | | |
4fd7eb9630271d3c1dfed21987ef864680d4ce7b)
to branch 'net.randombit.botan.general-simd' (head 91df868149cdc4754d340e6103028acc82182609)
|
| | | | |
|
| | | |
| | | |
| | | |
| | | | |
and also make it stylistically much closer to the standard SHA-1 code.
|
| | | | |
|
| | | |
| | | |
| | | |
| | | | |
the code stylistically, etc)
|
| | | | |
|
| | | |
| | | |
| | | |
| | | |
| | | | |
returns true if they might plausibly work. AltiVec and SSE2 versions call
into CPUID, scalar version always works.
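Sketched, the shape of it (names hypothetical; SSE2 is CPUID leaf 1,
EDX bit 26):

    #include <cpuid.h>

    struct SHA1_SSE2
       {
       static bool enabled() // might this plausibly work here?
          {
          unsigned int eax, ebx, ecx, edx;
          return __get_cpuid(1, &eax, &ebx, &ecx, &edx) &&
                 (edx & (1u << 26)) != 0;
          }
       };

    struct SHA1_Scalar
       {
       static bool enabled() { return true; } // scalar always works
       };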
|
| | | | |
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Relies on mfspr emulation/trapping by the kernel, which works on (at least)
Linux and NetBSD.
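The trick, roughly: mfspr from SPR 287 (the processor version register) is
a privileged instruction, but those kernels trap and emulate it for user
space, so the CPU model can be read and matched against AltiVec-capable
parts (the model list below is an illustrative subset, not complete):

    #include <stdint.h>

    bool ppc_has_altivec()
       {
       uint32_t pvr = 0;
       // privileged read of the PVR; the kernel emulates it for us
       __asm__ volatile("mfspr %0, 287" : "=r" (pvr));
       const uint16_t model = pvr >> 16;
       // 7400, 7410, 970, 970FX
       return (model == 0x000C || model == 0x800C ||
               model == 0x0039 || model == 0x003C);
       }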
|
| | | | |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
for unaligned writes is messy as hell.
If writes are batched this is somewhat easier to deal with (somewhat).
|
| | | | |
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
54d2cc7b00ecd5f41295e147d23ab6d294309f61)
to branch 'net.randombit.botan.general-simd' (head 9cb1b5f00bfefd05cd9555489db34e6d86867aca)
|
| | | |\ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
8fb69dd1c599ada1008c4cab2a6d502cbcc468e0)
to branch 'net.randombit.botan.general-simd' (head c05c9a6d398659891fb8cca170ed514ea7e6476d)
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
and AltiVec (though AltiVec is seemingly slower ATM...)
|
| | | | | | |
|
| | | | | | |
|
| | | | |\ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
bf629b13dd132b263e76a72b7eca0f7e4ab19aac)
to branch 'net.randombit.botan.general-simd' (head f731cff08ff0d04c062742c0c6cfcc18856400ea)
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
on a PowerPC 970 running Gentoo with GCC 4.3.4
Uses a GCC syntax for creating literal values instead of the Motorola
syntax [{1,2,3,4} instead of (1,2,3,4)].
In tests so far, this is much, much slower than either the standard scalar code,
or using the SIMD-in-scalar-registers code. It looks like for whatever reason
GCC is refusing to inline the function:
SIMD_Altivec(__vector unsigned int input) { reg = input; }
and calls it with a branch hundreds of times in each function. I don't know
if this is the entire reason it's slower, but it definitely can't be helping.
The code handles unaligned loads OK but assumes stores are to an aligned address.
This will fail drastically some day, and needs to be fixed to either use scalar
stores, which (most?) PPCs will handle (if slowly), or batch the loads and
stores so we can work across the loads. Considering the code so far loads 4
vectors of data in one go this would probably be a big win (and also for loads,
since instead of doing 8 loads for 4 registers only 5 are needed).
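For reference, the usual AltiVec idiom for an unaligned load is two
aligned loads spliced with vec_perm, along these lines (unaligned stores
need the reverse dance across neighboring vectors, hence the mess
described above):

    #include <altivec.h>

    __vector unsigned int load_vec_unaligned(const unsigned int* ptr)
       {
       __vector unsigned char perm = vec_lvsl(0, ptr); // permute mask from alignment
       __vector unsigned int MSQ = vec_ld(0, ptr);     // aligned quadword at/below ptr
       __vector unsigned int LSQ = vec_ld(15, ptr);    // the next aligned quadword
       return vec_perm(MSQ, LSQ, perm);                // splice the two together
       }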
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
of load_le + bswap
|
| | | | | | | |
|