| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| | |
8cecdc1c3dd5853823fabcb816400dd467b3c04a)
to branch 'net.randombit.botan.c++0x' (head 39a585195a07f18628f6216a276402ed92567cc3)
|
| |
| |
| |
| | |
build magic, name them asm_macr_ARCH.h. Change all including files accordingly.
|
| | |
|
| |
| |
| |
| | |
It will be nice to convert to the range-based for loop once that's available.
|
|\|
| |
| |
| |
| |
| | |
ac888e57b614c623590d79ab615353ad7c76ef68)
to branch 'net.randombit.botan.c++0x' (head 9bf78ed7e2521a328f6db7acbc1cd81b07718230)
|
| | |
|
| | |
|
| |
| |
| |
| | |
which is currently just a stub returning false.
|
| |
| |
| |
| |
| | |
Rename BOTAN_UNALIGNED_LOADSTOR_OK to BOTAN_UNALIGNED_MEMORY_ACCESS_OK
which is somewhat more clear as to the point.
|
|\|
| |
| |
| |
| |
| | |
cead7027e70b68a8b4ae2e5bd8f290066e5ea22a)
to branch 'net.randombit.botan.c++0x' (head 9edbd485060131b695170f5243a100e06e3b0c71)
|
| | |
|
|\ \
| |/
|/|
| |
| |
| | |
2773c2310e8c0a51975987a2dd6c5824c8d43882)
to branch 'net.randombit.botan.c++0x' (head f13cf5d7e89706c882604299b508f356c20aae3a)
|
| |
| |
| |
| | |
(which will go later) which will live in the new time.h
|
| | |
|
| |\
| | |
| | |
| | |
| | |
| | | |
139d6957d20f0b1202e0eacc63cb011588faffde)
to branch 'net.randombit.botan.c++0x' (head c16676fa6c393bc3f46a044755ce525a013380a6)
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | | |
8a5eb02c2e451fc983f234f7ba2f023f5a7d294f)
to branch 'net.randombit.botan.c++0x' (head e18cd411269e15638df3298d6a4165446e7ca529)
|
| | | |\
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
92e05ab242e4b6998d685961c53700534a673bce)
to branch 'net.randombit.botan.c++0x' (head 27ce37b971ec5cb1f80a9a95b13d5a951b96653b)
|
| | | |\ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
5cadcc57872bef55226579df57349fe09a93d1f5)
to branch 'net.randombit.botan.c++0x' (head d1747f0394aa4442e5b32b9102b830e1a86f0e5a)
|
| | | | |\ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
95eb8083f5884531e5ca0667388f8a6fb6d05c41)
to branch 'net.randombit.botan.c++0x' (head 56e105e678540c8bcafa4d0198c19a9489fbf8d1)
|
| | | | |\ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
5438defd358f82e876917a8bd6d735305ecb0a8e)
to branch 'net.randombit.botan.c++0x' (head cbdb2fd418557add29a536f7bdb6e78db16f725c)
|
| | | | | |\ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
d6d32791adfa878b6fc0dd3a5b65a665b7bbb549)
to branch 'net.randombit.botan.c++0x' (head 54deb0e078aab8cd91c8fd8819d1e6668fc762da)
|
| | | | | |\ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
6a746ccf1e957dba703e65372050a7bd4d6b117d)
to branch 'net.randombit.botan.c++0x' (head f54bb7b391eb3b71f380a68ddd460debdc31545d)
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
be much cleaner, though I am looking forward to the new for syntax which
will simplify a lot of these uses further.
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
in the source).
|
| | | | | | | | | | |
|
| | | | | | | | | | |
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
change some of the hash functions to use it as low hanging fruit.
Probably could use further optimization (just unrolls x4 currently), but
merely having it as syntax is good as it allows optimizing many functions
at once (eg using SSE2 to do 4-way byteswaps).
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Document SHA optimizations, AltiVec runtime checking, fixes for cpuid
for both icc and msvc.
|
| | | | | | | | | | |
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
returns true if they might plausibly work. AltiVec and SSE2 versions call
into CPUID, scalar version always works.
|
| | | | | | | | | | |
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Relies on mfspr emulation/trapping by the kernel, which works on (at least)
Linux and NetBSD.
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
for unaligned writes is messy as hell.
If writes are batched this is somewhat easier to deal with (somewhat).
|
| | | | | | | | | | |
|
|\ \ \ \ \ \ \ \ \ \
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
8fb69dd1c599ada1008c4cab2a6d502cbcc468e0)
to branch 'net.randombit.botan.general-simd' (head c05c9a6d398659891fb8cca170ed514ea7e6476d)
|
| | | | | | | | | | | |
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
on a PowerPC 970 running Gentoo with GCC 4.3.4
Uses a GCC syntax for creating literal values instead of the Motorola
syntax [{1,2,3,4} instead of (1,2,3,4)].
In tests so far, this is much, much slower than either the standard scalar code,
or using the SIMD-in-scalar-registers code. It looks like for whatever reason
GCC is refusing to inline the function:
SIMD_Altivec(__vector unsigned int input) { reg = input; }
and calls it with a branch hundreds of times in each function. I don't know
if this is the entire reason it's slower, but it definitely can't be helping.
The code handles unaligned loads OK but assumes stores are to an aligned address.
This will fail drastically some day, and needs to be fixed to either use scalar
stores, which (most?) PPCs will handle (if slowly), or batch the loads and
stores so we can work across the loads. Considering the code so far loads 4
vectors of data in one go this would probably be a big win (and also for loads,
since instead of doing 8 loads for 4 registers only 5 are needed).
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
of load_le + bswap
|
| | | | | | | | | | | |
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
operations.
Also add a pure scalar code version.
Convert Serpent to use this new interface, and add an implementation of
XTEA in SIMD.
The wrappers plus the scalar version allow SIMD-ish code to work on all
platforms. This is often a win due to better ILP being visible to the
processor (as with the recent XTEA optimizations). Only real danger is
register starvation, mostly an issue on x86 these days. So it may (or may
not) be a win to consolidate the standard C++ versions and the SIMD versions
together.
Future work:
- Add AltiVec/VMX version
- Maybe also for ARM's NEON extension? Less pressing, I would think.
- Convert SHA-1 code to use SIMD_32
- Add XTEA SIMD decryption (currently only encrypt)
- Change SSE2 engine to SIMD_engine
- Modify configure.py to set BOTAN_TARGET_CPU_HAS_[SSE2|ALTIVEC|NEON|XXX] macros
|
|/ / / / / / / / / /
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Pretty much useless and unused, except for listing the module names in
build.h and the short versions totally suffice for that.
|
| | | | | | | | | | |
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
since it passes signed ints for whatever reason.
Ensure CALL_CPUID is always defined (previously, it would not be if on an x86
but compiled with something other than GCC, ICC, VC++).
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Add new load options that are passed a number of variables by reference,
setting them all at once. Will allow for batching operations (eg using
SIMD operations to do 128-bit wide bswaps) for future optimizations.
|
|/ / / / / / / / / |
|
| | | | | | | | | |
|
|/ / / / / / / / |
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
the prefetch is called for each block of input, and so a total of
(4096+256)/64 = 68 prefetches are executed for each block. This reduces
performance of iterative modes dramatically.
I'm not sure what the right approach for dealing with this is.
|
| | | | | | | | |
|
| | | | | | | | |
|