path: root/src/block/serpent_sse2/serp_sse2.cpp
author     lloyd <[email protected]>  2009-10-28 19:50:06 +0000
committer  lloyd <[email protected]>  2009-10-28 19:50:06 +0000
commit     185d85338562627aa4800436a3fe6efa11886351
tree       a892d454d5d88e008624b60e3e88c037f312d770 /src/block/serpent_sse2/serp_sse2.cpp
parent     f5d4cf7509011669c25746e3b4c681b5ebfede79
Add an AltiVec SIMD_32 implementation. Tested and works for Serpent and XTEA
on a PowerPC 970 running Gentoo with GCC 4.3.4.

Uses the GCC syntax for creating literal values instead of the Motorola
syntax [{1,2,3,4} instead of (1,2,3,4)].

In tests so far, this is much, much slower than either the standard scalar
code or the SIMD-in-scalar-registers code. It looks like GCC, for whatever
reason, refuses to inline the function

    SIMD_Altivec(__vector unsigned int input) { reg = input; }

and instead calls it with a branch hundreds of times in each function. I
don't know whether this is the entire reason it's slower, but it definitely
can't be helping.

The code handles unaligned loads correctly but assumes that stores go to an
aligned address. This will fail badly some day and needs to be fixed,
either by using scalar stores, which (most?) PPCs will handle (if slowly),
or by batching the loads and stores so that we can work across adjacent
vectors. Since the code so far loads 4 vectors of data in one go, batching
would probably be a big win, for loads as well: instead of doing 8 loads
for 4 registers, only 5 are needed.
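The 8-versus-5 load count can be illustrated with a small portable C++ sketch. It is not AltiVec code and not the SIMD_Altivec class from this commit; it only emulates, with plain byte arrays, two AltiVec behaviours: lvx/stvx ignore the low 4 address bits, and the usual unaligned-load idiom (vec_perm of two vec_ld results using a vec_lvsl mask) needs two aligned loads per vector. Every function name and the load counter are hypothetical. Loading 4 consecutive vectors one at a time costs 8 aligned loads, while letting neighbours share their boundary block costs 5. The two store helpers contrast a scalar-store fallback with an stvx-style store that silently writes to the wrong offset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One 16-byte vector register modelled as bytes (stand-in for __vector unsigned int).
using Vec16 = std::array<std::uint8_t, 16>;

// Counts emulated aligned loads so the two strategies can be compared.
static int g_aligned_loads = 0;

// Emulates lvx: the low 4 address bits are ignored, so the load returns the
// aligned 16-byte block containing p. The caller must own that whole block.
static Vec16 load_aligned_block(const std::uint8_t* p) {
    std::size_t mis = reinterpret_cast<std::uintptr_t>(p) % 16;
    Vec16 v;
    std::memcpy(v.data(), p - mis, 16);
    ++g_aligned_loads;
    return v;
}

// Emulates stvx: the low 4 address bits are dropped, so an unaligned
// destination receives the data at the wrong offset.
static void store_aligned_block(std::uint8_t* p, const Vec16& v) {
    std::size_t mis = reinterpret_cast<std::uintptr_t>(p) % 16;
    std::memcpy(p - mis, v.data(), 16);
}

// Emulates the lvsl/vperm idiom: take 16 bytes starting at offset `shift`
// of the 32-byte concatenation lo_block:hi_block.
static Vec16 permute_shift(const Vec16& lo_block, const Vec16& hi_block, std::size_t shift) {
    assert(shift < 16);
    Vec16 r;
    for (std::size_t i = 0; i < 16; ++i) {
        std::size_t j = shift + i;
        r[i] = j < 16 ? lo_block[j] : hi_block[j - 16];
    }
    return r;
}

// One vector at a time: two aligned loads each, so 8 loads for 4 vectors.
static void load4_naive(const std::uint8_t* p, Vec16 out[4]) {
    for (int k = 0; k < 4; ++k) {
        const std::uint8_t* q = p + 16 * k;
        std::size_t shift = reinterpret_cast<std::uintptr_t>(q) % 16;
        Vec16 lo = load_aligned_block(q);
        Vec16 hi = load_aligned_block(q + 15);
        out[k] = permute_shift(lo, hi, shift);
    }
}

// Batched: adjacent vectors share their boundary block, so 5 loads suffice.
static void load4_batched(const std::uint8_t* p, Vec16 out[4]) {
    std::size_t shift = reinterpret_cast<std::uintptr_t>(p) % 16;
    Vec16 blocks[5];
    for (int k = 0; k < 4; ++k)
        blocks[k] = load_aligned_block(p + 16 * k);
    blocks[4] = load_aligned_block(p + 63);  // block holding the last input byte
    for (int k = 0; k < 4; ++k)
        out[k] = permute_shift(blocks[k], blocks[k + 1], shift);
}

// Scalar-store fallback: memcpy makes no alignment assumption about dst
// (on a real PPC the register would first be spilled to an aligned temporary).
static void store_unaligned(std::uint8_t* dst, const Vec16& v) {
    std::memcpy(dst, v.data(), 16);
}
```

On real hardware the batched version would presumably be five vec_ld calls at offsets 0, 16, 32, 48 and 63 from the source pointer, combined with a single vec_lvsl mask, since all four vectors share the same misalignment.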
Diffstat (limited to 'src/block/serpent_sse2/serp_sse2.cpp')
0 files changed, 0 insertions, 0 deletions