path: root/modules
Commit message | Author | Age | Files | Lines
* The assembly code is only using 81 words of W, but 84 were being allocated. | lloyd | 2006-08-21 | 1 | -2/+2
* Remove a check for GCC in the source; that's what the module compiler restrictions are for. | lloyd | 2006-08-21 | 1 | -4/+0
* Rename some variables for consistency with the SHA-1 asm code | lloyd | 2006-08-21 | 2 | -14/+16
* Get rid of an unnecessary register copy | lloyd | 2006-08-21 | 1 | -11/+9
* Inside the compression function, store the original stack pointer in the W array, and then use %esp to point to the message words. This gives an extra register for temporary usage. | lloyd | 2006-08-21 | 2 | -28/+38
* Let SHA_160::W be resized dynamically; potentially the asm version could use a little extra workspace, this makes that simpler to do. | lloyd | 2006-08-21 | 1 | -0/+8
* Somewhat ineffectual instruction reorderings in the round functions. Use EDX instead of EBP for holding the pointer to the digest array at the end of the function. | lloyd | 2006-08-21 | 1 | -28/+28
* Rotate the temporary variable along with the chaining variables; gives some further room for optimization. | lloyd | 2006-08-21 | 1 | -175/+154
* Declare mp_bits for alg_ia32, since it touches the MPI code | lloyd | 2006-08-20 | 1 | -0/+2
* Fix typo | lloyd | 2006-08-19 | 1 | -1/+1
* Move Montgomery reduction algorithm into mp_asm.cpp. Move the inner-most loop of Montgomery into bigint_mul_add_words, in mp_muladd.cpp. Use bigint_mul_add_words for the inner loop of bigint_simple_multiply. Move the compare/subtract at the end of the Montgomery algorithm into bigint_monty_redc. | lloyd | 2006-08-19 | 2 | -45/+1
* Align the major jump targets. Remove the comment containing the unoptimized C code. Add copyright notice. | lloyd | 2006-08-19 | 1 | -15/+6
* Add an x86 assembly implementation of bigint_mul_add_words, which is the core loop of bigint_monty_redc. | lloyd | 2006-08-18 | 4 | -3/+134
* Fix the es_capi module; was not using the new global_config() accessor | lloyd | 2006-08-17 | 1 | -1/+1
* Add a distinct loop ending for loop-until-equals-immediate; other loop-ending conditions will be needed later. | lloyd | 2006-08-15 | 5 | -7/+13
* Change the Serpent linear transforms to use the move-and-shift-3 macro | lloyd | 2006-08-15 | 1 | -4/+2
* Add a specialized shift instruction for 3 that uses LEA to do a shift and move in one instruction. | lloyd | 2006-08-15 | 1 | -0/+1
* Drop the asm-specific serpent.h | lloyd | 2006-08-15 | 2 | -34/+0
* Formatting/readability changes | lloyd | 2006-08-15 | 1 | -6/+5
* Remove continuation slashes from the last line of some of the macros | lloyd | 2006-08-15 | 1 | -8/+8
* Reorder the linear transformations for (nominally) better instruction scheduling. | lloyd | 2006-08-15 | 1 | -10/+10
* Have the expansion loop in the key schedule take advantage of free registers to load words we will need in advance. | lloyd | 2006-08-15 | 2 | -12/+17
* Remove unused variable. Collect the external functions into a single extern "C" block. | lloyd | 2006-08-15 | 1 | -5/+7
* Implement the Serpent key schedule in assembly as well, so the C++ versions of the Sboxes can be removed. Add some parens inside the asm macros. | lloyd | 2006-08-15 | 3 | -122/+98
* Remove an unused function | lloyd | 2006-08-15 | 1 | -26/+1
* Implement decryption in the Serpent assembly code | lloyd | 2006-08-15 | 4 | -207/+386
* Add the beginnings of an x86 assembler version of Serpent. Currently only encryption is done in asm, the rest is still in C++. | lloyd | 2006-08-15 | 4 | -0/+621
* Was using sha1_core in the END_FUNCTION calls; doesn't make a difference, since right now END_FUNCTION doesn't use its argument, but it looked strange and might cause problems later. | lloyd | 2006-08-14 | 2 | -2/+2
* Get instruction scheduling decently correct. Now running at 110 Mb/s on my Athlon, which isn't too far behind OpenSSL. | lloyd | 2006-08-13 | 1 | -5/+5
* Load the message words we need in the round before. By going out to the stack to get the address of the message array each time, we can free up a register for the rest of the code inside the rounds. | lloyd | 2006-08-13 | 1 | -54/+133
* Introduce a MSG() macro which returns the desired message word | lloyd | 2006-08-13 | 1 | -9/+13
* Use LEA with the magic constant and A, rather than the magic and the boolean; same trick as in MD5. Roughly a 5% speedup. | lloyd | 2006-08-13 | 1 | -9/+9
* Make the temporary implicit, since we always use ESP inside the round functions. | lloyd | 2006-08-13 | 1 | -47/+49
* Add a (working, optimized) x86 version of MD4 | lloyd | 2006-08-13 | 3 | -2/+182
* Add the memory word and the magic constant using LEA, rather than the boolean function result and the magic constant; the memory word is available sooner, and it seems to produce a major (12%) win. | lloyd | 2006-08-13 | 1 | -24/+24
* Forgot the II() macro in the last checkin | lloyd | 2006-08-13 | 1 | -1/+2
* Use the spare register to load the message word, which will potentially help hide some of the memory latency. | lloyd | 2006-08-13 | 1 | -3/+7
* Make the temporary implicit, since we were always passing the same register | lloyd | 2006-08-13 | 1 | -106/+108
* Cleanups, and move the initial memory access to the beginning of each MD5 round in an attempt to hide the latency a bit | lloyd | 2006-08-13 | 2 | -52/+77
* Add an x86 assembly MD5 implementation; works, but needs optimization | lloyd | 2006-08-13 | 3 | -0/+176
* Add a macro for the not instruction | lloyd | 2006-08-13 | 1 | -0/+1
* Minor formatting changes, reorder one instruction | lloyd | 2006-08-13 | 1 | -3/+1
* Clear the W buffer inside the SHA_160::clear() functions | lloyd | 2006-08-13 | 1 | -0/+1
* Remove a block of disabled code that was just for debug purposes | lloyd | 2006-08-13 | 1 | -8/+0
* Clean up the macros, add comment headers, add a couple of helper macros for spilling/restoring registers. Reorder some instructions for slightly better scheduling across rounds. | lloyd | 2006-08-13 | 2 | -28/+63
* Drop the AES asm code for now | lloyd | 2006-08-13 | 3 | -192/+0
* Update sha1core.S to match the macro updates in the last checkin. Rename some variables for easier reading. | lloyd | 2006-08-13 | 1 | -63/+63
* A few macro fixes | lloyd | 2006-08-13 | 1 | -7/+10
* Add stub versions of AES assembler | lloyd | 2006-08-13 | 3 | -0/+193
* Rename sha_x86 module to alg_ia32; there will probably be other algorithms going in here (at least eventually, and potentially soon-ish) | lloyd | 2006-08-13 | 4 | -0/+0
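
Note on bigint_mul_add_words: the log describes this routine as the core loop of bigint_monty_redc and the inner loop of bigint_simple_multiply. For readers unfamiliar with the operation, here is a rough portable C++ sketch of a word-wise multiply-accumulate with carry. It is not the code from these commits. The function name mul_add_words_sketch, the 32-bit word type, and the signature are all assumptions made for illustration, and the real routine may take different arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical limb types for this sketch: a 32-bit word, and a 64-bit
// type wide enough to hold word * word + word + word without overflow.
using word  = std::uint32_t;
using dword = std::uint64_t;

// Computes z[0..n) += x[0..n) * y and returns the carry out of the top word.
// Illustrative only; the name and signature are not taken from the library.
word mul_add_words_sketch(word z[], const word x[], std::size_t n, word y)
{
    assert(n == 0 || (z != nullptr && x != nullptr));

    word carry = 0;
    for (std::size_t i = 0; i != n; ++i) {
        // Worst case is (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, which still fits.
        const dword t = static_cast<dword>(x[i]) * y + z[i] + carry;
        z[i]  = static_cast<word>(t);        // low half stays in place
        carry = static_cast<word>(t >> 32);  // high half feeds the next word
    }
    return carry;
}
```

On x86 the same step maps naturally onto a single MUL, which leaves the 64-bit product in EDX:EAX, followed by add-with-carry instructions. That is presumably why this loop was a candidate for an assembly version.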