Graph | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Move bigint_simple_mul into mp_mul.cpp, since that is the only place it was used. Make a variant of bigint_simple_mul, bigint_simple_sqr, for mp_sqr.cpp | lloyd | 2006-08-19 | 4 | -17/+26 |
* | Fix typo | lloyd | 2006-08-19 | 1 | -1/+1 |
* | Delete trailing whitespace | lloyd | 2006-08-19 | 4 | -5/+5 |
* | Move Montgomery reduction algorithm into mp_asm.cpp. Move the innermost loop of Montgomery reduction into bigint_mul_add_words, in mp_muladd.cpp. Use bigint_mul_add_words for the inner loop of bigint_simple_multiply. Move the compare/subtract at the end of the Montgomery algorithm into bigint_monty_redc | lloyd | 2006-08-19 | 8 | -110/+69 |
* | Don't test Skipjack at startup; it's really not that important, and running the test means the algorithm prototype is loaded into memory when it will probably never be used later | lloyd | 2006-08-19 | 1 | -8/+0 |
* | Remove trailing whitespace | lloyd | 2006-08-19 | 2 | -2/+2 |
* | Align the major jump targets. Remove the comment containing the unoptimized C code. Add copyright notice | lloyd | 2006-08-19 | 1 | -15/+6 |
* | Add an x86 assembly implementation of bigint_mul_add_words, which is the core loop of bigint_monty_redc | lloyd | 2006-08-18 | 4 | -3/+134 |
* | Simplify the implementation of bigint_divop | lloyd | 2006-08-18 | 1 | -6/+8 |
* | Move montgomery_reduce to after choose_window_bits for better consistency between the Montgomery and fixed-window exponentiators | lloyd | 2006-08-17 | 1 | -18/+18 |
* | Create a slightly higher-level wrapper around bigint_monty_redc, which saves a few lines | lloyd | 2006-08-17 | 1 | -18/+13 |
* | Remove whitespace | lloyd | 2006-08-17 | 1 | -3/+0 |
* | Fix the es_capi module, which was not using the new global_config() accessor | lloyd | 2006-08-17 | 1 | -1/+1 |
* | Inline the call to word_add in bigint_monty_redc; the carry-in was always zero, so this is both a bit more efficient and more readable. The inlined code can no longer take advantage of asm implementations of word_add, but with only a single call per loop iteration the benefit from that would be small anyway | lloyd | 2006-08-17 | 1 | -3/+3 |
* | Move bigint_monty_redc to its own file; profiling indicates that this single function accounts for over 30% of the runtime during RSA operations, making it a strong candidate for implementation in assembly | lloyd | 2006-08-17 | 2 | -33/+49 |
* | Split Montgomery reduction into two functions: the core algorithm, linked as C (to be replaced by asm later), and another that performs a subtraction if needed (inside powm_mnt.cpp). That way an asm version of the Montgomery algorithm won't have to deal with calling other functions | lloyd | 2006-08-16 | 3 | -6/+15 |
* | Add a distinct loop ending for loop-until-equals-immediate; other loop-ending conditions will be needed later | lloyd | 2006-08-15 | 5 | -7/+13 |
* | Remove some variables we didn't really need in the key schedule | lloyd | 2006-08-15 | 1 | -6/+4 |
* | Version bump in the configure script | lloyd | 2006-08-15 | 2 | -2/+2 |
* | Change the Serpent linear transforms to use the move-and-shift-3 macro | lloyd | 2006-08-15 | 1 | -4/+2 |
* | Add a specialized shift-by-3 instruction that uses LEA to do a shift and move in one instruction | lloyd | 2006-08-15 | 1 | -0/+1 |
* | Drop the asm-specific serpent.h | lloyd | 2006-08-15 | 2 | -34/+0 |
* | Formatting/readability changes | lloyd | 2006-08-15 | 1 | -6/+5 |
* | Replace Serpent's key_xor function with a macro, so the header can be shared between the C++ and assembly versions | lloyd | 2006-08-15 | 2 | -7/+5 |
* | Remove continuation backslashes from the last line of some of the macros | lloyd | 2006-08-15 | 1 | -8/+8 |
* | Reorder the linear transformations for (nominally) better instruction scheduling | lloyd | 2006-08-15 | 1 | -10/+10 |
* | Have the expansion loop in the key schedule take advantage of free registers to load words we will need in advance | lloyd | 2006-08-15 | 2 | -12/+17 |
* | Remove unused variable. Collect the external functions into a single extern "C" block | lloyd | 2006-08-15 | 1 | -5/+7 |
* | Implement the Serpent key schedule in assembly as well, so the C++ versions of the S-boxes can be removed. Add some parentheses inside the asm macros | lloyd | 2006-08-15 | 3 | -122/+98 |
* | Remove an unused function | lloyd | 2006-08-15 | 1 | -26/+1 |
* | Implement decryption in the Serpent assembly code | lloyd | 2006-08-15 | 4 | -207/+386 |
* | Add the beginnings of an x86 assembler version of Serpent. Currently only encryption is done in asm; the rest is still in C++ | lloyd | 2006-08-15 | 4 | -0/+621 |
* | Was using sha1_core in the END_FUNCTION calls. This doesn't make a difference, since right now END_FUNCTION doesn't use its argument, but it looked strange and might cause problems later | lloyd | 2006-08-14 | 2 | -2/+2 |
* | Changelog updates (tag: 1.5.10) | lloyd | 2006-08-13 | 1 | -2/+8 |
* | Get instruction scheduling decently correct. Now running at 110 Mb/s on my Athlon, which isn't too far behind OpenSSL | lloyd | 2006-08-13 | 1 | -5/+5 |
* | Load the message words we need in the round before. By going out to the stack to get the address of the message array each time, we can free up a register for the rest of the code inside the rounds | lloyd | 2006-08-13 | 1 | -54/+133 |
* | Introduce a MSG() macro which returns the desired message word | lloyd | 2006-08-13 | 1 | -9/+13 |
* | Use LEA with the magic constant and A, rather than the magic constant and the boolean function result; same trick as in MD5. Roughly a 5% speedup | lloyd | 2006-08-13 | 1 | -9/+9 |
* | Make the temporary implicit, since we always use ESP inside the round functions | lloyd | 2006-08-13 | 1 | -47/+49 |
* | Add a (working, optimized) x86 version of MD4 | lloyd | 2006-08-13 | 3 | -2/+182 |
* | Add the memory word and the magic constant using LEA, rather than the boolean function result and the magic constant; the memory word is available sooner, and this seems to produce a major (12%) win | lloyd | 2006-08-13 | 1 | -24/+24 |
* | Forgot the II() macro in the last checkin | lloyd | 2006-08-13 | 1 | -1/+2 |
* | Use the spare register to load the message word, which will potentially help hide some of the memory latency | lloyd | 2006-08-13 | 1 | -3/+7 |
* | Make the temporary implicit, since we were always passing the same register | lloyd | 2006-08-13 | 1 | -106/+108 |
* | Cleanups, and move the initial memory access to the beginning of each MD5 round in an attempt to hide the latency a bit | lloyd | 2006-08-13 | 2 | -52/+77 |
* | Respect the --seconds command line argument with --bench-algo | lloyd | 2006-08-13 | 2 | -4/+4 |
* | Add an x86 assembly MD5 implementation; works, but needs optimization | lloyd | 2006-08-13 | 3 | -0/+176 |
* | Add a macro for the not instruction | lloyd | 2006-08-13 | 1 | -0/+1 |
* | Minor formatting changes, reorder one instruction | lloyd | 2006-08-13 | 1 | -3/+1 |
* | Add checks for MD4, MD5, and SHA-1 for zero-length inputs | lloyd | 2006-08-13 | 1 | -0/+3 |