path: root/modules/alg_ia32
Commit message | Author | Age | Files | Lines
* Mostly revert 2f4fd18182d5a75c40cd831e7ee3c314be5c57d6, only keep the updated dates on files that have actually changed this year. This makes the diff across versions readable again. | lloyd | 2008-03-10 | 10 | -10/+10
* Mass update of the copyright date. Honestly I don't know why I bother, but might as well keep it up to date. And it's easier to do it once with a 'perl -pi' command than to update each file over time. Apologies to anyone looking at diffs. | lloyd | 2008-02-14 | 10 | -10/+10
* Change the syntax from load_on: to load_on (since that way configure.pl can use the generic variable reading routines). Instead of hardcoding the module sets (historically, 'unix', 'beos', and 'win32') into the script, have each module specify which group(s) (if any) it should be considered a member of in its modinfo.txt file. Add a new module set, 'compression', which contains (currently) the zlib and bzip2 modules. | lloyd | 2007-10-22 | 1 | -1/+1
* Fix the alg_ia32 module code WRT the recent changes to loadstor.h not being included by bit_ops.h | lloyd | 2007-10-19 | 4 | -4/+4
* Insert a note so the toolchain knows that we are not using an executable stack. At least SuSE and Gentoo are using a patch for this in their trees, probably others are as well. I still have not had a chance to check the portability aspects of this, especially on Solaris (the only ELF-based x86/amd64 operating system that I know of that does not use the GNU toolchain). | lloyd | 2007-03-12 | 1 | -0/+4
* Provide a more flexible mechanism for specifying which modules are loaded. Now three classes are defined: 'request', 'auto', and 'asm_ok'. The 'auto' class is loaded automatically if the platform support matches up with what we are building for (this is the former default). The 'request' class means the module is only loaded if specifically requested by name. The 'asm_ok' class marks all modules that use any assembler (including inline assembler). This normally functions like 'auto', unless --debug is passed to configure, in which case it is treated as 'request'. Modules which do not specify a load behavior are given a default of 'request'. | lloyd | 2007-03-12 | 1 | -0/+2
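The load rules in the commit above can be read as a small decision function. The following is only a sketch of that reading, not configure.pl's actual code: the function name `should_load` and its parameters are hypothetical, and the commit does not say whether an explicit request overrides a platform mismatch, so that case is not modelled.

```python
# Hypothetical model of the three load classes; names are illustrative only.
DEFAULT_LOAD_CLASS = "request"  # modules that specify nothing default to 'request'


def should_load(load_class, platform_ok, requested_by_name, debug=False):
    """Return True if a module would be loaded under the rules described above."""
    load_class = load_class or DEFAULT_LOAD_CLASS
    if load_class not in ("request", "auto", "asm_ok"):
        raise ValueError("unknown load class: %r" % (load_class,))
    # With --debug, modules that contain assembler are treated as 'request'.
    if load_class == "asm_ok" and debug:
        load_class = "request"
    if load_class == "request":
        return requested_by_name
    # 'auto', and 'asm_ok' without --debug: load when the platform matches.
    return platform_ok
```

Under this reading, `should_load("asm_ok", True, False, debug=True)` is false, while the same module without `--debug` is loaded automatically on a matching platform.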
* Add Solaris to the allowed platforms for alg_amd64 and alg_ia32. Untested, but should work as Solaris is ELF-based. | lloyd | 2007-03-04 | 1 | -0/+1
* Bump copyright year to 2007 | lloyd | 2007-01-20 | 10 | -10/+10
* Rename the mp_muladd source files to mp_mulop | lloyd | 2006-12-15 | 2 | -2/+2
* Also mark ICC as usable with the alg_ia32 module | lloyd | 2006-11-24 | 1 | -0/+1
* Remove solaris from the list of OK platforms for assembly; Solaris as doesn't seem to like the files for some reason that I don't feel like getting into right now. | lloyd | 2006-11-06 | 1 | -1/+0
* Improve readability a bit with some additional macros | lloyd | 2006-09-26 | 1 | -10/+12
* Define the ADD_IMM macro in terms of ADD(). Remove the CLEAR_CARRY macro, which wasn't being used. | lloyd | 2006-09-26 | 1 | -3/+1
* Place the add_file/replace_file/ignore_file markers in the module info files into blocks; makes a bit more sense, since there are potentially many arguments to each, and the current system was making it difficult to write a generic reader for the files. | lloyd | 2006-09-03 | 1 | -11/+17
* Remove explicit alignment settings before the loops; the loop macro already sets alignment. Change the core multiply/add macro a bit; probably not any faster, but a bit cleaner. | lloyd | 2006-09-02 | 1 | -5/+2
* The assembly code is only using 81 words of W, but 84 were being allocated. | lloyd | 2006-08-21 | 1 | -2/+2
* Rename some variables for consistency with the SHA-1 asm code | lloyd | 2006-08-21 | 2 | -14/+16
* Get rid of an unnecessary register copy | lloyd | 2006-08-21 | 1 | -11/+9
* Inside the compression function, store the original stack pointer in the W array, and then use %esp to point to the message words. This gives an extra register for temporary usage. | lloyd | 2006-08-21 | 2 | -28/+38
* Let SHA_160::W be resized dynamically; potentially the asm version could use a little extra workspace, this makes that simpler to do. | lloyd | 2006-08-21 | 1 | -0/+8
* Somewhat ineffectual instruction reorderings in the round functions. Use EDX instead of EBP for holding the pointer to the digest array at the end of the function. | lloyd | 2006-08-21 | 1 | -28/+28
* Rotate the temporary variable along with the chaining variables; gives some further room for optimization. | lloyd | 2006-08-21 | 1 | -175/+154
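To illustrate what rotating the temporary along with the chaining variables can mean, here is a sketch that assumes the commit refers to the SHA-1 rounds (the neighbouring commits touch SHA_160). It is a plain Python model, not the assembly: six "registers" hold A..E plus a temporary, and each round renames their roles instead of copying values. All function names are hypothetical.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def round_function(t, b, c, d):
    """Return (boolean function result, round constant) for round t."""
    if t < 20:
        return (b & c) | (~b & d), 0x5A827999
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1
    if t < 60:
        return (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    return b ^ c ^ d, 0xCA62C1D6


def sha1_compress(state, block):
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    regs = list(state) + [0]                 # five chaining values plus one temporary
    a, b, c, d, e, tmp = 0, 1, 2, 3, 4, 5    # which register currently plays which role
    for t in range(80):
        f, k = round_function(t, regs[b], regs[c], regs[d])
        regs[tmp] = (rotl32(regs[a], 5) + f + regs[e] + w[t] + k) & MASK32
        regs[b] = rotl32(regs[b], 30)
        # Rename roles instead of moving data: the temporary becomes A and
        # the register that held E becomes the next temporary.
        a, b, c, d, e, tmp = tmp, a, b, c, d, e
    final = (regs[a], regs[b], regs[c], regs[d], regs[e])
    return [(s + v) & MASK32 for s, v in zip(state, final)]


def sha1(data):
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
    padded += struct.pack(">Q", 8 * len(data))
    for i in range(0, len(padded), 64):
        state = sha1_compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state)


# Sanity check against the standard library implementation.
assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```

In unrolled assembly the role assignment would be fixed per round at assembly time; the index tuple here only models that renaming.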
* Declare mp_bits for alg_ia32, since it touches the MPI code | lloyd | 2006-08-20 | 1 | -0/+2
* Fix typo | lloyd | 2006-08-19 | 1 | -1/+1
* Move Montgomery reduction algorithm into mp_asm.cpp. Move the inner-most loop of Montgomery into bigint_mul_add_words, in mp_muladd.cpp. Use bigint_mul_add_words for the inner loop of bigint_simple_multiply. Move the compare/subtract at the end of the Montgomery algorithm into bigint_monty_redc. | lloyd | 2006-08-19 | 2 | -45/+1
* Align the major jump targets. Remove the comment containing the unoptimized C code. Add copyright notice. | lloyd | 2006-08-19 | 1 | -15/+6
* Add an x86 assembly implementation of bigint_mul_add_words, which is the core loop of bigint_monty_redc. | lloyd | 2006-08-18 | 4 | -3/+134
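For readers unfamiliar with this kind of routine, the sketch below shows the usual shape of a word-wise multiply-accumulate loop: each word of x is multiplied by a single word y, added into z, and the carry is propagated. This is an assumption about what bigint_mul_add_words computes, based on its name and its role in Montgomery reduction; the Python name `mul_add_words`, its argument order, and the 32-bit word size are illustrative.

```python
WORD_BITS = 32                      # assumed word size for an ia32 build
WORD_MASK = (1 << WORD_BITS) - 1


def mul_add_words(z, x, y):
    """In place: z[i] += x[i] * y for every word of x; returns the final carry.

    z must have at least len(x) words. Each intermediate value
    x[i] * y + z[i] + carry is at most 2**64 - 1, so it fits in a
    double word, which is what MUL plus ADD/ADC give on x86.
    """
    carry = 0
    for i in range(len(x)):
        t = x[i] * y + z[i] + carry
        z[i] = t & WORD_MASK        # low word stays in z
        carry = t >> WORD_BITS      # high word carries into the next position
    return carry
```

A caller would typically add the returned carry into the next word of z, which is how such a loop slots into an outer reduction or multiplication loop.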
* Add a distinct loop ending for loop-until-equals-immediate; other loop ending conditions will be needed later. | lloyd | 2006-08-15 | 5 | -7/+13
* Change the Serpent linear transforms to use the move-and-shift-3 macro | lloyd | 2006-08-15 | 1 | -4/+2
* Add a specialized shift instruction for 3 that uses LEA to do a shift and move in one instruction. | lloyd | 2006-08-15 | 1 | -0/+1
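The trick relies on x86 addressing modes: an LEA with a scale factor of 8 writes src * 8, which equals src << 3, into a different destination register without modifying the source. The sketch below models that in Python and shows where a left shift by 3 appears in the Serpent linear transformation (taken from the Serpent specification). The function names are hypothetical and the macro actually added by the commit is not reproduced here.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def shl3_via_scale(src):
    """Model of LEA with scale 8: result = src * 8 (mod 2**32), src unchanged."""
    return (src * 8) & MASK32


def serpent_linear_transform(x0, x1, x2, x3):
    x0 = rotl32(x0, 13)
    x2 = rotl32(x2, 3)
    x1 ^= x0 ^ x2
    x3 ^= x2 ^ shl3_via_scale(x0)    # the (x0 << 3) term: shift and copy at once
    x1 = rotl32(x1, 1)
    x3 = rotl32(x3, 7)
    x0 ^= x1 ^ x3
    x2 ^= x3 ^ ((x1 << 7) & MASK32)  # scale factors stop at 8, so << 7 needs a real shift
    x0 = rotl32(x0, 5)
    x2 = rotl32(x2, 22)
    return x0, x1, x2, x3
```

Because x0 is still needed after the shifted copy is taken, a plain in-place shift would require a separate move first; folding both into one instruction is the saving the commit describes.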
* Drop the asm-specific serpent.h | lloyd | 2006-08-15 | 2 | -34/+0
* Formatting/readability changes | lloyd | 2006-08-15 | 1 | -6/+5
* Remove continuation slashes from the last line of some of the macros | lloyd | 2006-08-15 | 1 | -8/+8
* Reorder the linear transformations for (nominally) better instruction scheduling. | lloyd | 2006-08-15 | 1 | -10/+10
* Have the expansion loop in the key schedule take advantage of free registers to load words we will need in advance. | lloyd | 2006-08-15 | 2 | -12/+17
* Remove unused variable. Collect the external functions into a single extern "C" block. | lloyd | 2006-08-15 | 1 | -5/+7
* Implement the Serpent key schedule in assembly as well, so the C++ versions of the Sboxes can be removed. Add some parens inside the asm macros. | lloyd | 2006-08-15 | 3 | -122/+98
* Remove an unused function | lloyd | 2006-08-15 | 1 | -26/+1
* Implement decryption in the Serpent assembly code | lloyd | 2006-08-15 | 4 | -207/+386
* Add the beginnings of an x86 assembler version of Serpent. Currently only encryption is done in asm, the rest is still in C++ | lloyd | 2006-08-15 | 4 | -0/+621
* Was using sha1_core in the END_FUNCTION calls; doesn't make a difference, since right now END_FUNCTION doesn't use its argument, but it looked strange and might cause problems later. | lloyd | 2006-08-14 | 2 | -2/+2
* Get instruction scheduling decently correct. Now running at 110 Mb/s on my Athlon, which isn't too far behind OpenSSL | lloyd | 2006-08-13 | 1 | -5/+5
* Load the message words we need in the round before. By going out to the stack to get the address of the message array each time, we can free up a register for the rest of the code inside the rounds. | lloyd | 2006-08-13 | 1 | -54/+133
* Introduce a MSG() macro which returns the desired message word | lloyd | 2006-08-13 | 1 | -9/+13
* Use LEA with the magic constant and A, rather than the magic and the boolean; same trick as in MD5. Roughly a 5% speedup. | lloyd | 2006-08-13 | 1 | -9/+9
* Make the temporary implicit, since we always use ESP inside the round functions. | lloyd | 2006-08-13 | 1 | -47/+49
* Add a (working, optimized) x86 version of MD4 | lloyd | 2006-08-13 | 3 | -2/+182
* Add the memory word and the magic constant using LEA, rather than the boolean function result and the magic constant; the memory word is available sooner, and it seems to produce a major (12%) win. | lloyd | 2006-08-13 | 1 | -24/+24
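The reordering works because 32-bit addition is associative and commutative, so the round result does not depend on which two operands are summed first; only the dependency chain changes. The sketch below assumes an MD5-style step (the neighbouring commits mention the II() macro) and compares the two groupings. Function and parameter names are hypothetical, and nothing here measures the reported speedup.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def md5_f(b, c, d):
    """First-round MD5 boolean function."""
    return (b & c) | (~b & d)


def step_bool_plus_const(a, b, c, d, msg, magic, s):
    # Older grouping: the constant is added to the boolean function result,
    # so this addition cannot start until the boolean function is done.
    tmp = (md5_f(b, c, d) + magic) & MASK32
    tmp = (tmp + msg) & MASK32
    return (b + rotl32((a + tmp) & MASK32, s)) & MASK32


def step_msg_plus_const(a, b, c, d, msg, magic, s):
    # Newer grouping: message word + constant (one LEA with a displacement),
    # which does not depend on the boolean function and can be issued early.
    tmp = (msg + magic) & MASK32
    tmp = (tmp + md5_f(b, c, d)) & MASK32
    return (b + rotl32((a + tmp) & MASK32, s)) & MASK32
```

Both functions return the same value for any inputs; the difference on real hardware would come from how early the additions can be scheduled.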
* Forgot the II() macro in the last checkin | lloyd | 2006-08-13 | 1 | -1/+2
* Use the spare register to load the message word, which will potentially help hide some of the memory latency. | lloyd | 2006-08-13 | 1 | -3/+7