| Commit message (Collapse) | Author | Age | Files | Lines |
|
the parameters of the key length. Instead define a new function which
returns a simple object which contains this information.
This definitely breaks backwards compatibility, though only with code
that directly manipulates low level objects like BlockCipher*s,
which is probably relatively rare.
Also remove some deprecated accessor functions from lookup.h. It turns
out block_size_of and output_size_of are being used in the TLS code; I
need to remove them from there before I can delete these entirely.
Really that didn't make much sense, because they assumed all
implementations of a particular algorithm will have the same
specifications, which is definitely not necessarily true, especially
WRT key length. It is much safer (and probably simpler) to first
retrieve an instance of the actual object you are going to use and
then ask it directly.
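The "ask the instance directly" approach described above could look roughly like the following. This is only a sketch: KeyLengthSpec, Cipher, ExampleCipher and key_spec() are hypothetical names chosen for illustration, not the names used in the commit.

```cpp
#include <cstddef>

// Hypothetical value object describing which key lengths a cipher accepts.
class KeyLengthSpec {
public:
    KeyLengthSpec(std::size_t min_len, std::size_t max_len, std::size_t multiple = 1)
        : min_(min_len), max_(max_len), mod_(multiple) {}

    std::size_t minimum() const { return min_; }
    std::size_t maximum() const { return max_; }
    std::size_t multiple() const { return mod_; }

    // True if a key of `len` bytes is acceptable.
    bool valid(std::size_t len) const {
        return len >= min_ && len <= max_ && len % mod_ == 0;
    }

private:
    std::size_t min_, max_, mod_;
};

// Hypothetical cipher interface: callers query the instance they will use.
class Cipher {
public:
    virtual ~Cipher() {}
    virtual KeyLengthSpec key_spec() const = 0;
};

// Illustrative implementation that accepts 16-, 24- or 32-byte keys.
class ExampleCipher : public Cipher {
public:
    KeyLengthSpec key_spec() const override { return KeyLengthSpec(16, 32, 8); }
};
```

Because the specification comes from the concrete object, two implementations of the same algorithm are free to report different limits.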
|
parameters are as well. So make them template parameters.
The sole exception was AES, because you could either initialize AES
with a fixed key length, in which case it would only be that specific
key length, or not, in which case it would support any valid AES key
size. This is removed in this checkin; you have to specifically ask for
AES-128, AES-192, or AES-256, depending on which one you want.
This is probably actually a good thing, because every implementation
other than the base one (SSSE3, AES-NI, OpenSSL) did not support
"AES", only the versions with specific fixed key sizes. So forcing
the user to ask for the one they want ensures they get the ones
that are faster and/or safer.
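A minimal sketch of fixed parameters expressed as template arguments, with each AES key size as its own type. The names FixedParamsCipher, Aes128, Aes192 and Aes256 are assumptions made for this example and are not taken from the commit.

```cpp
#include <cstddef>

// Hypothetical base class: block size and key length limits are
// compile-time template parameters rather than per-object values.
template <std::size_t BS, std::size_t KMIN, std::size_t KMAX = KMIN, std::size_t KMOD = 1>
class FixedParamsCipher {
public:
    std::size_t block_size() const { return BS; }

    // True if a key of `n` bytes fits the compile-time limits.
    bool valid_keylength(std::size_t n) const {
        return n >= KMIN && n <= KMAX && n % KMOD == 0;
    }
};

// Each key size is a distinct type; there is no generic "AES" type.
class Aes128 : public FixedParamsCipher<16, 16> {};
class Aes192 : public FixedParamsCipher<16, 24> {};
class Aes256 : public FixedParamsCipher<16, 32> {};
```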
|
sets the block size statically and also creates an enum with the
size. Use the enum instead of calling block_size() where possible,
since that uses two virtual function calls per block which is quite
unfortunate. The real advantages here as compared to the previous
version which kept the block size as a per-object u32bit:
- The compiler can inline the constant as an immediate operand
(previously it would load the value via an indirection on this)
- Removes 32 bits per object overhead (except in cases with actually
variable block sizes, which are very few and rarely used)
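The difference between the enum and the virtual accessor can be sketched as follows. FixedBlockCipher and ToyCipher are hypothetical names, and the XOR "cipher" is a placeholder used only to give the loop something to do.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical cipher base whose block size is fixed at compile time.
template <std::size_t BS>
class FixedBlockCipher {
public:
    enum { BLOCK_SIZE = BS };
    virtual ~FixedBlockCipher() {}

    // Virtual accessor kept for callers that only hold a base pointer.
    virtual std::size_t block_size() const { return BS; }
};

class ToyCipher : public FixedBlockCipher<8> {
public:
    // XOR every byte with a constant; a placeholder, not a real cipher.
    void process(std::uint8_t* buf, std::size_t blocks) const {
        const std::size_t bs = BLOCK_SIZE;  // compile-time constant, no virtual call
        for (std::size_t i = 0; i != blocks; ++i) {
            for (std::size_t j = 0; j != bs; ++j) {
                buf[i * bs + j] ^= 0x5A;
            }
        }
    }
};
```

Inside process() the compiler sees a literal 8, so it can fold the constant into the loop instead of loading a member or dispatching through the vtable.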
|
compute the inverses mod 65537 exposed a timing vulnerability. Avoid
this by instead using exponentiation, which takes constant time (up to
variability in the multiplication operation, at least).
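Since 65537 is prime, Fermat's little theorem gives the inverse of x as x^65535 mod 65537, and the exponent 65535 has all sixteen bits set, so the loop below performs the same sequence of operations for every input. This is a sketch only: the function names are hypothetical, inputs are assumed to lie in 1..65536, and the % operator is used for clarity even though its timing may vary by platform.

```cpp
#include <cstdint>

// Multiply modulo the prime 65537 using a 64-bit intermediate.
inline std::uint32_t mul_mod_65537(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) % 65537u);
}

// Inverse of x modulo 65537 for 1 <= x <= 65536, computed as x^65535.
// Each iteration maps exponent e to 2e + 1, so 15 iterations starting
// from e = 1 reach 2^16 - 1 = 65535.
inline std::uint32_t inverse_mod_65537(std::uint32_t x) {
    std::uint32_t y = x;
    for (int i = 0; i < 15; ++i) {
        y = mul_mod_65537(y, y);  // square
        y = mul_mod_65537(y, x);  // multiply by the base
    }
    return y;
}
```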
|
the initial/default length of the array, update all users to instead
pass the value to the constructor.
This is an old vestigial thing from a class (SecureBuffer) that used
this compile-time constant in order to store the values in an
array. However this was changed way back in 2002 to use the same
allocator hooks as the rest of the containers, so the only advantage
to using the length field was that the initial length was set and
didn't have to be set in the constructor, which was mildly convenient.
However this directly conflicts with the desire to be able to
(eventually) use std::vector with a custom allocator, since of course
vector doesn't support this.
Fortunately almost all of the uses are in classes which have only a
single constructor, so there is little to no duplication by instead
initializing the size in the constructor.
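A sketch of what sizing the buffers in the constructor looks like once the length is no longer a template argument. ToyHash and its member names are hypothetical, and std::vector stands in for whatever container the library actually uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash-like class: buffer lengths are passed to the
// container constructors instead of being template length parameters.
class ToyHash {
public:
    ToyHash() : state_(5), buffer_(64) {}

    std::size_t state_words() const { return state_.size(); }
    std::size_t buffer_bytes() const { return buffer_.size(); }

private:
    std::vector<std::uint32_t> state_;   // five 32-bit words, zero-initialized
    std::vector<std::uint8_t> buffer_;   // 64-byte input block, zero-initialized
};
```

With a single constructor, the sizes appear in exactly one place, which matches the observation above that little duplication results.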
|
representation (rather than in an iterator context), instead use &buf[0],
which works for both MemoryRegion and std::vector
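The portability of &buf[0] can be illustrated with a small template that accepts any container offering operator[] and size(). The helper names are hypothetical, and the empty check is added here because taking &buf[0] of an empty std::vector is undefined behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum raw bytes through a pointer, as a C-style API would consume them.
inline std::uint32_t sum_bytes(const std::uint8_t* p, std::size_t n) {
    std::uint32_t total = 0;
    for (std::size_t i = 0; i != n; ++i) total += p[i];
    return total;
}

// Works for any contiguous container with operator[] and size().
template <typename Buffer>
std::uint32_t sum_buffer(const Buffer& buf) {
    if (buf.size() == 0) return 0;          // never index an empty buffer
    return sum_bytes(&buf[0], buf.size());  // pointer to the first element
}
```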
|
harmonising MemoryRegion with std::vector:
The MemoryRegion::clear() function would zeroise the buffer, but keep
the memory allocated and the size unchanged. This is very different
from STL's clear(), which is basically the equivalent to what is
called destroy() in MemoryRegion. So to be able to replace MemoryRegion
with a std::vector, we have to rename destroy() to clear() and we have
to expose the current functionality of clear() in some other way, since
vector doesn't support this operation. Do so by adding a global function
named zeroise(), which zeroes the MemoryRegion passed to it. Remove the
old clear() to ensure all callers are updated.
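The distinction between the two operations, shown against std::vector. This zeroise() is a sketch written for this example rather than the library's implementation, and a plain std::fill like this one may be removed by an optimizer when the buffer is not read afterwards, so real code handling secrets needs a stronger guarantee.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Overwrite every element with zero; size and allocation stay unchanged.
template <typename T>
void zeroise(std::vector<T>& buf) {
    std::fill(buf.begin(), buf.end(), T());
}
```

After zeroise(v) the vector still has its original size with every element zero, whereas v.clear() leaves the vector empty.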
|
rotations in the code. This reduces the number of cache lines
potentially accessed in the first round from 64 to 16 (assuming 64
byte cache lines). On average, about 10 cache lines will actually be
accessed, assuming a uniform distribution of the inputs, so there
definitely is still a timing channel here, just a somewhat smaller
one.
I experimented with using the 256 element table for all rounds but it
reduced performance significantly and I'm not sure if the benefit is
worth the cost or not.
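The cache line figures can be checked with a little arithmetic. A table of 256 four-byte entries occupies 1024 bytes, or 16 lines of 64 bytes, and four such tables occupy 64 lines. The sketch below assumes 16 independent, uniformly distributed lookups in the first round (one per state byte), which is an assumption of this example rather than something the commit states. It also shows the rotation used to derive the other tables' entries from a single one.

```cpp
#include <cmath>
#include <cstdint>

// Rotate a 32-bit word right by `rot` bits (0 < rot < 32). With a single
// table, the entries of the other tables are obtained by rotation.
inline std::uint32_t rotr32(std::uint32_t x, unsigned rot) {
    return (x >> rot) | (x << (32 - rot));
}

// Expected number of distinct cache lines touched by `lookups` independent,
// uniformly distributed accesses into a table spanning `lines` cache lines.
inline double expected_lines_touched(unsigned lines, unsigned lookups) {
    const double miss = 1.0 - 1.0 / lines;  // chance one lookup misses a given line
    return lines * (1.0 - std::pow(miss, static_cast<double>(lookups)));
}
```

Under those assumptions, expected_lines_touched(16, 16) evaluates to roughly 10.3, in line with the "about 10" quoted above.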
|
supports epi64x in 64-bit mode.
|
reasons, Intel C++ rejects
const __m128i foo = _mm_set_epi64x(...)
though it will accept if you use one of the _mm_set1 variants.
And Visual C++ doesn't know about _mm_set_epi64x() in 32-bit mode for
similarly dumb reasons - it works fine compiling for 64 bit but for
whatever reason they don't offer this function when compiling as 32
bit. Unfortunately there isn't a good way to specify it's OK with a
particular compiler with one arch but not another, so just disable it
globally for the time being. The workaround for VC++ is probably to
use _mm_set_epi32 and break up the input values into 32 bit chunks.
ICC is a lost cause I fear.
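The _mm_set_epi32 workaround suggested above might look like this. The helper name set_epi64x_compat is hypothetical, the code needs an x86 target with SSE2, and it has not been checked against the specific compiler versions discussed in the commit.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Build a 128-bit value from two 64-bit halves using only _mm_set_epi32,
// whose arguments run from the highest 32-bit lane to the lowest.
inline __m128i set_epi64x_compat(std::uint64_t hi, std::uint64_t lo) {
    return _mm_set_epi32(static_cast<int>(static_cast<std::uint32_t>(hi >> 32)),
                         static_cast<int>(static_cast<std::uint32_t>(hi)),
                         static_cast<int>(static_cast<std::uint32_t>(lo >> 32)),
                         static_cast<int>(static_cast<std::uint32_t>(lo)));
}
```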
|
constant time and on a Nehalem is significantly faster than the table
based version. This implementation technique was invented by Mike
Hamburg and described in a paper in CHES 2009 "Accelerating AES with
Vector Permute Instructions". This code is basically a translation of
his public domain x86-64 assembly code into intrinsics.
Todo: Add support for AES-192 and AES-256; this just requires
implementing the key schedules.
Currently only tested on an i7 with GCC (32 and 64 bit code);
testing/optimization on 32-bit processors with SSSE3 like the Atom,
and with Visual C++ and other compilers, are also todos.
|
fine with latest SVN.
|
process
|
for getting access to the key schedule, instead of giving the key
schedule protected status, which is much harder to audit.
|
protected accessor functions for get and set. Set is needed by the x86
version since it implements the key schedule directly.
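A sketch of the accessor pattern: the key schedule stays private in the base class, and subclasses reach it only through narrow protected get and set functions. BaseCipher, AcceleratedCipher and the member names are hypothetical, and the "key schedule" below is a placeholder that just copies key bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical base cipher: round keys are private, so every access from
// a subclass goes through one of the two protected accessors below.
class BaseCipher {
public:
    virtual ~BaseCipher() {}
    std::size_t round_key_count() const { return round_keys_.size(); }

protected:
    const std::vector<std::uint32_t>& get_round_keys() const { return round_keys_; }
    void set_round_keys(const std::vector<std::uint32_t>& ks) { round_keys_ = ks; }

private:
    std::vector<std::uint32_t> round_keys_;
};

// Hypothetical platform-specific subclass that computes its own schedule
// and therefore needs the setter as well as the getter.
class AcceleratedCipher : public BaseCipher {
public:
    void key_schedule(const std::uint8_t* key, std::size_t len) {
        std::vector<std::uint32_t> ks(len);
        for (std::size_t i = 0; i != len; ++i) ks[i] = key[i];  // placeholder only
        set_round_keys(ks);
    }

    std::uint32_t first_round_key() const { return get_round_keys().at(0); }
};
```

Auditing then amounts to searching for callers of the two accessors instead of every use of a protected data member.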
|