| Commit message | Author | Age | Files | Lines |
|
The problem in #602 is not the use of mmap but the use of mmap with
MAP_SHARED. Using MAP_PRIVATE creates a CoW mapping, just as malloc
or posix_memalign would.
I'm not sure why it took me so long to figure this out ...
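A small POSIX sketch of the difference (the function name is made up for illustration, and it assumes `MAP_ANONYMOUS` is available, as on Linux and the BSDs): it maps one anonymous page, forks, lets the child write to the page, and reports whether the parent sees that write.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if a write made by a forked child to an
// anonymous mapping created with `flags` (MAP_PRIVATE or MAP_SHARED) is
// visible to the parent after the child exits.
bool child_write_visible_in_parent(int flags)
{
   const size_t len = static_cast<size_t>(::sysconf(_SC_PAGESIZE));
   void* p = ::mmap(nullptr, len, PROT_READ | PROT_WRITE,
                    flags | MAP_ANONYMOUS, -1, 0);
   assert(p != MAP_FAILED);
   volatile unsigned char* buf = static_cast<unsigned char*>(p);
   buf[0] = 0;

   const pid_t pid = ::fork();
   assert(pid >= 0);
   if(pid == 0)
   {
      // Child: write to the page, then exit without running atexit handlers
      buf[0] = 0xAB;
      ::_exit(0);
   }

   int status = 0;
   ::waitpid(pid, &status, 0);  // after this the child's write has happened
   const bool visible = (buf[0] == 0xAB);
   ::munmap(p, len);
   return visible;
}
```

With `MAP_PRIVATE` the child's write lands on its own copy-on-write page, so the parent still reads 0; with `MAP_SHARED` the parent sees the child's value, which is the stomping described in #602.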
|
Quite a bit faster than the old version, and with better properties
wrt alignment
|
It occasionally fails on AppVeyor, probably due to QueryPerformanceCounter
using something other than the hardware cycle counter because <reasons>.
|
Lots more of this needed in here
|
Here the caller is assumed to have provided a buffer of sufficient size.
|
Assumed to be 0/1
|
Also fix xlc macro
|
Recent XLC is based on clang and has these
|
XLC 16 changed which macros are used to identify it. Older versions of
XLC didn't work correctly anyway (#1581 #1509 etc), so just drop support
for recognizing those versions.
|
Was broken by removing inclusion of rotate header
|
This is sometimes useful when debugging
|
Add powerpc64le as an alias for the ppc64 build target.
|
Basically, test that it works. Accepts a sequence of alloc+free
operations and verifies that each pointer returned does not overlap with
any other outstanding allocation, that the memory returned is zeroed,
and that alignment is respected.
Intended for testing #1800 but no reason not to land this first.
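A rough sketch of such a checker. Here `std::calloc`/`std::free` stand in for the allocator under test, and `AllocOp` and `check_alloc_sequence` are illustrative names, not the actual test's API. Overlap is checked against the nearest live allocation on each side using an ordered map keyed by start address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iterator>
#include <map>
#include <vector>

// One step of a hypothetical test script: allocate `size` bytes into `slot`,
// or free whatever is currently held in `slot`.
struct AllocOp {
   bool is_alloc;
   size_t slot;
   size_t size;
};

// Returns true if every allocation was aligned, zeroed and non-overlapping.
bool check_alloc_sequence(const std::vector<AllocOp>& ops, size_t slots)
{
   std::vector<void*> held(slots, nullptr);
   std::map<uintptr_t, size_t> live;  // start address -> size
   bool ok = true;

   for(const AllocOp& op : ops)
   {
      assert(op.slot < slots);
      void*& p = held[op.slot];
      if(!op.is_alloc)
      {
         if(p) { live.erase(reinterpret_cast<uintptr_t>(p)); std::free(p); p = nullptr; }
         continue;
      }
      if(p != nullptr || op.size == 0)
         continue;  // keep the sketch simple: one allocation per slot, no 0-size
      p = std::calloc(1, op.size);
      if(p == nullptr)
         return false;
      const uintptr_t a = reinterpret_cast<uintptr_t>(p);

      // Alignment: at least what calloc is expected to give any fundamental type
      if(a % alignof(std::max_align_t) != 0)
         ok = false;

      // Zeroed memory
      const unsigned char* bytes = static_cast<const unsigned char*>(p);
      for(size_t i = 0; i != op.size; ++i)
         if(bytes[i] != 0)
            ok = false;

      // No overlap with the nearest live allocation above or below
      auto next = live.lower_bound(a);
      if(next != live.end() && a + op.size > next->first)
         ok = false;
      if(next != live.begin())
      {
         auto prev = std::prev(next);
         if(prev->first + prev->second > a)
            ok = false;
      }
      live[a] = op.size;
   }

   for(void* q : held)
      std::free(q);
   return ok;
}
```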
|
Make the tune interval a build-time configuration option instead of hardcoding
it in each source file.
Also use binary search in RFC4880_encode_count instead of linear search.
Also fix a bug in Timer.
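A sketch of the binary-search idea. The decoding formula is the one from RFC 4880 section 3.7.1.3; the function names, and clamping to 255 when the request exceeds the largest encodable count, are assumptions for illustration and not necessarily Botan's exact behaviour. Because the decoded counts strictly increase with the code byte, `std::lower_bound` over all 256 codes finds the smallest code whose count is at least the requested number of iterations.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// RFC 4880 3.7.1.3: the one-byte coded count c stands for
// (16 + (c & 15)) << ((c >> 4) + 6) iterations.
constexpr uint32_t rfc4880_decode_count(uint8_t c)
{
   return (16u + (c & 15u)) << ((c >> 4) + 6u);
}

// Smallest c whose decoded count is >= `iterations`, or 255 if none is.
uint8_t rfc4880_encode_count(size_t iterations)
{
   // Table of all 256 decoded counts, built once
   static const std::array<uint32_t, 256> table = [] {
      std::array<uint32_t, 256> t{};
      for(size_t c = 0; c != 256; ++c)
         t[c] = rfc4880_decode_count(static_cast<uint8_t>(c));
      return t;
   }();

   auto it = std::lower_bound(table.begin(), table.end(), iterations);
   if(it == table.end())
      return 255;  // clamp to the maximum encodable count
   return static_cast<uint8_t>(it - table.begin());
}
```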
|
Combines the priv check and the getenv call into one.
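A minimal sketch of what combining the two could look like, assuming "privileged" means the real and effective user or group IDs differ (setuid/setgid), in which case the environment is attacker-controlled and ignored. `read_env_if_unprivileged` is a hypothetical name; glibc's `secure_getenv` applies a similar rule.

```cpp
#include <cassert>
#include <cstdlib>
#include <unistd.h>

// Hypothetical helper: value of environment variable `name`, or nullptr if
// it is unset or the process appears to be running setuid/setgid.
const char* read_env_if_unprivileged(const char* name)
{
   assert(name != nullptr);
   const bool privileged = (::getuid() != ::geteuid()) || (::getgid() != ::getegid());
   if(privileged)
      return nullptr;  // never trust the environment of a privileged process
   return std::getenv(name);
}
```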
|
As described in #602, using mmap with fork causes problems because
the mmap remains shared in the child instead of being copy-on-write,
then the parent and child stomp on each other's memory.
However, we do not really need mmap semantics; we just want a block of
memory that is page-aligned, which can be done with posix_memalign
instead. This was added in POSIX.1-2001 and seems to be implemented by
all modern systems.
Closes #602
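A sketch of the posix_memalign approach described above (`alloc_page_aligned` is a hypothetical name). Unlike an anonymous mmap, the memory is not guaranteed to be zeroed, so the sketch clears it explicitly; it does not guard against overflow in `pages * page_size`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Allocate `pages` pages of zeroed, page-aligned memory without mmap.
// Release with std::free. Returns nullptr on failure.
void* alloc_page_aligned(size_t pages)
{
   const long page_size = ::sysconf(_SC_PAGESIZE);
   if(page_size <= 0 || pages == 0)
      return nullptr;
   const size_t bytes = pages * static_cast<size_t>(page_size);
   void* p = nullptr;
   // posix_memalign returns an error number rather than setting errno
   if(::posix_memalign(&p, static_cast<size_t>(page_size), bytes) != 0)
      return nullptr;
   std::memset(p, 0, bytes);
   return p;
}
```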
|
It has a substantial perf hit and is not necessary. It may not
really be necessary for signatures either, but leave that as is,
with a comment explaining why.
|
Use ct_is_zero instead of a more complicated construction, and
avoid a duplicated size check/resize, since Data::set_word will handle it.
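For readers unfamiliar with the helper, here is a sketch of one common branch-free way to compute an "is zero" mask; this is an illustration of the technique, not necessarily Botan's exact code. `~x & (x - 1)` has its top bit set only when `x == 0`, and that bit is then smeared across the whole word.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// All-one bits if x == 0, otherwise 0, with no data-dependent branch.
template<typename T>
constexpr T ct_is_zero(T x)
{
   static_assert(std::is_unsigned<T>::value, "unsigned types only");
   // Top bit of (~x & (x - 1)) is 1 iff x == 0; 0 - that bit gives the mask
   return static_cast<T>(T(0) - (static_cast<T>(~x & (x - 1)) >> (sizeof(T) * 8 - 1)));
}
```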
|
In the long ago when I wrote the Barrett code I must have missed that
Barrett works for any input < 2^(2k), where k is the size of the
modulus in bits rounded up to a whole number of words. Fixing this has
several nice effects: it is faster because it
replaces a multiprecision comparison with a single size_t compare, and
now the branch does not reveal information about the input or modulus,
but only their word lengths, which is not considered sensitive.
Fixing this allows reverting the change made in a57ce5a4fd2, and now
RSA signing is even slightly faster than in 2.8, rather than 30% slower.
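A toy illustration of why the precondition is about length in words rather than x < m^2. It follows the classic Barrett algorithm (HAC Algorithm 14.42) with a hypothetical one-word modulus and 16-bit words so everything fits in `uint64_t`; a real implementation uses machine words and multiprecision integers. With k = 1 word, any 32-bit input (two words) is reduced correctly, including inputs well above m^2.

```cpp
#include <cassert>
#include <cstdint>

// Toy Barrett reduction: word base b = 2^16, modulus of k = 1 word.
// Precondition on the input is x < b^(2k) = 2^32, not x < m*m.
struct ToyBarrett {
   uint32_t m;   // modulus, 0 < m < 2^16
   uint64_t mu;  // floor(b^(2k) / m) = floor(2^32 / m)

   explicit ToyBarrett(uint32_t modulus) : m(modulus), mu(0)
   {
      assert(modulus > 0 && modulus < (1u << 16));
      mu = (uint64_t(1) << 32) / modulus;
   }

   uint32_t reduce(uint32_t x) const
   {
      // q1 = floor(x / b^(k-1)) = x since k = 1; q3 = floor(q1 * mu / b^(k+1))
      const uint64_t q3 = (uint64_t(x) * mu) >> 32;
      // q3 <= x / m, so r is non-negative; HAC bounds it by 3m
      uint64_t r = uint64_t(x) - q3 * m;
      while(r >= m)  // at most two iterations when x < b^(2k)
         r -= m;
      return static_cast<uint32_t>(r);
   }
};
```

For m = 40000, m^2 = 1.6e9, yet `reduce(4000000000u)` still matches `4000000000u % 40000u`, so the only check needed before choosing Barrett is on word counts.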
|
As it would leak if an input was > p^2, or just close to it in size.
|
Since we are reducing a mod-p integer down to mod-q this would
nearly always use ct_modulo in any case. And, in the case where
Barrett did work, it would reveal that g^k mod p was <= q*q
which would likely be useful for searching for k.
This should actually be slightly faster (if anything) since it avoids
the unnecessary comparison against q*q and jumps directly to
ct_modulo.
|
Barrett will branch to a different (and slower) algorithm if the input
is larger than the square of the modulus. This branch can be detected
by a side channel.
For RSA we need to compute m % p and m % q to get CRT started. Being
able to detect if m > q*q (assuming q is the smaller prime) allows a
binary search on the secret prime. This attack is blocked by input
blinding, but still seems dangerous. Unfortunately changing to use the
generic const time modulo instead of Barrett introduces a rather
severe performance regression in RSA signing.
In SM2, reduce k-r*x modulo the order before multiplying it with (x+1)^-1.
Otherwise the need for slow modulo vs Barrett leaks information about
k and/or x.
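SM2 signing computes s = (1 + x)^-1 * (k - r*x) mod n, where x is the private key and n the group order. The toy below (hypothetical names, 64-bit integers and a small prime n instead of real bignums, no attempt at constant-time behaviour) only illustrates the order of operations: the difference is brought into [0, n) first, so the final multiplication never sees an unreduced, secret-dependent value.

```cpp
#include <cassert>
#include <cstdint>

// Toy modular arithmetic; n must be below 2^32 so products fit in 64 bits.
static uint64_t mod_mul(uint64_t a, uint64_t b, uint64_t n) { return (a * b) % n; }

static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n)
{
   uint64_t r = 1;
   base %= n;
   while(exp > 0)
   {
      if(exp & 1)
         r = mod_mul(r, base, n);
      base = mod_mul(base, base, n);
      exp >>= 1;
   }
   return r;
}

// s = (1 + x)^-1 * ((k - r*x) mod n) mod n, for prime n.
uint64_t sm2_toy_s(uint64_t k, uint64_t r, uint64_t x, uint64_t n)
{
   assert(k < n && r < n && x < n && (x + 1) % n != 0);
   const uint64_t rx = mod_mul(r, x, n);
   const uint64_t diff = (k + n - rx) % n;         // (k - r*x) mod n, never negative
   const uint64_t inv = mod_pow(x + 1, n - 2, n);  // (1 + x)^-1 via Fermat's little theorem
   return mod_mul(inv, diff, n);
}
```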
|
Was already done in P-256 but not in P-{192,224,384}.
This is a cache-based side channel which would be good to address. It
seems like it would be very difficult to exploit even with perfect
recovery, but crazier things have worked.
|
Previously we unpoisoned the input to high_bit, but this is no
longer required. However, the output should still be unpoisoned.
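For context, here is a sketch of a branch-free `high_bit` of the kind that makes unpoisoning the input unnecessary. The name and form are assumptions for illustration, and whether the compiled code is truly constant time depends on the compiler. It runs a fixed sequence of shifts with no data-dependent branches, but its result is still derived from the secret input, which is why the output must be unpoisoned before anything branches on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// All-ones if x == 0, else 0, without branching.
template<typename T>
constexpr T ct_is_zero_mask(T x)
{
   return static_cast<T>(T(0) - (static_cast<T>(~x & (x - 1)) >> (sizeof(T) * 8 - 1)));
}

// Number of significant bits in n (0 for n == 0), via a fixed binary descent.
template<typename T>
size_t high_bit_ct(T n)
{
   static_assert(std::is_unsigned<T>::value, "unsigned types only");
   size_t hb = 0;
   for(size_t s = 8 * sizeof(T) / 2; s > 0; s /= 2)
   {
      // z = s if (n >> s) != 0, else 0, computed from a mask rather than a branch
      const size_t z = s * static_cast<size_t>((~ct_is_zero_mask<T>(static_cast<T>(n >> s))) & 1);
      hb += z;
      n = static_cast<T>(n >> z);
   }
   return hb + static_cast<size_t>(n);  // n is now 0 or 1
}
```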
|
The test_fuzzers.py script is very slow especially on CI. Add a mode
to the test fuzzers where it will accept many files on the command
line and test each of them in turn. This is 100s of times faster,
as it avoids all overhead from fork/exec.
It has the downside that you can't tell which input caused a crash, so
retain the old mode with --one-at-a-time option for debugging work.
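A sketch of the batch mode's core loop. `FuzzFn` and `run_fuzzer_on_files` are hypothetical names; the callback shape mirrors a libFuzzer-style entry point taking a byte pointer and length. All inputs run in one process, which removes the per-file fork/exec cost but also means a crash cannot be attributed to a specific file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Fuzz entry point taking (data, length), as a libFuzzer-style target does.
using FuzzFn = std::function<void(const uint8_t*, size_t)>;

// Read each named file and feed it to `fuzz` in this same process.
// Returns the number of inputs run; unreadable files are skipped.
size_t run_fuzzer_on_files(const std::vector<std::string>& paths, const FuzzFn& fuzz)
{
   size_t ran = 0;
   for(const std::string& path : paths)
   {
      std::ifstream in(path, std::ios::binary);
      if(!in)
         continue;
      const std::vector<uint8_t> buf((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
      fuzz(buf.data(), buf.size());  // a crash here does not say which file caused it
      ++ran;
   }
   return ran;
}
```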
|
Coverage is the slowest build; moving it up puts it into the initial
tranche of builds, so it finishes before the end of the overall build.
|
It is slower to start up, and the overall build ends up waiting for these
last 2 builds. By running them at the front of the queue, they can overlap
with other builds.