Commit message log
As described in #602, using mmap with fork causes problems: the mapping
remains shared in the child instead of being copy-on-write, so the
parent and child stomp on each other's memory.
However we do not really need mmap semantics, we just want a block of
memory that is page-aligned, which can be done with posix_memalign
instead. This was added in POSIX.1-2001 and seems to be implemented by
all modern systems.
Closes #602
It has a substantial perf hit and is not necessary. It may not
really be necessary for signatures either, but leave that as is,
with a comment explaining why.
Use ct_is_zero instead of a more complicated construction, and
avoid the duplicated size check/resize - Data::set_word will handle it.
In the long ago when I wrote the Barrett code I must have missed that
Barrett works for any input < 2^2k, where k is the word length of the
modulus. Fixing this has several nice effects: it is faster, because it
replaces a multiprecision comparison with a single size_t compare, and
the branch no longer reveals information about the input or modulus,
only their word lengths, which are not considered sensitive.
Fixing this allows reverting the change made in a57ce5a4fd2, and now
RSA signing is even slightly faster than in 2.8, rather than 30% slower.
As it would leak if an input was > p^2, or just close to it in size.
Since we are reducing a mod-p integer down to mod-q, this would
nearly always use ct_modulo in any case. And in the case where
Barrett did work, it would reveal that g^k mod p was <= q*q,
which would likely be useful when searching for k.
This should actually be slightly faster (if anything), since it avoids
the unnecessary comparison against q*q and jumps directly to
ct_modulo.
Barrett will branch to a different (and slower) algorithm if the input
is larger than the square of the modulus. This branch can be detected
by a side channel.
For RSA we need to compute m % p and m % q to get CRT started. Being
able to detect whether m > q*q (assuming q is the smaller prime) allows
a binary search on the secret prime. This attack is blocked by input
blinding, but still seems dangerous. Unfortunately, changing to use the
generic const-time modulo instead of Barrett introduces a rather
severe performance regression in RSA signing.
In SM2, reduce k-r*x modulo the order before multiplying it by (x-1)^-1.
Otherwise the need for the slow modulo vs Barrett leaks information
about k and/or x.
Was already done in P-256 but not in P-{192,224,384}.
This is a cache-based side channel which would be good to address. It
seems like it would be very difficult to exploit even with perfect
recovery, but crazier things have worked.
Previously we unpoisoned the input to high_bit, but this is no
longer required. The output, however, should still be unpoisoned.
The test_fuzzers.py script is very slow, especially on CI. Add a mode
to the test fuzzers where a binary accepts many files on the command
line and tests each of them in turn. This is hundreds of times faster,
as it avoids all the overhead of fork/exec.
It has the downside that you can't tell which input caused a crash, so
retain the old mode behind a --one-at-a-time option for debugging work.
Coverage is the slowest build; moving it up puts it into the initial
tranche of builds so it finishes before the end of the overall run.
They are slower to start up, and the overall build ends up waiting on
these last 2 builds. Running them at the front of the line lets them
overlap with the other builds.
Running them all takes a long time, especially in CI, and doesn't
really add much.
The cache size increases will continue until hit rate improves.
An apparently undocumented side effect of a small git pull depth: if
more than N new commits are pushed to master while an earlier build is
running, the old build starts failing, since when CI does the pull it
does not find the commit it is building within the checked-out tree.
Actual bug, flagged by Coverity
Flagged by Coverity
This skips putting the git revision in the build.h header. Since this
value changes with every commit, it effectively disables ccache's
direct mode (which is faster than preprocessor mode) and also prevents
any caching of the amalgamation file (since version.cpp expands the
macro).
Even 600M is not sufficient for the coverage build
Using the phrase "timestamp" makes it sound like it has some relation
to the wall clock, which it does not.
No reason for these to be inlined
| |
Only used in one place, where const time doesn't matter, but can't hurt.
Remove low_bit, can be replaced by ctz.
Reading the system timestamp first causes every event to get a few
hundred cycles tacked onto it. This only mattered when the thing being
tested was very fast.
Still insufficient for debug builds
They get compiled as const-time on x86-64 with GCC, but I don't think
this can be totally relied on. Still, it is an improvement in any case.
It is also faster, because we compute it recursively.
With compression disabled, the cache is too small for builds that
use debug info, causing a 100% miss rate.
I couldn't get anything to link with PGI, but at least it builds again.
The decoding leaked some information about the delimiter index
due to copying only exactly input_len - delim_idx bytes. I can't
articulate a specific attack that would work here, but it is easy
enough to fix this to run in const time instead, where all bytes
are accessed regardless of the length of the padding.
CT::copy_out is O(n^2) and thus terrible, but in practice it is only
used with RSA decryption, and multiplication is also O(n^2) in the
modulus size, so a few extra cycles here don't matter much.
It was only needed for one case, which is easily hardcoded. Include
rotate.h in all the source files that actually use rotr/rotl but had
implicitly picked it up via the loadstor.h -> bswap.h -> rotate.h
include chain.