Both about 33% faster on Skylake
Boost doesn't buy us anything here since we need to maintain
Win32 and POSIX implementations for non-Boost builds, and Boost
only supports those two APIs anyway.
MSVC's implementation of std::filesystem does not help for similar
reasons, as we have to maintain a Win32 version for MinGW.
This lets us avoid some warnings under VC++ 2017
No real bugs, but pointed out some odd constructs and duplicated logic
No resources need to be freed for the OS features currently supported,
but this might be of use later.
For now only used by the TLS server.
The problem in #602 is not the use of mmap but the use of mmap with
MAP_SHARED. Using MAP_PRIVATE creates a CoW mapping just like malloc
or posix_memalign would.
I'm not sure why it took me so long to figure this out ...
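To see the difference, a small POSIX sketch can map one anonymous page, let a forked child overwrite it, and check whether the parent still sees the original bytes. This is not the code from the commit; `child_write_is_private` is a hypothetical helper, and MAP_ANONYMOUS is assumed to be available.

```cpp
// Sketch only: checks whether an anonymous mapping created with `flags`
// (MAP_PRIVATE or MAP_SHARED) behaves as copy-on-write across fork().
// Assumes a POSIX system that provides MAP_ANONYMOUS.
#include <cassert>
#include <cstring>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if a write made by a forked child is
// NOT visible to the parent afterwards.
bool child_write_is_private(int flags) {
   const size_t len = static_cast<size_t>(sysconf(_SC_PAGESIZE));
   void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, flags | MAP_ANONYMOUS, -1, 0);
   if(p == MAP_FAILED)
      return false;
   unsigned char* buf = static_cast<unsigned char*>(p);
   std::memset(buf, 0xAA, len);

   const pid_t pid = fork();
   if(pid == 0) {
      // Child: overwrite the whole page, then exit immediately.
      std::memset(buf, 0x55, len);
      _exit(0);
   }
   if(pid > 0) {
      int status = 0;
      waitpid(pid, &status, 0);
   }

   // Parent: the original pattern survives only if the mapping was private.
   const bool unchanged = (pid > 0) && buf[0] == 0xAA && buf[len - 1] == 0xAA;
   munmap(p, len);
   return unchanged;
}
```

With MAP_PRIVATE the child's writes land on its own copy of the page; with MAP_SHARED they are visible to the parent, which is the #602 failure mode.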
Quite a bit faster than the old version, and with better properties
wrt alignment
Lots more of this needed in here
Here the caller is assumed to have provided a buffer of sufficient size.
Assumed to be 0/1
Also fix xlc macro
Recent XLC is based on clang and has these
XLC 16 changed which macros are used to identify it. Older versions of
XLC didn't work correctly anyway (#1581 #1509 etc), so just drop support
for recognizing those versions.
Was broken by removing inclusion of rotate header
Basically, test that it works. Accepts a sequence of alloc+free
operations and verifies that each pointer returned does not overlap with
any other outstanding allocation, that the memory returned is zeroed,
and that alignment is respected.
Intended for testing #1800 but no reason not to land this first.
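A minimal sketch of such a checker, assuming the fuzzer input has already been decoded into a list of operations. `Op`, `Block`, and `check_allocator` are hypothetical names, and the allocator under test is passed in as a pair of callbacks rather than being any particular Botan class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical encoding of one step: allocate `value` bytes, or free the
// outstanding allocation at index `value % outstanding.size()`.
struct Op {
   bool is_alloc;
   size_t value;
};

struct Block {
   uintptr_t addr;
   size_t len;
};

// Replays `ops` and returns false on the first violation: an allocation that
// overlaps an outstanding one, is not zeroed, or is misaligned.
bool check_allocator(const std::vector<Op>& ops,
                     const std::function<void*(size_t)>& alloc_fn,
                     const std::function<void(void*)>& free_fn,
                     size_t alignment) {
   std::vector<Block> outstanding;
   bool ok = true;

   for(const Op& op : ops) {
      if(op.is_alloc) {
         const size_t len = op.value;
         if(len == 0)
            continue;
         unsigned char* p = static_cast<unsigned char*>(alloc_fn(len));
         if(p == nullptr)
            continue;  // running out of memory is not a failure here
         const uintptr_t a = reinterpret_cast<uintptr_t>(p);

         if(a % alignment != 0)
            ok = false;
         for(size_t i = 0; i != len; ++i)
            if(p[i] != 0)
               ok = false;
         for(const Block& b : outstanding)
            if(a < b.addr + b.len && b.addr < a + len)
               ok = false;

         // Dirty the block so a later reuse that skips zeroing is caught.
         std::memset(p, 0xFF, len);
         outstanding.push_back({a, len});
      } else if(!outstanding.empty()) {
         const size_t idx = op.value % outstanding.size();
         free_fn(reinterpret_cast<void*>(outstanding[idx].addr));
         outstanding.erase(outstanding.begin() + static_cast<std::ptrdiff_t>(idx));
      }
      if(!ok)
         break;
   }

   // Release whatever is still outstanding.
   for(const Block& b : outstanding)
      free_fn(reinterpret_cast<void*>(b.addr));
   return ok;
}
```

Filling each block with 0xFF after checking it means that if the allocator later hands the same memory out again without clearing it, the zeroing check fails.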
Make the tune interval a build-time configurable instead of hardcoding
it in each source file.
Also use binary search in RFC4880_encode_count instead of linear search.
Fix a bug in Timer
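RFC 4880 (section 3.7.1.3) encodes the S2K iteration count in one byte c as (16 + (c & 15)) << ((c >> 4) + 6). That value is strictly increasing in c, so the smallest c covering a requested count can be found by binary search over the 256 possible bytes. A sketch, assuming the encoder saturates at 255 for counts above the largest encodable value; the function names are hypothetical and not necessarily the signature of RFC4880_encode_count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// RFC 4880 3.7.1.3: coded byte c decodes to (16 + (c & 15)) << ((c >> 4) + 6).
uint32_t rfc4880_decode_count(uint8_t c) {
   return (16u + (c & 15u)) << ((c >> 4) + 6u);
}

// Smallest coded byte whose decoded count is >= iterations, saturating at 255.
// Because decode is strictly increasing, binary search over [0, 255] gives the
// same answer as a linear scan.
uint8_t rfc4880_encode_count(size_t iterations) {
   if(iterations >= rfc4880_decode_count(255))
      return 255;
   size_t lo = 0, hi = 255;  // invariant: the answer lies in [lo, hi]
   while(lo < hi) {
      const size_t mid = lo + (hi - lo) / 2;
      if(rfc4880_decode_count(static_cast<uint8_t>(mid)) < iterations)
         lo = mid + 1;
      else
         hi = mid;
   }
   return static_cast<uint8_t>(lo);
}
```

For example, a requested count of 65536 encodes as 0x60, since (16 + 0) << (6 + 6) = 65536.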
Combines the priv check and the getenv call into one.
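A sketch of what a combined helper could look like, assuming "privileged" means the real and effective user or group IDs differ (as in a setuid or setgid binary). The name `read_env_variable_unprivileged` is hypothetical; where available, `getauxval(AT_SECURE)` on Linux or `issetugid()` on the BSDs may be more complete checks.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical helper: reads an environment variable only when the process is
// not running with elevated privileges. Returns false (and clears value_out)
// if the process looks privileged or the variable is unset.
bool read_env_variable_unprivileged(std::string& value_out, const char* name) {
   value_out.clear();
   // Assumption: any real/effective ID mismatch is treated as privileged.
   if(getuid() != geteuid() || getgid() != getegid())
      return false;
   const char* v = std::getenv(name);
   if(v == nullptr)
      return false;
   value_out = v;
   return true;
}
```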
As described in #602, using mmap with fork causes problems because
the mmap remains shared in the child instead of being copy-on-write,
so the parent and child stomp on each other's memory.
However, we do not really need mmap semantics; we just want a block of
memory that is page-aligned, which can be done with posix_memalign
instead. This was added in POSIX.1-2001 and seems to be implemented by
all modern systems.
Closes #602
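A minimal sketch of allocating zeroed, page-aligned memory this way, assuming `sysconf(_SC_PAGESIZE)` reports the page size. The helper name `alloc_zeroed_pages` is hypothetical. posix_memalign does not clear the memory, so the sketch zeroes it explicitly, and the result is released with free rather than munmap.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: returns `pages` zeroed, page-aligned pages (or nullptr)
// and reports the page size through `page_size_out`. Free with std::free.
void* alloc_zeroed_pages(size_t pages, size_t& page_size_out) {
   const long ps = sysconf(_SC_PAGESIZE);
   if(ps <= 0 || pages == 0)
      return nullptr;
   page_size_out = static_cast<size_t>(ps);
   if(pages > SIZE_MAX / page_size_out)
      return nullptr;  // size computation would overflow
   const size_t len = pages * page_size_out;

   void* p = nullptr;
   // posix_memalign returns an error code instead of setting errno.
   if(posix_memalign(&p, page_size_out, len) != 0)
      return nullptr;
   // Unlike calloc, posix_memalign does not clear the memory.
   std::memset(p, 0, len);
   return p;
}
```

Because this is ordinary private heap memory, a forked child gets copy-on-write pages rather than a view shared with the parent.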
It has a substantial perf hit and is not necessary. It may not
really be necessary for signatures either, but leave that as is,
with a comment explaining why.
Use ct_is_zero instead of more complicated construction, and
avoid duplicated size check/resize - Data::set_word will handle it.
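For reference, one common branch-free formulation of a constant-time zero test: ~x & (x - 1) has its top bit set only when x is zero, and that bit is then smeared across the word. This is a sketch under that assumption, not necessarily identical to the ct_is_zero used in the commit; it returns an all-ones mask for zero and 0 otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Returns an all-ones mask if x == 0, otherwise 0, without branching on x.
template <typename T>
T ct_is_zero(T x) {
   static_assert(std::is_unsigned<T>::value, "unsigned types only");
   // Top bit of ~x & (x - 1) is set only when x == 0.
   const T t = static_cast<T>(~x & (x - 1));
   // Smear the top bit across the whole word: 0 - 1 is all ones, 0 - 0 is 0.
   return static_cast<T>(static_cast<T>(0) - static_cast<T>(t >> (sizeof(T) * 8 - 1)));
}
```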
Long ago, when I wrote the Barrett code, I must have missed that
Barrett works for any input < 2^(2k), where k is the bit length of the
modulus rounded up to a whole number of words. Fixing this has several
nice effects: it is faster because it replaces a multiprecision
comparison with a single size_t compare, and now the branch does not
reveal information about the input or modulus, but only their word
lengths, which is not considered sensitive.
Fixing this allows reverting the change made in a57ce5a4fd2, and now
RSA signing is even slightly faster than in 2.8, rather than 30% slower.
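To illustrate why only the word length matters, here is a single-word Barrett reduction following HAC Algorithm 14.42 with k = 1 and b = 2^32. The precomputed mu = floor(b^2 / m) gives a quotient estimate at most 2 too small for every x < b^2, so any 64-bit input works, even when x is far larger than m^2. `Barrett32` is a hypothetical name, the sketch relies on the GCC/Clang `unsigned __int128` extension, and it is not Botan's reducer.

```cpp
#include <cassert>
#include <cstdint>

// Single-word Barrett reduction (HAC 14.42, k = 1, b = 2^32).
// Preconditions: 1 <= m < 2^32 and x < b^2 = 2^64, i.e. any uint64_t.
struct Barrett32 {
   uint32_t m;
   unsigned __int128 mu;  // floor(b^2 / m) = floor(2^64 / m)

   explicit Barrett32(uint32_t modulus)
      : m(modulus), mu((static_cast<unsigned __int128>(1) << 64) / modulus) {}

   uint32_t reduce(uint64_t x) const {
      // q_hat = floor(x * mu / b^2); HAC guarantees q - 2 <= q_hat <= q.
      const uint64_t q_hat = static_cast<uint64_t>((x * mu) >> 64);
      uint64_t r = x - q_hat * m;  // 0 <= r < 3m
      // Two conditional subtractions, selected with masks rather than ifs.
      for(int i = 0; i != 2; ++i) {
         const uint64_t t = r - m;
         const uint64_t keep_r = static_cast<uint64_t>(0) - (r < m);  // all ones if r < m
         r = (r & keep_r) | (t & ~keep_r);
      }
      return static_cast<uint32_t>(r);
   }
};
```

No comparison against m*m is needed: the only precondition is that x fits in 2k words, which the type already guarantees here.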
As it would leak if an input was > p^2, or just close to it in size.
Since we are reducing a mod-p integer down to mod-q this would
nearly always use ct_modulo in any case. And, in the case where
Barrett did work, it would reveal that g^k mod p was <= q*q
which would likely be useful for searching for k.
This should actually be slightly faster (if anything) since it avoids
the unnecessary comparison against q*q and jumps directly to
ct_modulo.
Barrett will branch to a different (and slower) algorithm if the input
is larger than the square of the modulus. This branch can be detected
by a side channel.
For RSA we need to compute m % p and m % q to get CRT started. Being
able to detect if m > q*q (assuming q is the smaller prime) allows a
binary search on the secret prime. This attack is blocked by input
blinding, but still seems dangerous. Unfortunately changing to use the
generic const time modulo instead of Barrett introduces a rather
severe performance regression in RSA signing.
In SM2, reduce k-r*x modulo the order before multiplying it with (x+1)^-1.
Otherwise the need for slow modulo vs Barrett leaks information about
k and/or x.
Was already done in P-256 but not in P-{192,224,384}.
This is a cache-based side channel which would be good to address. It
seems like it would be very difficult to exploit even with perfect
recovery, but crazier things have worked.
Previously we unpoisoned the input to high_bit, but this is no
longer required. However, the output should still be unpoisoned.
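Dropping the unpoisoning presumably relies on high_bit not branching on, or indexing memory by, its argument. A branch-free version can be sketched as a fixed sequence of halving steps; `ct_nonzero_bit` is a hypothetical helper, and the valgrind-style poisoning itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 1 if x != 0, else 0, computed without a comparison: for nonzero x,
// x | (0 - x) always has its top bit set.
inline uint64_t ct_nonzero_bit(uint64_t x) {
   return (x | (0 - x)) >> 63;
}

// Number of significant bits in n (0 for n == 0). The loop always runs the
// same six steps, whatever the value of n.
inline size_t high_bit(uint64_t n) {
   size_t hb = 0;
   for(size_t s = 32; s > 0; s /= 2) {
      // Shift by s only if the upper part is nonzero.
      const size_t z = s * static_cast<size_t>(ct_nonzero_bit(n >> s));
      hb += z;
      n >>= z;
   }
   return hb + static_cast<size_t>(n);  // n is now 0 or 1
}
```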
Actual bug, flagged by Coverity
Flagged by Coverity
Using the phrase "timestamp" makes it sound like it has some relation
to wall-clock time, which it does not.
No reason for these to be inlined
Only used in one place, where const time doesn't matter, but it can't hurt.
Remove low_bit, which can be replaced by ctz.
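A sketch of a ctz that always scans all 64 bits, so its running time does not depend on the input. If low_bit returned the 1-based index of the lowest set bit (0 for a zero input), then for nonzero n it equals ctz(n) + 1. The loop below is illustrative, not the commit's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count trailing zero bits of n, returning 64 for n == 0. Every call performs
// the same 64 iterations regardless of n.
inline size_t ctz(uint64_t n) {
   size_t lb = 0;
   uint64_t seen_one = 0;  // becomes 1 once a set bit has been scanned
   for(size_t i = 0; i != 64; ++i) {
      seen_one |= (n >> i) & 1;
      lb += static_cast<size_t>(seen_one ^ 1);  // count zeros before the first 1
   }
   return lb;
}
```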
Reading the system timestamp first causes every event to get a few
hundred cycles tacked onto it. Only mattered when the thing being
tested was very fast.