path: root/src
Commit message  (Author, Date; Files changed, Lines -/+)
* Correct spelling  (Jack Lloyd, 2018-12-29; 1 file, -0/+1)
|
* Add OS::read_env_variable  (Jack Lloyd, 2018-12-29; 3 files, -9/+22)
|     Combines the priv check and the getenv call in one place.
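A helper of this shape might look roughly like the following (a minimal POSIX sketch under assumptions: the function name matches the commit but the body is illustrative, and Botan's actual privilege check is more thorough than a uid/euid comparison):

```cpp
#include <cstdlib>
#include <unistd.h>

// Hypothetical sketch: refuse to consult the environment when the
// process appears to run with elevated privileges (e.g. setuid/setgid),
// combining the privilege check and the getenv call in one helper.
const char* read_env_variable(const char* name) {
   if(getuid() != geteuid() || getgid() != getegid())
      return nullptr; // privileged: ignore attacker-controllable environment
   return std::getenv(name);
}
```

Centralizing the check means no caller can forget it before touching the environment.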
* Merge GH #1798 Use posix_memalign instead of mmap for page locked pool  (Jack Lloyd, 2018-12-29; 1 file, -17/+9)
|\
| * Use posix_memalign instead of mmap for creating the locking pool  (Jack Lloyd, 2018-12-28; 1 file, -17/+9)
| |   As described in #602, using mmap with fork causes problems: the
| |   mmap remains shared in the child instead of being copy-on-write,
| |   so the parent and child stomp on each other's memory. However, we
| |   do not really need mmap semantics; we just want a block of memory
| |   that is page-aligned, which can be done with posix_memalign
| |   instead. It was added in POSIX.1-2001 and seems to be implemented
| |   by all modern systems. Closes #602
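The replacement approach can be sketched as follows (illustrative only; function names and the exact scrubbing/locking policy are assumptions, not Botan's code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>    // posix_memalign, free
#include <cstring>    // memset
#include <unistd.h>   // sysconf
#include <sys/mman.h> // mlock, munlock

// Allocate a page-aligned block for a locked memory pool. Unlike an
// anonymous shared mmap, memory from posix_memalign has ordinary
// copy-on-write semantics after fork, so parent and child do not
// stomp on each other's pool.
void* alloc_locked_pool(size_t len) {
   const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
   void* ptr = nullptr;
   if(posix_memalign(&ptr, page, len) != 0)
      return nullptr;
   std::memset(ptr, 0, len);
   mlock(ptr, len); // best effort: may fail under RLIMIT_MEMLOCK
   return ptr;
}

void free_locked_pool(void* ptr, size_t len) {
   if(ptr == nullptr)
      return;
   std::memset(ptr, 0, len); // scrub before returning to the allocator
   munlock(ptr, len);
   std::free(ptr);
}
```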
* | Avoid const-time modulo in DSA verification  (Jack Lloyd, 2018-12-29; 1 file, -1/+11)
| |   It has a substantial perf hit and is not necessary. It may not
| |   really be necessary for signatures either, but leave that as is,
| |   with a comment explaining why.
* | Simplifications in BigInt  (Jack Lloyd, 2018-12-29; 1 file, -7/+1)
|/    Use ct_is_zero instead of a more complicated construction, and
|     avoid the duplicated size check/resize - Data::set_word will
|     handle it.
* Make bigint_sub_abs const time  (Jack Lloyd, 2018-12-27; 2 files, -6/+26)
|
* Add a test of highly imbalanced RSA key  (Jack Lloyd, 2018-12-27; 1 file, -0/+15)
|
* Fix Barrett reduction input bound  (Jack Lloyd, 2018-12-26; 3 files, -13/+23)
|     In the long ago when I wrote the Barrett code I must have missed
|     that Barrett works for any input < 2^2k, where k is the word size
|     of the modulus. Fixing this has several nice effects: it is
|     faster, because it replaces a multiprecision comparison with a
|     single size_t compare, and now the branch does not reveal
|     information about the input or modulus, only their word lengths,
|     which is not considered sensitive. Fixing this also allows
|     reverting the change made in a57ce5a4fd2, and now RSA signing is
|     even slightly faster than in 2.8, rather than 30% slower.
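The bound being fixed can be seen in a one-word sketch of Barrett reduction (illustrative; the real code is multiprecision): for a modulus m held in one 64-bit word, precompute mu = floor(2^64 / m); any x below 2^64, i.e. up to twice the word length of the modulus, then reduces with one estimated quotient and at most one correction, since the estimate is off from floor(x/m) by at most one.

```cpp
#include <cstdint>

// mu = floor(2^64 / m), computed once per modulus (m >= 2).
// Uses the GCC/Clang unsigned __int128 extension.
uint64_t barrett_mu(uint64_t m) {
   return static_cast<uint64_t>(((unsigned __int128)1 << 64) / m);
}

// Reduce any x < 2^64 modulo m. The estimated quotient q satisfies
// floor(x/m) - 1 <= q <= floor(x/m), so r < 2m and one conditional
// subtraction finishes the job (a masked subtract in const-time code).
uint64_t barrett_reduce(uint64_t x, uint64_t m, uint64_t mu) {
   const uint64_t q = static_cast<uint64_t>(((unsigned __int128)x * mu) >> 64);
   uint64_t r = x - q * m;
   if(r >= m)
      r -= m;
   return r;
}
```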
* Avoid size-based bypass of the comparison in Barrett reduction.  (Jack Lloyd, 2018-12-24; 1 file, -1/+1)
|     As it would leak if an input was > p^2, or just close to it in
|     size.
* Avoid conditional branch in Barrett for negative inputs  (Jack Lloyd, 2018-12-24; 1 file, -4/+27)
|
* Always use const-time modulo during DSA signing  (Jack Lloyd, 2018-12-24; 1 file, -1/+2)
|     Since we are reducing a mod-p integer down to mod-q, this would
|     nearly always use ct_modulo in any case. And in the case where
|     Barrett did work, it would reveal that g^k mod p was <= q*q, which
|     would likely be useful for searching for k. This should actually
|     be slightly faster (if anything) since it avoids the unnecessary
|     comparison against q*q and jumps directly to ct_modulo.
* Address a side channel in RSA and SM2  (Jack Lloyd, 2018-12-24; 2 files, -8/+4)
|     Barrett will branch to a different (and slower) algorithm if the
|     input is larger than the square of the modulus. This branch can be
|     detected by a side channel. For RSA we need to compute m % p and
|     m % q to get CRT started. Being able to detect whether m > q*q
|     (assuming q is the smaller prime) allows a binary search on the
|     secret prime. This attack is blocked by input blinding, but it
|     still seems dangerous. Unfortunately, changing to use the generic
|     const-time modulo instead of Barrett introduces a rather severe
|     performance regression in RSA signing. In SM2, reduce k-r*x modulo
|     the order before multiplying it with (x-1)^-1. Otherwise, needing
|     the slow modulo vs Barrett leaks information about k and/or x.
* In NIST P-xxx reductions unpoison S before using it  (Jack Lloyd, 2018-12-24; 1 file, -8/+10)
|     This was already done in P-256 but not in P-{192,224,384}. This is
|     a cache-based side channel which would be good to address. It
|     seems like it would be very difficult to exploit even with perfect
|     recovery, but crazier things have worked.
* Unpoison result of high_bits_free  (Jack Lloyd, 2018-12-24; 1 file, -0/+1)
|     Previously we unpoisoned the input to high_bit, but this is no
|     longer required. Still, the output should be unpoisoned.
* Correct read in test fuzzers  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|
* Add a multi-file input mode for test fuzzers  (Jack Lloyd, 2018-12-23; 3 files, -24/+105)
|     The test_fuzzers.py script is very slow, especially in CI. Add a
|     mode to the test fuzzers where they accept many files on the
|     command line and test each of them in turn. This is 100s of times
|     faster, as it avoids all the overhead from fork/exec. It has the
|     downside that you can't tell which input caused a crash, so the
|     old mode is retained behind a --one-at-a-time option for debugging
|     work.
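The driver loop for such a multi-file mode might look like this (a simplified sketch, not Botan's actual harness; the entry point name follows the usual libFuzzer convention, and here it is only a stub):

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Stub standing in for the real fuzzer entry point.
static int LLVMFuzzerTestOneInput(const uint8_t* data, size_t len) {
   (void)data;
   (void)len;
   return 0;
}

// Feed every file named on the command line to the fuzzer within one
// process, avoiding a fork/exec per input. Tradeoff: if some input
// crashes, the batch does not tell you which file was responsible.
int run_files(int argc, char* argv[]) {
   int tested = 0;
   for(int i = 1; i < argc; ++i) {
      std::ifstream in(argv[i], std::ios::binary);
      if(!in) {
         std::cerr << "Skipping unreadable file " << argv[i] << "\n";
         continue;
      }
      std::vector<uint8_t> buf((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
      LLVMFuzzerTestOneInput(buf.data(), buf.size());
      ++tested;
   }
   return tested; // number of inputs actually exercised
}
```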
* Move coverage before fuzzers in Travis build  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|     Coverage is the slowest build; moving it up puts it into the
|     initial tranche of builds so it finishes before the end of the
|     overall build.
* In Travis, run OS X first  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|     These builds are slower to start up, and the overall build ends up
|     waiting for these last 2 jobs. By moving them to the front of the
|     line they can overlap with the other builds.
* By default just run 20 of the AEAD test vectors through CLI  (Jack Lloyd, 2018-12-23; 1 file, -6/+11)
|     Running them all takes a long time, especially in CI, and doesn't
|     really add much.
* Increase Travis ccache size  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|     The cache size increases will continue until the hit rate improves.
* Increase Travis git pull depth  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|     An (undocumented?) side effect of a small git pull depth: if more
|     than N new commits are pushed to master while an earlier build is
|     running, the old build starts failing, because when CI does the
|     pull it does not find the commit it is building within the
|     checked-out tree.
* Another try at silencing Coverity on this  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|
* Initialize System_Error::m_error_code  (Jack Lloyd, 2018-12-23; 1 file, -1/+2)
|     An actual bug, flagged by Coverity.
* Avoid double return of unique_ptr  (Jack Lloyd, 2018-12-23; 1 file, -1/+3)
|     Flagged by Coverity.
* Add --no-store-vc-rev option for use in CI builds  (Jack Lloyd, 2018-12-23; 1 file, -0/+2)
|     This skips putting the git revision in the build.h header. Because
|     that value changes on every commit, it effectively disables
|     ccache's direct mode (which is faster than preprocessor mode) and
|     also prevents any caching of the amalgamation file (since
|     version.cpp expands the macro).
* Increase Travis ccache to 750M  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|     Even 600M is not sufficient for the coverage build.
* Rename OS::get_processor_timestamp to OS::get_cpu_cycle_counter  (Jack Lloyd, 2018-12-23; 5 files, -14/+15)
|     Using the phrase "timestamp" makes it sound like it has some
|     relation to the wall clock, which it does not.
* Now Timer does not need to include an internal header  (Jack Lloyd, 2018-12-23; 1 file, -1/+0)
|
* De-inline more of Timer  (Jack Lloyd, 2018-12-23; 2 files, -41/+37)
|     No reason for these to be inlined.
* Make significant_words const time also  (Jack Lloyd, 2018-12-23; 4 files, -40/+75)
|     Only used in one place, where const time doesn't matter, but it
|     can't hurt. Remove low_bit; it can be replaced by ctz.
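A word-level sketch of the idea (illustrative, not Botan's exact code): scan from the top word down, using a running mask instead of an early exit, so the sequence of operations is identical for every input of a given length.

```cpp
#include <cstddef>
#include <cstdint>

// All-ones if x == 0, else zero, with no data-dependent branch:
// ~x & (x - 1) has its top bit set exactly when x == 0.
inline uint64_t ct_is_zero(uint64_t x) {
   return static_cast<uint64_t>(static_cast<int64_t>(~x & (x - 1)) >> 63);
}

// Number of words up to and including the highest nonzero word of a
// little-endian word array; every word is examined on every call.
size_t significant_words(const uint64_t* x, size_t n) {
   size_t sig = n;
   uint64_t seen_nonzero = 0; // all-ones once a nonzero word was seen
   for(size_t i = n; i != 0; --i) {
      const uint64_t is_z = ct_is_zero(x[i - 1]);
      // Decrement only while every word so far (from the top) was zero
      sig -= static_cast<size_t>(1 & is_z & ~seen_nonzero);
      seen_nonzero |= ~is_z;
   }
   return sig;
}
```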
* In Timer, grab CPU clock first  (Jack Lloyd, 2018-12-23; 1 file, -9/+9)
|     Reading the system timestamp first causes every event to get a few
|     hundred cycles tacked onto it. This only mattered when the thing
|     being tested was very fast.
* Increase Travis ccache again  (Jack Lloyd, 2018-12-23; 1 file, -1/+1)
|     Still insufficient for debug builds.
* Remove now incorrect comment  (Jack Lloyd, 2018-12-22; 1 file, -5/+0)
|
* Make high_bit and ctz actually const time  (Jack Lloyd, 2018-12-22; 1 file, -3/+3)
|
* Promote ct_is_zero and expand_top_bit to bit_ops.h  (Jack Lloyd, 2018-12-22; 2 files, -10/+21)
|
* Make ctz and high_bit faster and const-time-ish  (Jack Lloyd, 2018-12-22; 3 files, -48/+51)
|     They get compiled as const-time on x86-64 with GCC, but I don't
|     think this can be totally relied on. It is an improvement anyway.
|     And faster, because we compute it recursively.
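The recursive-halving approach can be sketched like this (illustrative; Botan's version differs in details). Each step asks, via a mask rather than a branch, whether anything is set in the upper half of the remaining window:

```cpp
#include <cstddef>
#include <cstdint>

// All-ones if x == 0, else zero (no data-dependent branch).
inline uint64_t ct_is_zero(uint64_t x) {
   return static_cast<uint64_t>(static_cast<int64_t>(~x & (x - 1)) >> 63);
}

// Index (1-based) of the highest set bit of x, or 0 if x == 0. The
// window is halved each step; the instruction sequence is the same for
// every input, only the (data-derived) shift amounts differ.
size_t high_bit(uint64_t x) {
   size_t hb = 0;
   for(size_t s = 32; s > 0; s /= 2) {
      // z = s exactly when the top s bits of the current window are nonzero
      const size_t z = s & static_cast<size_t>(~ct_is_zero(x >> s));
      hb += z;
      x >>= z;
   }
   return hb + static_cast<size_t>(1 & ~ct_is_zero(x));
}
```

ctz can be built the same way, probing the low half of the window instead of the high half.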
* Increase Travis cache size [ci skip]  (Jack Lloyd, 2018-12-22; 1 file, -2/+2)
|     With compression disabled, the cache is too small for builds that
|     use debug info, and causes a 100% miss rate.
* Fix build with PGI [ci skip]  (Jack Lloyd, 2018-12-22; 1 file, -5/+7)
|     I couldn't get anything to link with PGI, but at least it builds
|     again.
* Merge GH #1794 Improve const time logic in PKCS1v15 and OAEP decoding  (Jack Lloyd, 2018-12-21; 9 files, -92/+171)
|\
| * Use consistent logic for OAEP and PKCS1v15 decoding  (Jack Lloyd, 2018-12-21; 9 files, -92/+171)
| |   The decoding leaked some information about the delimiter index,
| |   because only exactly input_len - delim_idx bytes were copied. I
| |   can't articulate a specific attack that would work here, but it is
| |   easy enough to make this run in const time instead, where all
| |   bytes are accessed regardless of the length of the padding.
| |   CT::copy_out is O(n^2) and thus terrible, but in practice it is
| |   only used for RSA decryption, and multiplication is also O(n^2) in
| |   the modulus size, so a few extra cycles here don't matter much.
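The access pattern described above can be sketched at byte level (a hypothetical helper, not Botan's actual CT::copy_out signature): for every output position all input bytes are read, and masks select the wanted one, so the delimiter position never shows up in the memory trace.

```cpp
#include <cstddef>
#include <cstdint>

// 0xFF when a == b, else 0, without branching.
inline uint8_t ct_eq8(size_t a, size_t b) {
   uint64_t d = static_cast<uint64_t>(a) ^ static_cast<uint64_t>(b);
   // Fold any set bit down into bit 0, then expand to a byte mask
   d |= d >> 32; d |= d >> 16; d |= d >> 8;
   d |= d >> 4;  d |= d >> 2;  d |= d >> 1;
   return static_cast<uint8_t>((d & 1) - 1);
}

// Copy in[start..in_len) to the front of out (capacity in_len). Every
// input byte is read for every output byte, so the access pattern does
// not depend on start. O(n^2), but n is an RSA modulus size and the
// modular arithmetic already costs O(n^2).
void ct_copy_out(uint8_t* out, const uint8_t* in, size_t in_len, size_t start) {
   for(size_t j = 0; j != in_len; ++j) {
      uint8_t acc = 0;
      for(size_t i = 0; i != in_len; ++i)
         acc |= in[i] & ct_eq8(i, start + j);
      out[j] = acc; // positions at or past in_len - start end up zero
   }
}
```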
* | Avoid including rotate.h in bswap.h  (Jack Lloyd, 2018-12-21; 28 files, -2/+30)
| |   It was only needed for one case, which is easily hardcoded.
| |   Include rotate.h in all the source files that actually use
| |   rotr/rotl but previously picked it up implicitly via the
| |   loadstor.h -> bswap.h -> rotate.h include chain.
* | Stop compressing Travis ccache  (Jack Lloyd, 2018-12-21; 1 file, -3/+1)
|/    Since CPU is the main bottleneck of the build, this is likely not
|     helping.
* Address a couple of Coverity false positives  (Jack Lloyd, 2018-12-19; 4 files, -7/+62)
|     Also add tests for is_power_of_2.
* Avoid using unblinded Montgomery ladder during ECC key generation  (Jack Lloyd, 2018-12-18; 2 files, -11/+32)
|     Doing so means that information about the high bits of the scalar
|     can leak via timing, since the loop bound depends on the length of
|     the scalar. An attacker who has such information can perform a
|     more efficient brute force attack (using Pollard's rho) than would
|     otherwise be possible. Found by Ján Jančár (@J08nY) using ECTester
|     (https://github.com/crocs-muni/ECTester). CVE-2018-20187
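The leak can be illustrated with a Montgomery ladder over integers (modular exponentiation standing in for EC point arithmetic; the function and parameters are illustrative). Iterating over a fixed number of bit positions, e.g. the bit size of the group order, rather than the bit length of this particular scalar, keeps the loop count independent of the secret:

```cpp
#include <cstddef>
#include <cstdint>

// Montgomery ladder computing g^k mod m, always running `fixed_bits`
// iterations so the iteration count does not reveal the bit length of
// the secret k. The branch on `bit` is kept for readability; hardened
// code would use a masked conditional swap instead.
uint32_t ladder_exp(uint32_t g, uint32_t k, uint32_t m, size_t fixed_bits) {
   uint64_t r0 = 1 % m;
   uint64_t r1 = g % m;
   for(size_t i = fixed_bits; i > 0; --i) {
      const uint32_t bit = (k >> (i - 1)) & 1;
      if(bit) {
         r0 = (r0 * r1) % m; // invariant: r1 == r0 * g (mod m)
         r1 = (r1 * r1) % m;
      } else {
         r1 = (r0 * r1) % m;
         r0 = (r0 * r0) % m;
      }
   }
   return static_cast<uint32_t>(r0);
}
```

Leading zero bits of k simply leave r0 at 1, so the extra iterations cost time but not correctness.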
* Test how long it takes to precompute base point multiples  (Jack Lloyd, 2018-12-16; 2 files, -1/+21)
|
* In PointGFp addition, prevent all_zeros from being shortcircuited  (Jack Lloyd, 2018-12-14; 1 file, -4/+7)
|     This doesn't matter much, but it causes confusing valgrind output
|     when const-time checking, since valgrind distinguishes between the
|     two possible conditional returns.
* Unroll const_time_lookup by 2  (Jack Lloyd, 2018-12-14; 1 file, -6/+10)
|     We know the lookup table size is some power of 2; unrolling a bit
|     allows more IPC.
* Simplify the const time lookup in ECC scalar mul  (Jack Lloyd, 2018-12-14; 1 file, -12/+9)
|     The code is easier to understand, and it may let the CPU
|     interleave the loads and logical ops better. Slightly faster on my
|     machine.
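The shape of such a lookup, in a single-word sketch (illustrative; the real table holds point coordinates): every row is read, and a mask derived from comparing the loop index against the wanted index selects one of them, so the address trace is independent of the secret index.

```cpp
#include <cstddef>
#include <cstdint>

// All-ones if a == b, else zero, branch-free.
inline uint64_t ct_eq64(uint64_t a, uint64_t b) {
   uint64_t d = a ^ b;
   d |= d >> 32; d |= d >> 16; d |= d >> 8;
   d |= d >> 4;  d |= d >> 2;  d |= d >> 1;
   return (d & 1) - 1;
}

// Select row `idx` of a table with `rows` rows (a power of 2, >= 2) of
// `width` words each; every row is accessed regardless of idx.
// Unrolled by 2, which the power-of-2 row count permits.
void ct_lookup(uint64_t* out, const uint64_t* table,
               size_t rows, size_t width, size_t idx) {
   for(size_t k = 0; k != width; ++k)
      out[k] = 0;
   for(size_t i = 0; i < rows; i += 2) {
      const uint64_t m0 = ct_eq64(i, idx);
      const uint64_t m1 = ct_eq64(i + 1, idx);
      for(size_t k = 0; k != width; ++k)
         out[k] |= (table[i * width + k] & m0)
                 | (table[(i + 1) * width + k] & m1);
   }
}
```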
* Use a 3-bit comb for ECC base point multiply  (Jack Lloyd, 2018-12-13; 2 files, -19/+36)
|     Improves ECDSA signing by 15%.
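The flavor of the optimization can be shown with fixed-window modular exponentiation (the EC version precomputes point multiples of the base instead of powers of g, and a comb additionally interleaves bit positions; this sketch uses plain 3-bit windows and is not Botan's code):

```cpp
#include <cstdint>

// 3-bit fixed-window exponentiation: precompute g^0..g^7 once, then
// consume the exponent three bits per iteration (three squarings plus
// one table multiply) instead of one bit at a time.
uint32_t window3_exp(uint32_t g, uint32_t k, uint32_t m) {
   uint64_t tbl[8];
   tbl[0] = 1 % m;
   for(int j = 1; j < 8; ++j)
      tbl[j] = (tbl[j - 1] * (g % m)) % m;

   uint64_t r = 1 % m;
   for(int i = 30; i >= 0; i -= 3) { // windows cover bits 32..0 of k
      r = (r * r) % m; // shift the accumulated exponent left by 3 bits
      r = (r * r) % m;
      r = (r * r) % m;
      r = (r * tbl[(k >> i) & 7]) % m;
   }
   return static_cast<uint32_t>(r);
}
```

For a fixed base, the table is built once and amortized over every subsequent multiply, which is where the signing speedup comes from.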