diff options
author | Jack Lloyd <[email protected]> | 2016-12-28 12:40:21 -0500 |
---|---|---|
committer | Jack Lloyd <[email protected]> | 2016-12-28 12:40:21 -0500 |
commit | bab07b0538627d3373d10c4d32d1f9c6975ce821 (patch) | |
tree | da565d5c2c9f7a108c0dfaa6ea3814a68aa054c2 /doc | |
parent | 6ce3017161f76ae75a9a60430b4f63883c0f3f72 (diff) |
Add a doc on side channel countermeasures and known issues
[ci skip]
Diffstat (limited to 'doc')
-rw-r--r-- | doc/side_channels.rst | 351 |
1 files changed, 351 insertions, 0 deletions
diff --git a/doc/side_channels.rst b/doc/side_channels.rst new file mode 100644 index 000000000..9391accf3 --- /dev/null +++ b/doc/side_channels.rst @@ -0,0 +1,351 @@ +Side Channels +========================= + +Many cryptographic systems can be broken by side channels. This document notes +side channel protections which are currently implemented, as well as areas of +the code which are known to be vulnerable to side channels. The latter are +obviously all open for future improvement. + +The following text assumes the reader is already familiar with cryptographic +implementations, side channel attacks, and common countermeasures. + +RSA +---------------------- + +Blinding is always used to protect private key operations (there is no way to +turn it off). As an optimization, instead of choosing a new random mask and +inverse with each decryption, both the mask and its inverse are simply squared +to choose the next blinding factor. This is much faster than computing a fresh +value each time, and the additional relation is thought to provide only minimal +useful information for an attacker. Every BOTAN_BLINDING_REINIT_INTERVAL +(default 32) operations, a new starting point is chosen. + +RSA signing uses the CRT optimization, which is much faster but vulnerable to +trivial fault attacks [RsaFault] which can result in the key being entirely +compromised. To protect against this (or any other computational error which +would have the same effect as a fault attack in this case), after every private +key operation the result is checked for consistency with the public key. This +introduces only slight additional overhead and blocks most fault attacks; it is +possible to use a second fault attack to bypass this check but + +See blinding.cpp and rsa.cpp for details. + +If the OpenSSL provider is enabled, then no explicit blinding is done; we assume +OpenSSL handles this. See openssl_rsa.cpp for details. + +OAEP +---------------------- + +OAEP decoding is vulnerable to a side channel attack. If an attacker can +determine where in the decoding process an error occurred, they can use this to +decrypt arbitrary RSA ciphertexts. See [OaepTiming] for details. This is avoided +in Botan by making the decoding operation const time. This is verified using +valgrind. See eme_oaep.cpp for details. + +PKCS #1 v1.5 Decryption +------------------------- + +This padding scheme is used with RSA, and is very vulnerable to errors. In an +scenario where an attacker can repeatedly present RSA ciphertexts, and a +legitimate key holder will attempt to decrypt each ciphertext and simply +indicates to the attacker if the PKCS padding was valid or not (without +revealing any additional information), the attacker can use this behavior as an +oracle to perform iterative decryption of arbitrary RSA ciphertexts encrypted +under that key. This is the famous million message attack [MillionMsg]. For +example, simply a difference in timing between a valid and invalid RSA +ciphertext due to an early error exit may be enough to mount an attack. + +Preventing this issue in full requires some application level changes. In +protocols which know the expected length of the encrypted key, PK_Decryptor +provides the function `decrypt_or_random` which first generates a random fake +key, then decrypts the presented ciphertext, then in constant time either copies +out the random key or the decrypted plaintext depending on if the ciphertext was +valid or not. Then, the protocol can carry on with a randomly chosen key, which +will presumably cause total failure in a way that does not allow an attacker to +distinguish (via any timing or other side channel, nor any error messages +specific to the one situation vs the other) if it was the RSA decryption + +One very important user of PKCS #1 v1.5 encryption is the TLS protocol. In TLS, +some extra versioning information is embedded in the plaintext message, along +with the key. It turns out that this version information must be treated in an +identical (constant-time) way with the PKCS padding, or again the system is +broken. [VersionOracle]. This is supported by a special version of +PK_Decryptor::decrypt_or_random that additionally allows verifying one or more +content bytes, in addition to the PKCS padding. + +See eme_pkcs.cpp and pubkey.cpp for details. + +Modular Exponentiation +------------------------ + +Modular exponentiation uses a fixed window algorithm with Montgomery representation. +In the current code, information about the exponent is leaked through the +sequence of memory indexes; we currently rely on randomized blinding at higher +levels of the cryptographic stack to hide this. A future project would be to +change this to use either Montgomery ladder or use a side channel silent table +lookup. See powm_mnt.cpp for details. + +The Karatsuba multiplication algorithm has some conditional branches that +probably expose information through the branch predictor, but probably? does not +expose a timing channel since the same amount of work is done on both sides of +the conditional. There is certainly room for improvement here. See mp_karat.cpp +for details. + +The Montgomery reduction is written (and tested) to run in constant time. See +mp_monty.cpp for details. + +ECC point decoding +---------------------- + +The API function OS2ECP, which is used to convert byte strings to ECC points, +verifies that all points satisfy the ECC curve equation. Points that do not +satisfy the equation are invalid, and can sometimes be used to break +protocols. See point_gfp.cpp for details. + +ECC scalar multiply +---------------------- + +There are two implementations of scalar multiply, PointGFp::operator* and the +class Blinded_Point_Multiply. The default scalar multiply uses the Montgomery +ladder. However it currently leaks the size of the scalar, because the loop +iterations are bounded by the scalar size. + +Blinded_Point_Multiply (used by ECDH, ECDSA, etc) applies several additional +side channel countermeasures. The scalar is masked by a small multiple of the +group order (this is commonly called Coron's first countermeasure [CoronDpa]), +the size of the scalar mask is currently controlled by build.h value +BOTAN_POINTGFP_SCALAR_BLINDING_BITS which defaults to 20 bits. + +Botan stores all ECC points in Jacobian representation. This form allows faster +computation by representing points (x,y) as (X,Y,Z) where x=X/Z^2 and +y=Y/Z^3. As the representation is redundant, for any randomly chosen r, +(X*r^2,Y*r^3,Z*r) is an equivalent point. Changing the point values prevents an +attacker from mounting attacks based on the input point remaining unchanged over +multiple executions. This is commonly called Coron's third countermeasure, see +again [CoronDpa]. + +Currently Blinded_Point_Multiply uses one of two different algorithms, depending +on a build-time flag. If BOTAN_POINTGFP_BLINDED_MULTIPLY_USE_MONTGOMERY_LADDER +is set in build.h (default is for it *not* to be set), then a randomized +Montgomery ladder algorithm from [RandomMonty] is used. Otherwise, a simple +fixed window exponentiation is used; the current version leaks exponent bits +through memory index values. We rely on scalar blinding to reduce this +leakage. It would obviously be better for Blinded_Point_Multiply to converge on +a single side channel silent algorithm. + +See point_gfp.cpp for details. + +ECDH +---------------------- + +ECDH verifies (through its use of OS2ECP) that all input points +received from the other party satisfy the curve equation. This +prevents twist attacks. The same check is performed on the output +point, which helps prevent fault attacks. + +ECDSA +---------------------- + +Inversion of the ECDSA nonce k must be done in constant time, as any +leak of even a single bit of the nonce can be sufficient to allow +recovering the private key. In Botan all inverses modulo an odd number +are performed using a constant time algorithm due to Niels Möller. + +x25519 +---------------------- + +The x25519 code is independent of the main Weiserstrass form ECC code, instead +based on curve25519-donna-c64.c by Adam Langley. The code seems immune to cache +based side channels. It does make use of integer multiplications; on some old +CPUs these multiplications take variable time and might allow a side channel +attack. This is not considered a problem on modern processors. + +TLS CBC ciphersuites +---------------------- + +The original TLS v1.0 CBC Mac-then-Encrypt mode is vulnerable to an oracle +attack. If an attacker can distinguish padding errors through different error +messages [TlsCbcOracle] or via a side channel attack like [Lucky13], they can +abuse the server as a decryption oracle. + +The side channel protection for Lucky13 follows the approach proposed in the +Lucky13 paper. It is not perfectly constant time, but does hide the padding +oracle in practice. Tools to test TLS CBC decoding are included in the timing +tests. See https://github.com/randombit/botan/pull/675 for more information. + +The Encrypt-then-MAC extension, which completely avoids the side channel, is +implemented and used by default for CBC ciphersuites. + +CBC mode padding +---------------------- + +In theory, any good protocol protects CBC ciphertexts with a MAC. But in +practice, some protocols are not good and cannot be fixed immediately. To avoid +making a bad problem worse, the code to handle decoding CBC ciphertext padding +bytes runs in constant time, depending only on the block size of the cipher. + +AES +---------------------- + +The basic implementation uses table lookup, an approach known to be very +vulnerable to side channels. Modifications to the first and last round increase +the cost of some known attacks (by reducing the number of cache lines accessed), +but these truly can only slightly increase the samples required to recover the +AES key. + +For x86 processors with SSSE3 extension (most CPUs since Atom, Core2 Duo, and +AMD Bulldozer), there is an AES implementation which is included that is both +faster than the table lookup code, and immune to cache-based side channel +attacks as it does not perform any memory lookups using secret data. + +A version of AES which is side channel silent on x86 processors with SSSE3 +extension is included. This instruction set is available on many older or low +end x86 processors that do not have AES-NI (including old Atom, Core2 Duo, and +AMD Bobcat). On most processors it is significantly faster than the table lookup +version. It is based on a design by Mike Hamburg [VectorAes]. See aes_ssse3.cpp +for the code. This same technique could be applied with NEON or AltiVec, and the +paper has some optimizations for the AltiVec shuffle. + +On x86 processors which support it, AES-NI instruction set is used, as it is +fast and (presumed) side channel silent. There is no support at the moment for +the similar ARMv8 or POWER AES instructions; patches would be welcome. + +GCM +--------------------- + +On x86 platforms which support the clmul instruction, GCM support is fast and +constant time. + +On all other platforms, GCM is slow and constant time. It uses a simple bit at +at time loop. It would be much faster using a table lookup, + +OCB +----------------------- + +It is straightforward to implement OCB mode in a efficient way that does not +depend on any secret branches or lookups. See ocb.cpp for the implementation. + +Poly1305 +---------------------- + +The Poly1305 implementation does not have any secret lookups or conditionals. +The code is based on the public domain version by Andrew Moon. + +DES/3DES +---------------------- + +The DES implementation uses table lookups, and is likely vulnerable to side +channel attacks. DES or 3DES should be avoided in new systems. The proper fix +would be a scalar bitsliced implementation, this is not seen as worth the +engineering investment given these algorithms end of life status. + +Twofish +------------------------ + +This algorithm uses table lookups with secret sboxes. No cache-based side +channel attack on Twofish has ever been published, but it is possible nobody +sufficiently skilled has ever tried. + +ChaCha20, Serpent, Threefish, ... +----------------------------------- + +Some algorithms including ChaCha, Salsa, Serpent and Threefish are 'naturally' +silent to cache and timing side channels on all recent processors. + +IDEA +--------------- + +IDEA encryption, decryption, and key schedule are implemented to take constant +time regardless of their inputs. + +Hash Functions +------------------------- + +Most hash functions included in Botan such as MD5, SHA-1, SHA-2, SHA-3, Skein, +and BLAKE2 do not require any input-dependent memory lookups, and so seem to not be +affected by common CPU side channels. + +Memory comparisons +---------------------- + +The function same_mem in header mem_ops.h provides a constant-time comparison +function. It is used when comparing MACs or other secret values. It is also +exposed for application use. + +Memory zeroizing +---------------------- + +There is no way in portable C/C++ to zero out an array before freeing it, in +such a way that it is guaranteed that the compiler will not elide the +'additional' (seemingly unnecessary) writes to zero out the memory. + +The function secure_scrub_memory (in mem_ops.cpp) uses some system specific +trick to zero out an array. On Windows it uses the directly supported API +function RtlSecureZeroMemory. + +On other platforms, by default the trick of referencing memset through a +volatile function pointer is used. This approach is not guaranteed to work on +all platforms, and currently there is no systematic check of the resulting +binary function that it is compiled as expected. But, it is the best approach +currently known and has been verified to work as expected on common platforms. + +If BOTAN_USE_VOLATILE_MEMSET_FOR_ZERO is set to 0 in build.h (not the default) a +byte at a time loop through a volatile pointer is used to overwrite the array. + +Botan's secure_vector type is a std::vector with a custom allocator. The +allocator simply calls secure_scrub_memory before freeing memory. + +Memory allocation +---------------------- + +Some systems including Linux and FreeBSD support applications pinning a small +amount (typically 64 KB, sometimes more) of virtual memory pages in RAM, such +that they will not be paged to disk. If available, Botan uses such a region for +storing key material. It is created in anonymous mapped memory (not disk +backed), locked in memory, and zeroized before freeing. + +This memory pool is used by secure_vector when available. It can be disabled by +setting the environment variable BOTAN_MLOCK_POOL_SIZE to 0. + +Automated Analysis +--------------------- + +Currently the main tool used by the Botan developers for testing for side +channels at runtime is valgrind; valgrind's runtime API is used to taint memory +values, and any jumps or indexes using data derived from these values will cause +a valgrind warning. This technique was first used by Adam Langley in ctgrind. +See the header ct_utils.h for details. + +To check, install valgrind, configure the build with --with-valgrind, and run +the tests. + +References +--------------- + +[CoronDpa] Coron, +"Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems" +(http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.5695) + +[Lucky13] AlFardan, Paterson "Lucky Thirteen: Breaking the TLS and DTLS Record Protocols" +(http://www.isg.rhul.ac.uk/tls/TLStiming.pdf) + +[MillionMsg] Bleichenbacher "Chosen Ciphertext Attacks Against Protocols Based +on the RSA Encryption Standard PKCS1" +(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8543) + +[OaepTiming] Manger, "A Chosen Ciphertext Attack on RSA Optimal Asymmetric +Encryption Padding (OAEP) as Standardized in PKCS #1 v2.0" +(http://archiv.infsec.ethz.ch/education/fs08/secsem/Manger01.pdf) + +[RsaFault] Boneh, Demillo, Lipton +"On the importance of checking cryptographic protocols for faults" +(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.9764) + +[RandomMonty] Le, Tan, Tunstall "Randomizing the Montgomery Powering Ladder" +(https://eprint.iacr.org/2015/657) + +[VectorAes] Hamburg, "Accelerating AES with Vector Permute Instructions" +https://shiftleft.org/papers/vector_aes/vector_aes.pdf + +[VersionOracle] Klíma, Pokorný, Rosa "Attacking RSA-based Sessions in SSL/TLS" +(https://eprint.iacr.org/2003/052) |