6 files changed, 422 insertions, 20 deletions
diff --git a/doc/abi.rst b/doc/abi.rst
index 319c1fc81..0fa26d21b 100644
--- a/doc/abi.rst
+++ b/doc/abi.rst
@@ -2,18 +2,15 @@
 API/ABI Stability
 ====================
 
-The API and ABI in development branches (those with an odd number, such as 1.11
-or 2.1) is subject to change without notice. We don't go out of our way to break
-client code, but if a possibility for serious improvement is seen it is taken.
+API of release branches is stable; that is to say code that compiles and works
+against 2.0.0 should also compile with all later 2.x.x versions. The API on
+master is completely fair game and may change at any time.
 
-For stable branches (1.10, 2.0) API stability is considered important.
-Coorespondingly they see mostly security and bug fixes, or improvements
-backported from the devel tree when doing so would not cause problems for
-existing code. Code written and working for .0 of a stable release should
-continue to work with later releases from the same branch.
+Maintaining a consistent ABI while evolving a complex C++ API is exceedingly
+expensive in development time. It is likely ABI breakage will occur at least
+occasionally even in release branches. In these cases, the soname revision will
+increase to prevent applications from linking against a potentially incompatible
+version at runtime.
 
-However maintaining a consistent ABI while changing a complex C++ API is
-exceedingly expensive in development time, and even in the stable branches ABI
-breakage may be necessary to fix a security issue or other major bug. If ABI
-breakage knowingly occurs in the stable tree it will be documented accordingly
-in the release notes and the soname revision will be ticked up.
+You can review a report on ABI changes to Botan at
+https://abi-laboratory.pro/tracker/timeline/botan/
diff --git a/doc/manual/compression.rst b/doc/manual/compression.rst
index 808b7f0dc..2d1f4c142 100644
--- a/doc/manual/compression.rst
+++ b/doc/manual/compression.rst
@@ -3,7 +3,9 @@ Lossless Data Compression
 
 Some lossless data compression algorithms are available in botan, currently all
 via third party libraries - these include zlib (including deflate and gzip
-formats), bzip2, and lzma.
+formats), bzip2, and lzma. Support for these must be enabled at build time;
+you can check for them using the macros ``BOTAN_HAS_ZLIB``, ``BOTAN_HAS_BZIP2``,
+and ``BOTAN_HAS_LZMA``.
 
 .. note::
    You should always compress *before* you encrypt, because encryption seeks to
diff --git a/doc/manual/contents.rst b/doc/manual/contents.rst
index 79e410051..67665667b 100644
--- a/doc/manual/contents.rst
+++ b/doc/manual/contents.rst
@@ -8,10 +8,8 @@ Contents
    goals
    building
    platforms
-   python
-   cli
-   versions
    firststep
+   versions
    secmem
    rng
    filters
@@ -31,6 +29,10 @@ Contents
    srp
    fpe
    compression
-   ffi
    pkcs11
    tpm
+   ffi
+   python
+   cli
+   side_channels
+
diff --git a/doc/manual/passhash.rst b/doc/manual/passhash.rst
index 417e58265..b3db1f3e7 100644
--- a/doc/manual/passhash.rst
+++ b/doc/manual/passhash.rst
@@ -75,8 +75,13 @@ Bcrypt Password Hashing
 
 :wikipedia:`Bcrypt` is a password hashing scheme originally designed
 for use in OpenBSD, but numerous other implementations exist.
-It is made available by including ``bcrypt.h``. Bcrypt provides
-outputs that look like this::
+It is made available by including ``bcrypt.h``.
+
+It has the advantage that it requires a small amount (4K) of fast RAM
+to compute, which can make hardware password cracking somewhat more
+expensive.
+
+Bcrypt provides outputs that look like this::
 
   "$2a$12$7KIYdyv8Bp32WAvc.7YvI.wvRlyVn0HP/EhPmmOyMQA4YKxINO0p2"
 
@@ -104,6 +109,10 @@ Botan also provides a password hashing technique called passhash9, in
 
   "$9$AAAKxwMGNPSdPkOKJS07Xutm3+1Cr3ytmbnkjO6LjHzCMcMQXvcT"
 
+This function should be secure with the proper parameters, and will remain in
+the library for the forseeable future, but it is specific to Botan rather than
+being a widely used password hash. Prefer bcrypt.
+
 .. cpp:function:: std::string generate_passhash9(const std::string& password, \
    RandomNumberGenerator& rng, u16bit work_factor = 10, byte alg_id = 1)
 
diff --git a/doc/manual/side_channels.rst b/doc/manual/side_channels.rst
new file mode 100644
index 000000000..eeabeae7e
--- /dev/null
+++ b/doc/manual/side_channels.rst
@@ -0,0 +1,390 @@
+Side Channels
+=========================
+
+Many cryptographic systems can be broken by side channels. This document notes
+side channel protections which are currently implemented, as well as areas of
+the code which are known to be vulnerable to side channels. The latter are
+obviously all open for future improvement.
+
+The following text assumes the reader is already familiar with cryptographic
+implementations, side channel attacks, and common countermeasures.
+
+RSA
+----------------------
+
+Blinding is always used to protect private key operations (there is no way to
+turn it off). As an optimization, instead of choosing a new random mask and
+inverse with each decryption, both the mask and its inverse are simply squared
+to choose the next blinding factor. This is much faster than computing a fresh
+value each time, and the additional relation is thought to provide only minimal
+useful information for an attacker. Every BOTAN_BLINDING_REINIT_INTERVAL
+(default 32) operations, a new starting point is chosen.
+
+RSA signing uses the CRT optimization, which is much faster but vulnerable to
+trivial fault attacks [RsaFault] which can result in the key being entirely
+compromised. To protect against this (or any other computational error which
+would have the same effect as a fault attack in this case), after every private
+key operation the result is checked for consistency with the public key. This
+introduces only slight additional overhead and blocks most fault attacks; it is
+possible to use a second fault attack to bypass this verification, but such a
+double fault attack requires significantly more control on the part of an
+attacker than a BellCore style attack, which is possible if any error at all
+occurs during either modular exponentiation involved in the RSA signature
+operation.
+
+See blinding.cpp and rsa.cpp.
+
+If the OpenSSL provider is enabled, then no explicit blinding is done; we assume
+OpenSSL handles this. See openssl_rsa.cpp.
+
+Decryption of PKCS #1 v1.5 Ciphertexts
+----------------------------------------
+
+This padding scheme is used with RSA, and is very vulnerable to errors. In a
+scenario where an attacker can repeatedly present RSA ciphertexts, and a
+legitimate key holder will attempt to decrypt each ciphertext and simply
+indicates to the attacker if the PKCS padding was valid or not (without
+revealing any additional information), the attacker can use this behavior as an
+oracle to perform iterative decryption of arbitrary RSA ciphertexts encrypted
+under that key. This is the famous million message attack [MillionMsg]
+[MillionMsgTiming]. For example, simply a difference in timing between a valid
+and invalid RSA ciphertext due to an early error exit may be enough to mount an
+attack.
+
+Preventing this issue in full requires some application level changes. In
+protocols which know the expected length of the encrypted key, PK_Decryptor
+provides the function `decrypt_or_random` which first generates a random fake
+key, then decrypts the presented ciphertext, then in constant time either copies
+out the random key or the decrypted plaintext depending on if the ciphertext was
+valid or not. Then, the protocol can carry on with a randomly chosen key, which
+will presumably cause total failure in a way that does not allow an attacker to
+distinguish (via any timing or other side channel, nor any error messages
+specific to the one situation vs the other) if it was the RSA decryption
+
+One very important user of PKCS #1 v1.5 encryption is the TLS protocol. In TLS,
+some extra versioning information is embedded in the plaintext message, along
+with the key. It turns out that this version information must be treated in an
+identical (constant-time) way with the PKCS padding, or again the system is
+broken. [VersionOracle]. This is supported by a special version of
+PK_Decryptor::decrypt_or_random that additionally allows verifying one or more
+content bytes, in addition to the PKCS padding.
+
+See eme_pkcs.cpp and pubkey.cpp.
+
+Verification of PKCS #1 v1.5 Signatures
+----------------------------------------
+
+One way of verifying PKCS #1 v1.5 padding is to decode it with an ASN.1 BER
+parser.  However such a design commonly leads to accepting, and needlessly
+exposes the BER parser to untrusted inputs. It is possible (and simpler) to
+instead re-encode the hash value we are expecting, and const time compare it
+with the output of the RSA operation.
+
+See emsa_pkcs.cpp.
+
+OAEP
+----------------------
+
+RSA OAEP is (PKCS#1 v2) is the recommended version of RSA encoding standard,
+because it is not directly vulnerable to Bleichenbacher attack. However, if
+implemented incorrectly, a side channel can be presented to an attacker and
+create an oracle for decrypting RSA ciphertexts [OaepTiming].
+
+This attack is avoided in Botan by making the OAEP decoding operation run
+without any conditional jumps or indexes, with the only variance in runtime
+coming from the length of the RSA key (which is public information).
+
+See eme_oaep.cpp.
+
+Modular Exponentiation
+------------------------
+
+Modular exponentiation uses a fixed window algorithm with Montgomery representation.
+In the current code, information about the exponent is leaked through the
+sequence of memory indexes; we currently rely on randomized blinding at higher
+levels of the cryptographic stack to hide this. A future project would be to
+change this to use either Montgomery ladder or use a side channel silent table
+lookup. See powm_mnt.cpp.
+
+The Karatsuba multiplication algorithm has some conditional branches that
+probably expose information through the branch predictor, but probably? does not
+expose a timing channel since the same amount of work is done on both sides of
+the conditional. There is certainly room for improvement here. See mp_karat.cpp
+for details.
+
+The Montgomery reduction is written (and tested) to run in constant time. See
+mp_monty.cpp.
+
+ECC point decoding
+----------------------
+
+The API function OS2ECP, which is used to convert byte strings to ECC points,
+verifies that all points satisfy the ECC curve equation. Points that do not
+satisfy the equation are invalid, and can sometimes be used to break
+protocols ([InvalidCurve] [InvalidCurveTLS]). See point_gfp.cpp.
+
+ECC scalar multiply
+----------------------
+
+There are two implementations of scalar multiply, PointGFp::operator* and the
+class Blinded_Point_Multiply. The default scalar multiply uses the Montgomery
+ladder. However it currently leaks the size of the scalar, because the loop
+iterations are bounded by the scalar size.
+
+Blinded_Point_Multiply (used by ECDH, ECDSA, etc) applies several additional
+side channel countermeasures. The scalar is masked by a small multiple of the
+group order (this is commonly called Coron's first countermeasure [CoronDpa]),
+the size of the scalar mask is currently controlled by build.h value
+BOTAN_POINTGFP_SCALAR_BLINDING_BITS which defaults to 20 bits.
+
+Botan stores all ECC points in Jacobian representation. This form allows faster
+computation by representing points (x,y) as (X,Y,Z) where x=X/Z^2 and
+y=Y/Z^3. As the representation is redundant, for any randomly chosen r,
+(X*r^2,Y*r^3,Z*r) is an equivalent point. Changing the point values prevents an
+attacker from mounting attacks based on the input point remaining unchanged over
+multiple executions. This is commonly called Coron's third countermeasure, see
+again [CoronDpa].
+
+Currently Blinded_Point_Multiply uses one of two different algorithms, depending
+on a build-time flag. If BOTAN_POINTGFP_BLINDED_MULTIPLY_USE_MONTGOMERY_LADDER
+is set in build.h (default is for it *not* to be set), then a randomized
+Montgomery ladder algorithm from [RandomMonty] is used. Otherwise, a simple
+fixed window exponentiation is used; the current version leaks exponent bits
+through memory index values. We rely on scalar blinding to reduce this
+leakage. It would obviously be better for Blinded_Point_Multiply to converge on
+a single side channel silent algorithm.
+
+See point_gfp.cpp.
+
+ECDH
+----------------------
+
+ECDH verifies (through its use of OS2ECP) that all input points
+received from the other party satisfy the curve equation. This
+prevents twist attacks. The same check is performed on the output
+point, which helps prevent fault attacks.
+
+ECDSA
+----------------------
+
+Inversion of the ECDSA nonce k must be done in constant time, as any
+leak of even a single bit of the nonce can be sufficient to allow
+recovering the private key. In Botan all inverses modulo an odd number
+are performed using a constant time algorithm due to Niels Möller.
+
+x25519
+----------------------
+
+The x25519 code is independent of the main Weiserstrass form ECC code, instead
+based on curve25519-donna-c64.c by Adam Langley. The code seems immune to cache
+based side channels. It does make use of integer multiplications; on some old
+CPUs these multiplications take variable time and might allow a side channel
+attack. This is not considered a problem on modern processors.
+
+TLS CBC ciphersuites
+----------------------
+
+The original TLS v1.0 CBC Mac-then-Encrypt mode is vulnerable to an oracle
+attack.  If an attacker can distinguish padding errors through different error
+messages [TlsCbcOracle] or via a side channel attack like [Lucky13], they can
+abuse the server as a decryption oracle.
+
+The side channel protection for Lucky13 follows the approach proposed in the
+Lucky13 paper. It is not perfectly constant time, but does hide the padding
+oracle in practice. Tools to test TLS CBC decoding are included in the timing
+tests. See https://github.com/randombit/botan/pull/675 for more information.
+
+The Encrypt-then-MAC extension, which completely avoids the side channel, is
+implemented and used by default for CBC ciphersuites.
+
+CBC mode padding
+----------------------
+
+In theory, any good protocol protects CBC ciphertexts with a MAC. But in
+practice, some protocols are not good and cannot be fixed immediately. To avoid
+making a bad problem worse, the code to handle decoding CBC ciphertext padding
+bytes runs in constant time, depending only on the block size of the cipher.
+
+AES
+----------------------
+
+The basic implementation uses table lookup, an approach known to be very
+vulnerable to side channels. Modifications to the first and last round increase
+the cost of some known attacks (by reducing the number of cache lines accessed),
+but these truly can only slightly increase the samples required to recover the
+AES key.
+
+For x86 processors with SSSE3 extension (most CPUs since Atom, Core2 Duo, and
+AMD Bulldozer), there is an AES implementation which is included that is both
+faster than the table lookup code, and immune to cache-based side channel
+attacks as it does not perform any memory lookups using secret data.
+
+A version of AES which is side channel silent on x86 processors with SSSE3
+extension is included.  This instruction set is available on many older or low
+end x86 processors that do not have AES-NI (including old Atom, Core2 Duo, and
+AMD Bobcat). On most processors it is significantly faster than the table lookup
+version. It is based on a design by Mike Hamburg [VectorAes]. See aes_ssse3.cpp
+for the code. This same technique could be applied with NEON or AltiVec, and the
+paper has some optimizations for the AltiVec shuffle.
+
+On x86 processors which support it, AES-NI instruction set is used, as it is
+fast and (presumed) side channel silent. There is no support at the moment for
+the similar ARMv8 or POWER AES instructions; patches would be welcome.
+
+GCM
+---------------------
+
+On x86 platforms which support the clmul instruction, GCM support is fast and
+constant time.
+
+On all other platforms, GCM is slow and constant time. It uses a simple bit at
+at time loop. It would be much faster using a table lookup,
+
+OCB
+-----------------------
+
+It is straightforward to implement OCB mode in a efficient way that does not
+depend on any secret branches or lookups. See ocb.cpp for the implementation.
+
+Poly1305
+----------------------
+
+The Poly1305 implementation does not have any secret lookups or conditionals.
+The code is based on the public domain version by Andrew Moon.
+
+DES/3DES
+----------------------
+
+The DES implementation uses table lookups, and is likely vulnerable to side
+channel attacks. DES or 3DES should be avoided in new systems. The proper fix
+would be a scalar bitsliced implementation, this is not seen as worth the
+engineering investment given these algorithms end of life status.
+
+Twofish
+------------------------
+
+This algorithm uses table lookups with secret sboxes. No cache-based side
+channel attack on Twofish has ever been published, but it is possible nobody
+sufficiently skilled has ever tried.
+
+ChaCha20, Serpent, Threefish, ...
+-----------------------------------
+
+Some algorithms including ChaCha, Salsa, Serpent and Threefish are 'naturally'
+silent to cache and timing side channels on all recent processors.
+
+IDEA
+---------------
+
+IDEA encryption, decryption, and key schedule are implemented to take constant
+time regardless of their inputs.
+
+Hash Functions
+-------------------------
+
+Most hash functions included in Botan such as MD5, SHA-1, SHA-2, SHA-3, Skein,
+and BLAKE2 do not require any input-dependent memory lookups, and so seem to not be
+affected by common CPU side channels.
+
+Memory comparisons
+----------------------
+
+The function same_mem in header mem_ops.h provides a constant-time comparison
+function. It is used when comparing MACs or other secret values. It is also
+exposed for application use.
+
+Memory zeroizing
+----------------------
+
+There is no way in portable C/C++ to zero out an array before freeing it, in
+such a way that it is guaranteed that the compiler will not elide the
+'additional' (seemingly unnecessary) writes to zero out the memory.
+
+The function secure_scrub_memory (in mem_ops.cpp) uses some system specific
+trick to zero out an array. On Windows it uses the directly supported API
+function RtlSecureZeroMemory.
+
+On other platforms, by default the trick of referencing memset through a
+volatile function pointer is used. This approach is not guaranteed to work on
+all platforms, and currently there is no systematic check of the resulting
+binary function that it is compiled as expected. But, it is the best approach
+currently known and has been verified to work as expected on common platforms.
+
+If BOTAN_USE_VOLATILE_MEMSET_FOR_ZERO is set to 0 in build.h (not the default) a
+byte at a time loop through a volatile pointer is used to overwrite the array.
+
+Memory allocation
+----------------------
+
+Botan's secure_vector type is a std::vector with a custom allocator. The
+allocator calls secure_scrub_memory before freeing memory.
+
+Some operating systems support an API call to lock a range of pages
+into memory, such that they will never be swapped out (mlock on POSIX,
+VirtualLock on Windows). On many POSIX systems mlock is only usable by
+root, but on Linux, FreeBSD and possibly other systems a small amount
+of memory can be mlock'ed by processes without extra credentials.
+
+If available, Botan uses such a region for storing key material. It is
+created in anonymous mapped memory (not disk backed), locked in
+memory, and scrubbed on free. This memory pool is used by
+secure_vector when available. It can be disabled at runtime setting
+the environment variable BOTAN_MLOCK_POOL_SIZE to 0.
+
+Automated Analysis
+---------------------
+
+Currently the main tool used by the Botan developers for testing for side
+channels at runtime is valgrind; valgrind's runtime API is used to taint memory
+values, and any jumps or indexes using data derived from these values will cause
+a valgrind warning. This technique was first used by Adam Langley in ctgrind.
+See header ct_utils.h.
+
+To check, install valgrind, configure the build with --with-valgrind, and run
+the tests.
+
+References
+---------------
+
+[CoronDpa] Coron,
+"Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems"
+(http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.5695)
+
+[InvalidCurve] Biehl, Meyer, Müller: Differential fault attacks on
+elliptic curve cryptosystems
+(http://www.iacr.org/archive/crypto2000/18800131/18800131.pdf)
+
+[InvalidCurveTLS] Jager, Schwenk, Somorovsky: Practical Invalid Curve
+Attacks on TLS-ECDH
+(https://www.nds.rub.de/research/publications/ESORICS15/)
+
+[SafeCurves] Bernstein, Lange: SafeCurves: choosing safe curves for
+elliptic-curve cryptography. (http://safecurves.cr.yp.to)
+
+[Lucky13] AlFardan, Paterson "Lucky Thirteen: Breaking the TLS and DTLS Record Protocols"
+(http://www.isg.rhul.ac.uk/tls/TLStiming.pdf)
+
+[MillionMsg] Bleichenbacher "Chosen Ciphertext Attacks Against Protocols Based
+on the RSA Encryption Standard PKCS1"
+(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8543)
+
+[MillionMsgTiming] Meyer, Somorovsky, Weiss, Schwenk, Schinzel, Tews: Revisiting
+SSL/TLS Implementations: New Bleichenbacher Side Channels and Attacks
+(https://www.nds.rub.de/research/publications/mswsst2014-bleichenbacher-usenix14/)
+
+[OaepTiming] Manger, "A Chosen Ciphertext Attack on RSA Optimal Asymmetric
+Encryption Padding (OAEP) as Standardized in PKCS #1 v2.0"
+(http://archiv.infsec.ethz.ch/education/fs08/secsem/Manger01.pdf)
+
+[RsaFault] Boneh, Demillo, Lipton
+"On the importance of checking cryptographic protocols for faults"
+(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.9764)
+
+[RandomMonty] Le, Tan, Tunstall "Randomizing the Montgomery Powering Ladder"
+(https://eprint.iacr.org/2015/657)
+
+[VectorAes] Hamburg, "Accelerating AES with Vector Permute Instructions"
+https://shiftleft.org/papers/vector_aes/vector_aes.pdf
+
+[VersionOracle] Klíma, Pokorný, Rosa "Attacking RSA-based Sessions in SSL/TLS"
+(https://eprint.iacr.org/2003/052)
diff --git a/doc/todo.rst b/doc/todo.rst
index d85823e63..13eeb3ddd 100644
--- a/doc/todo.rst
+++ b/doc/todo.rst
@@ -54,6 +54,7 @@ External Providers, Hardware Support
 * ARMv8-A crypto extensions (AES, SHA-2)
 * POWER8 crypto extensions (AES, SHA-2)
 * Better TPM support: NVRAM, PCR measurements, sealing
+* Intel SGX support
 
 TLS
 ----------------------------------------
@@ -74,6 +75,7 @@ TLS
 PKIX
 ----------------------------------------
 
+* Further tests of validation API (see GH #785)
 * Test suite for validation of 'real world' cert chains (GH #611)
 * Improve output of X509_Certificate::to_string
   This is a free-form string for human consumption so the only constraints