diff options
author | Jack Lloyd <[email protected]> | 2018-03-20 18:24:47 -0400 |
---|---|---|
committer | Jack Lloyd <[email protected]> | 2018-03-20 18:24:47 -0400 |
commit | eb5bc1b1ea573c44efe58b18ed07a82db88029fb (patch) | |
tree | b640aa4c2859b35e6fdf3f4bb2ae5d13691c99fb /doc/manual/side_channels.rst | |
parent | c53f4fa8686a59f2360e4609f4fc9cdc66ce00d1 (diff) |
Update side channel doc
Diffstat (limited to 'doc/manual/side_channels.rst')
-rw-r--r-- | doc/manual/side_channels.rst | 63 |
1 files changed, 34 insertions, 29 deletions
diff --git a/doc/manual/side_channels.rst b/doc/manual/side_channels.rst index 01d20a66d..6d6bd74bb 100644 --- a/doc/manual/side_channels.rst +++ b/doc/manual/side_channels.rst @@ -128,16 +128,20 @@ protocols ([InvalidCurve] [InvalidCurveTLS]). See point_gfp.cpp. ECC scalar multiply ---------------------- -There are two implementations of scalar multiply, PointGFp::operator* and the -class Blinded_Point_Multiply. The default scalar multiply uses the Montgomery -ladder. However it currently leaks the size of the scalar, because the loop -iterations are bounded by the scalar size. +There are several different implementations of ECC scalar multiplications which +depend on the API invoked. This include ``PointGFp::operator*``, +``EC_Group::blinded_base_point_multiply`` and +``EC_Group::blinded_var_point_multiply``. -Blinded_Point_Multiply (used by ECDH, ECDSA, etc) applies several additional -side channel countermeasures. The scalar is masked by a small multiple of the -group order (this is commonly called Coron's first countermeasure [CoronDpa]), -the size of the scalar mask is currently controlled by build.h value -BOTAN_POINTGFP_SCALAR_BLINDING_BITS which defaults to 20 bits. +The ``PointGFp::operator*`` implementation uses the Montgomery ladder, which is +fairly resistant to side channels. However it leaks the size of the scalar, +because the loop iterations are bounded by the scalar size. It should not be +used in cases when the scalar is a secret. + +Both ``blinded_base_point_multiply`` and ``blinded_var_point_multiply`` apply +side channel countermeasures. The scalar is masked by a multiple of the group +order (this is commonly called Coron's first countermeasure [CoronDpa]), +currently the multiplier is an 80 bit integer. Botan stores all ECC points in Jacobian representation. This form allows faster computation by representing points (x,y) as (X,Y,Z) where x=X/Z^2 and @@ -147,16 +151,17 @@ attacker from mounting attacks based on the input point remaining unchanged over multiple executions. This is commonly called Coron's third countermeasure, see again [CoronDpa]. -Currently Blinded_Point_Multiply uses one of two different algorithms, depending -on a build-time flag. If BOTAN_POINTGFP_BLINDED_MULTIPLY_USE_MONTGOMERY_LADDER -is set in build.h (default is for it *not* to be set), then a randomized -Montgomery ladder algorithm from [RandomMonty] is used. Otherwise, a simple -fixed window exponentiation is used; the current version leaks exponent bits -through memory index values. We rely on scalar blinding to reduce this -leakage. It would obviously be better for Blinded_Point_Multiply to converge on -a single side channel silent algorithm. +The base point multiplication algorithm is a comb-like technique which +precomputes ``P^i,(2*P)^i,(3*P)^i`` for all ``i`` in the range of valid scalars. +This means the scalar multiplication involves only point additions and no +doublings, which may help against attacks which rely on distinguishing between +point doublings and point additions. + +The variable point multiplication algorithm uses a simple fixed-window +exponentiation algorithm. Since this is normally invoked using untrusted points +(eg in ECDH key exchange) it randomizes all inputs. -See point_gfp.cpp. +See point_gfp.cpp and point_mul.cpp ECDH ---------------------- @@ -177,7 +182,7 @@ are performed using a constant time algorithm due to Niels Möller. x25519 ---------------------- -The x25519 code is independent of the main Weiserstrass form ECC code, instead +The x25519 code is independent of the main Weierstrass form ECC code, instead based on curve25519-donna-c64.c by Adam Langley. The code seems immune to cache based side channels. It does make use of integer multiplications; on some old CPUs these multiplications take variable time and might allow a side channel @@ -221,14 +226,14 @@ It is based on code by Mike Hamburg [VectorAes], see aes_ssse3.cpp. This same technique could be applied with NEON or AltiVec, and the paper suggests some optimizations for the AltiVec shuffle. -On all other processors, a class 4K table lookup version based on the original -Rijndael code is used. This approach relatively fast, but now known to be very -vulnerable to side channels. The implementation does make modifications in the -first and last rounds to reduce the cache signature, but these merely increase -the number of observations required. See [AesCacheColl] for one paper which -analyzes a number of implementations including Botan. Botan already follows both -of their suggested countermeasures, which increased the number of samples -required from 2**13 to the only slightly less pitiful 2**19 samples. +On all other processors, a table lookup version derived from the original +Rijndael code is used. This approach is relatively fast, but now known to be +very vulnerable to side channels. To reduce the side channel signature, it uses +only a 1K table (instead of 4 1K tables which is typical) and uses small tables +in the first and last rounds. See [AesCacheColl] for one paper which analyzes a +number of implementations including (an older version of) Botan. Botan already +follows both of their suggested countermeasures, which increased the number of +samples required from 2**13 to the only slightly less pitiful 2**19 samples. The Botan block cipher API already supports bitslicing implementations, so a const time 8x bitsliced AES could be integrated fairly easily. @@ -236,11 +241,11 @@ const time 8x bitsliced AES could be integrated fairly easily. GCM --------------------- -On platforms that support a carryless multiply instruction (recent x86 and ARM), +On platforms that support a carryless multiply instruction (ARMv8 and recent x86), GCM is fast and constant time. On all other platforms, GCM uses a slow but constant time algorithm. There is -also an SSSE3 variant of the same (still slow) algorithm. +also an SSSE3 variant of the same (still relatively slow) algorithm. OCB ----------------------- |