\documentclass{article} \setlength{\textwidth}{6.5in} \setlength{\textheight}{9in} \setlength{\headheight}{0in} \setlength{\topmargin}{0in} \setlength{\headsep}{0in} \setlength{\oddsidemargin}{0in} \setlength{\evensidemargin}{0in} \title{\textbf{Botan Reference Manual}} \author{} \date{2010/06/14} \newcommand{\filename}[1]{\texttt{#1}} \newcommand{\manpage}[2]{\texttt{#1}(#2)} \newcommand{\macro}[1]{\texttt{#1}} \newcommand{\function}[1]{\textbf{#1}} \newcommand{\keyword}[1]{\texttt{#1}} \newcommand{\type}[1]{\texttt{#1}} \renewcommand{\arg}[1]{\textsl{#1}} \newcommand{\namespace}[1]{\texttt{#1}} \newcommand{\url}[1]{\texttt{#1}} \newcommand{\ie}[0]{\emph{i.e.}} \newcommand{\eg}[0]{\emph{e.g.}} \begin{document} \maketitle \tableofcontents \parskip=5pt \pagebreak \section{Introduction} Botan is a C++ library that attempts to provide the most common cryptographic algorithms and operations in an easy to use, efficient, and portable way. It runs on a wide variety of systems, and can be used with a number of different compilers. The base library is written in ISO C++, so it can be ported with minimal fuss, but Botan also supports a modules system. This system exposes system dependent code to the library through portable interfaces, extending the set of services available to users. \subsection{Recommended Reading} It's a very good idea if you have some knowledge of cryptography prior to trying to use this stuff. You really should read at least one and ideally all of these books before seriously using the library. \setlength{\parskip}{5pt} \noindent \textit{Cryptography Engineering}, Niels Ferguson, Bruce Schneier, and Tadayoshi Kohno; Wiley \noindent \textit{Security Engineering -- A Guide to Building Dependable Distributed Systems}, Ross Anderson; Wiley \noindent \textit{Handbook of Applied Cryptography}, Alfred J. Menezes, Paul C. Van Oorschot, and Scott A. 
Vanstone; CRC Press (available online at \url{http://www.cacr.math.uwaterloo.ca/hac/})

\subsection{Targets}

Botan's primary targets (system-wise) are 32-bit and 64-bit CPUs, with a flat memory address space of at least 32 bits. Given the choice between optimizing for 32-bit systems and 64-bit systems, Botan is written to prefer 64-bit, on the theory that where performance is a real concern, modern 64-bit processors are the obvious choice.

Smaller handhelds, set-top boxes, and the bigger smart phones and smart cards, are also capable of using Botan. However, Botan uses a large amount of code space (up to several megabytes, depending upon the compiler and options used), which could be prohibitive in some systems. Usage of RAM is modest, usually under 64K.

Botan's design makes it quite easy to remove unused algorithms in such a way that applications do not need to be recompiled to work, even applications that use the algorithms in question. They can ask Botan if the algorithm exists, and if Botan says yes, ask the library to give them such an object for that algorithm.

\section{Getting Started}

\subsection{Basic Conventions}

With a very small number of exceptions, declarations in the library are contained within the namespace \namespace{Botan}. Botan declares several \keyword{typedef}'ed types to help buffer it against changes in machine architecture. These types are used extensively in the interface, so it is often convenient to use them without the \namespace{Botan} prefix. You can do so by \keyword{using} the namespace \namespace{Botan\_types} (this way you can use the type names without the namespace prefix, but the remainder of the library stays out of the global namespace). The included types are \type{byte} and \type{u32bit}, which are unsigned integer types.

The headers for Botan are usually available in the form \filename{botan/headername.h}.
For brevity in this documentation, headers are always just called \filename{headername.h}, but they should be used with the \filename{botan/} prefix in your actual code.

\subsection{Initializing the Library}

There is a set of core services that the library needs access to while it is performing requests. To ensure these are set up, you must create a \type{LibraryInitializer} object (usually called 'init' in Botan example code; 'botan\_library' or 'botan\_init' may make more sense in real applications) prior to making any calls to Botan. This object's lifetime must exceed that of all other Botan objects your application creates; for this reason the best place to create the \type{LibraryInitializer} is at the start of your \function{main} function, since this guarantees that it will be created first and destroyed last (via standard C++ RAII rules). The initializer does things like setting up the memory allocation system and algorithm lookup tables, finding out if there is a high resolution timer available to use, and similar such matters. With no arguments, the library is initialized with various default settings. So (unless you are writing threaded code; see below), all you need is:

\texttt{Botan::LibraryInitializer init;}

at the start of your \texttt{main}.

The constructor takes an optional string that specifies arguments. Currently the only possible argument is ``thread\_safe'', which must have a boolean argument (for instance ``thread\_safe=false'' or ``thread\_safe=true''). If ``thread\_safe'' is specified as true, the library will attempt to register a mutex type to properly guard access to shared resources. However, these locks do not protect individual Botan objects: explicit locking must be used if you wish to share a single object between threads.

If you do not create a \type{LibraryInitializer} object, all library operations will fail, because it will be unable to do basic things like allocate memory or get random bits.
You should never create more than one \type{LibraryInitializer}.

It is not strictly necessary to create a \type{LibraryInitializer}; the code that actually performs the initialization and shutdown is in static member functions of \type{LibraryInitializer}, called \function{initialize} and \function{deinitialize}. A \type{LibraryInitializer} merely provides a convenient RAII wrapper for the operations (thus for the internal library state as well).

\subsection{Pitfalls}

There are a few things to watch out for to prevent problems when using Botan.

Never allocate any kind of Botan object globally. The problem with doing this is that the constructor for such an object will be called before the library is initialized. Many Botan objects will, in their constructor, make one or more calls into the library global state object. Access to this object is checked, so an exception should be thrown (rather than a memory access violation or undetected uninitialized object access). A rough equivalent that will work is to keep a global pointer to the object, initializing it after creating your \type{LibraryInitializer}. Merely making the \type{LibraryInitializer} also global will probably not help, because C++ does not make very strong guarantees about the order in which such objects will be created.

The same rule applies for making sure the destructors of all your Botan objects are called before the \type{LibraryInitializer} is destroyed. This implies you can't have static variables that are Botan objects inside functions or classes; in many C++ runtimes, these objects will be destroyed after \function{main} has returned.

Botan's memory object classes (\type{MemoryRegion}, \type{MemoryVector}, \type{SecureVector}) are extremely primitive, and meant only for secure storage of potentially sensitive data like keys. They do not meet the requirements for an STL container object and you should not try to use them with STL algorithms. For a general-purpose container, use \type{std::vector}.
Use a \function{try}/\function{catch} block inside your \function{main} function, and catch any \type{std::exception} that is thrown (remember to catch by reference, as \type{std::exception}'s \function{what} method is polymorphic). This is not strictly required, but if you don't, and Botan throws an exception, the runtime will call \function{std::terminate}, which usually calls \function{abort} or something like it, leaving you (or worse, a user of your application) wondering what went wrong.

\subsection{Information Flow: Pipes and Filters}

Many common uses of cryptography involve processing one or more streams of data. Botan provides services that make it easy to set up data flows through various operations, such as compression, encryption, and base64 encoding. Each of these operations is implemented in what are called \emph{filters} in Botan. A set of filters is created and placed into a \emph{pipe}, and information ``flows'' through the pipe until it reaches the end, where the output is collected for retrieval. If you're familiar with the Unix shell environment, this design will sound quite familiar.

Here is an example that uses a pipe to base64 encode some strings:

\begin{verbatim}
  Pipe pipe(new Base64_Encoder); // pipe owns the pointer
  pipe.start_msg();
  pipe.write("message 1");
  pipe.end_msg(); // flushes buffers, increments message number

  // process_msg(x) is start_msg() && write(x) && end_msg()
  pipe.process_msg("message 2");

  std::string m1 = pipe.read_all_as_string(0); // base64 of "message 1"
  std::string m2 = pipe.read_all_as_string(1); // base64 of "message 2"
\end{verbatim}

Bytestreams in the pipe are grouped into messages: blocks of data that are processed in an identical fashion (\ie, with the same sequence of \type{Filter}s). Messages are delimited by calls to \function{start\_msg} and \function{end\_msg}. Each message in a pipe has its own identifier, which currently is an integer that increments up from zero.
As you can see, the \type{Base64\_Encoder} was allocated using \keyword{new}; but where was it deallocated? When a filter object is passed to a \type{Pipe}, the pipe takes ownership of the object, and will deallocate it when it is no longer needed.

There are two different ways to make use of messages. One is to send several messages through a \type{Pipe} without changing the \type{Pipe}'s configuration, so you end up with a sequence of messages; one use of this would be to send a sequence of identically encrypted UDP packets, for example (note that the \emph{data} need not be identical; it is just that each is encrypted, encoded, signed, etc.\ in an identical fashion). Another is to change the filters that are used in the \type{Pipe} between each message, by adding or removing \type{Filter}s; functions that let you do this are documented in the Pipe API section.

Botan has about 40 filters that perform different operations on data. Here's code that uses one of them to encrypt a string with AES:

\begin{verbatim}
  AutoSeeded_RNG rng;
  SymmetricKey key(rng, 16); // a random 128-bit key
  InitializationVector iv(rng, 16); // a random 128-bit IV

  // The algorithm we want is specified by a string
  Pipe pipe(get_cipher("AES-128/CBC", key, iv, ENCRYPTION));

  pipe.process_msg("secrets");
  pipe.process_msg("more secrets");

  MemoryVector<byte> c1 = pipe.read_all(0);

  byte c2[4096] = { 0 };
  u32bit got_out = pipe.read(c2, sizeof(c2), 1);
  // the second ciphertext is in c2[0] ... c2[got_out-1]
\end{verbatim}

Note the use of \type{AutoSeeded\_RNG}, which is a random number generator. If you want to, you can explicitly set up the random number generators and entropy sources yourself; however, for 99\% of cases \type{AutoSeeded\_RNG} is preferable.

\type{Pipe} also has convenience methods for dealing with \type{std::iostream}s.
Here is an example of those, using the \type{Bzip\_Compression} filter (included as a module; if you have bzlib available, check \filename{building.pdf} for how to enable it) to compress a file:

\begin{verbatim}
  std::ifstream in("data.bin", std::ios::binary);
  std::ofstream out("data.bin.bz2", std::ios::binary);

  Pipe pipe(new Bzip_Compression);

  pipe.start_msg();
  in >> pipe;
  pipe.end_msg();
  out << pipe;
\end{verbatim}

However, there is a hitch to the code above: the complete contents of the compressed data will be held in memory until the entire message has been compressed, at which time the statement \verb|out << pipe| is executed, and the data is freed as it is read from the pipe and written to the file. But if the file is very large, we might not have enough physical memory (or even enough virtual memory!) for that to be practical. So instead of storing the compressed data in the pipe for reading it out later, we divert it directly to the file:

\begin{verbatim}
  std::ifstream in("data.bin", std::ios::binary);
  std::ofstream out("data.bin.bz2", std::ios::binary);

  Pipe pipe(new Bzip_Compression, new DataSink_Stream(out));

  pipe.start_msg();
  in >> pipe;
  pipe.end_msg();
\end{verbatim}

This is the first code we've seen so far that uses more than one filter in a pipe. The output of the compressor is sent to the \type{DataSink\_Stream}. Anything written to a \type{DataSink\_Stream} is written to a file; the filter produces no output. As soon as the compression algorithm finishes up a block of data, it will send it along, at which point it will immediately be written to disk; if you were to call \verb|pipe.read_all()| after \verb|pipe.end_msg()|, you'd get an empty vector out.
Here's an example using two computational filters:

\begin{verbatim}
  AutoSeeded_RNG rng;
  SymmetricKey key(rng, 32);
  InitializationVector iv(rng, 16);

  Pipe encryptor(get_cipher("AES/CBC/PKCS7", key, iv, ENCRYPTION),
                 new Base64_Encoder);

  encryptor.start_msg();
  file >> encryptor;
  encryptor.end_msg(); // flush buffers, complete computations
  std::cout << encryptor;
\end{verbatim}

\subsection{Fork}

It is common that you might receive some data and want to perform more than one operation on it (\eg, encrypt it with Serpent and calculate the SHA-256 hash of the plaintext at the same time). That's where \type{Fork} comes in. \type{Fork} is a filter that takes input and passes it on to \emph{one or more} \type{Filter}s that are attached to it. \type{Fork} changes the nature of the pipe system completely. Instead of being a linked list, it becomes a tree.

Each \type{Filter} in the fork is given its own output buffer, and thus its own message. For example, if you had previously written two messages into a \type{Pipe}, then you start a new one with a \type{Fork} that has three paths of \type{Filter}s inside it, you add three new messages to the \type{Pipe}. The data you put into the \type{Pipe} is duplicated and sent into each set of \type{Filter}s, and the eventual output is placed into a dedicated message slot in the \type{Pipe}.

Messages in the \type{Pipe} are allocated in a depth-first manner. This is only interesting if you are using more than one \type{Fork} in a single \type{Pipe}. As an example, consider the following:

\begin{verbatim}
  Pipe pipe(new Fork(
               new Fork(
                  new Base64_Encoder,
                  new Fork(
                     NULL,
                     new Base64_Encoder
                     )
                  ),
               new Hex_Encoder
               )
     );
\end{verbatim}

In this case, message 0 will be the output of the first \type{Base64\_Encoder}, message 1 will be a copy of the input (see below for how \type{Fork} interprets NULL pointers), message 2 will be the output of the second \type{Base64\_Encoder}, and message 3 will be the output of the \type{Hex\_Encoder}.
As you can see, this results in message numbers being allocated in a top to bottom fashion, when looked at on the screen. However, note that there could be potential for bugs if this is not anticipated. For example, if your code is passed a \type{Filter}, and you assume it is a ``normal'' one that only uses one message, your message offsets would be wrong, leading to some confusion during output.

If \type{Fork}'s first argument is a null pointer, but a later argument is not, then \type{Fork} will feed a copy of its input directly through. Here's a case where that is useful:

\begin{verbatim}
  // have std::string ciphertext, auth_code, key, iv, mac_key;

  Pipe pipe(new Base64_Decoder,
            get_cipher("AES-128/CBC", key, iv, DECRYPTION),
            new Fork(
               0, // a null first branch passes the plaintext through
               new MAC_Filter("HMAC(SHA-1)", mac_key)
               )
     );

  pipe.process_msg(ciphertext);
  std::string plaintext = pipe.read_all_as_string(0);
  SecureVector<byte> mac = pipe.read_all(1);

  if(mac != auth_code)
     error();
\end{verbatim}

Here we wanted to not only decrypt the message, but send the decrypted text through an additional computation, in order to compute the authentication code.

Any \type{Filter}s that are attached to the \type{Pipe} after the \type{Fork} are implicitly attached onto the first branch created by the fork. For example, let's say you created this \type{Pipe}:

\begin{verbatim}
  Pipe pipe(new Fork(new Hash_Filter("MD5"), new Hash_Filter("SHA-1")),
            new Hex_Encoder);
\end{verbatim}

Suppose you then called \function{start\_msg}, inserted some data, and called \function{end\_msg}. Then \arg{pipe} would contain two messages. The first one (message number 0) would contain the MD5 sum of the input in hex encoded form, and the other would contain the SHA-1 sum of the input in raw binary. However, it's much better to use a \type{Chain} instead.

\subsubsection{Chain}

A \type{Chain} filter creates a chain of \type{Filter}s and encapsulates them inside a single filter (itself).
This allows a sequence of filters to become a single filter, to be passed into or out of a function, or to a \type{Fork} constructor. You can call \type{Chain}'s constructor with up to 4 \type{Filter*}s (they will be added in order), or with an array of \type{Filter*}s and a \type{u32bit} that tells \type{Chain} how many \type{Filter*}s are in the array (again, they will be attached in order). Here's the example from the last section, using chain instead of relying on the obscure rule that version used. \begin{verbatim} Pipe pipe(new Fork( new Chain(new Hash_Filter("MD5"), new Hex_Encoder), new Hash_Filter("SHA-1") ) ); \end{verbatim} \subsection{The Pipe API} \subsubsection{Initializing Pipe} By default, \type{Pipe} will do nothing at all; any input placed into the \type{Pipe} will be read back unchanged. Obviously, this has limited utility, and presumably you want to use one or more \type{Filter}s to somehow process the data. First, you can choose a set of \type{Filter}s to initialize the \type{Pipe} via the constructor. You can pass it either a set of up to 4 \type{Filter*}s, or a pre-defined array and a length: \begin{verbatim} Pipe pipe1(new Filter1(/*args*/), new Filter2(/*args*/), new Filter3(/*args*/), new Filter4(/*args*/)); Pipe pipe2(new Filter1(/*args*/), new Filter2(/*args*/)); Filter* filters[5] = { new Filter1(/*args*/), new Filter2(/*args*/), new Filter3(/*args*/), new Filter4(/*args*/), new Filter5(/*args*/) /* more if desired... */ }; Pipe pipe3(filters, 5); \end{verbatim} This is by far the most common way to initialize a \type{Pipe}. However, occasionally a more flexible initialization strategy is necessary; this is supported by 4 member functions: \function{prepend}(\type{Filter*}), \function{append}(\type{Filter*}), \function{pop}(), and \function{reset}(). 
These functions may only be used while the \type{Pipe} in question is not in use; that is, either before calling \function{start\_msg}, or after \function{end\_msg} has been called (and no new calls to \function{start\_msg} have been made yet).

The function \function{reset}() removes all the \type{Filter}s that the \type{Pipe} is currently using~--~it is reset to an initial, ``empty'' state. Any data that is being retained by the \type{Pipe} is retained after a \function{reset}(), and \function{reset}() does not affect the message numbers (discussed later).

Calling \function{prepend} and \function{append} will either prepend or append the passed \type{Filter} object to the list of transformations. For example, if you \function{prepend} a \type{Filter} implementing encryption, and the \type{Pipe} already had a \type{Filter} that hex encoded the input, then the next set of input would be first encrypted, then hex encoded. Alternately, if you called \function{append}, then the input would first be hex encoded, and then encrypted (which is not terribly useful in this particular example).

Finally, calling \function{pop}() will remove the first transformation of the \type{Pipe}. Say we had called \function{prepend} to put an encryption \type{Filter} into a \type{Pipe}; calling \function{pop}() would remove this \type{Filter} and return the \type{Pipe} to its state before we called \function{prepend}.

\subsubsection{Giving Data to a Pipe}

Input to a \type{Pipe} is delimited into messages, which can be read from independently (\ie, you can read 5 bytes from one message, and then all of another message, without either read affecting any other messages). The messages are delimited by calls to \function{start\_msg} and \function{end\_msg}. In between these two calls, you can write data into a \type{Pipe}, and it will be processed by the \type{Filter}(s) that it contains. Writes at any other time are invalid, and will result in an exception.
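
The ordering rules for \function{prepend}, \function{append}, and \function{pop}, and the rule that writes are only valid between \function{start\_msg} and \function{end\_msg}, can be modelled with a small toy class. This is only a sketch: \texttt{ToyPipe}, \texttt{wrap}, and \texttt{toy\_pipe\_demo} are hypothetical names invented for illustration, the ``filters'' are plain string functions, and the toy buffers each message until \function{end\_msg} instead of streaming it. It does not use or reproduce Botan's \type{Pipe}.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model of the rules described above. NOT Botan's Pipe.
class ToyPipe {
public:
    typedef std::function<std::string(const std::string&)> Transform;

    // The filter list may only change while no message is open.
    void prepend(Transform t) { require_idle(); filters_.push_front(std::move(t)); }
    void append(Transform t)  { require_idle(); filters_.push_back(std::move(t)); }
    void pop() { require_idle(); if (!filters_.empty()) filters_.pop_front(); }

    void start_msg() { require_idle(); pending_.clear(); in_message_ = true; }

    void write(const std::string& data) {
        if (!in_message_) throw std::logic_error("write outside of a message");
        pending_ += data;
    }

    void end_msg() {
        if (!in_message_) throw std::logic_error("end_msg without start_msg");
        std::string out = pending_;
        // Apply the transformations front to back.
        for (std::size_t i = 0; i != filters_.size(); ++i) out = filters_[i](out);
        messages_.push_back(out);   // message numbers count up from zero
        in_message_ = false;
    }

    std::size_t message_count() const { return messages_.size(); }
    const std::string& message(std::size_t n) const { return messages_.at(n); }

private:
    void require_idle() const {
        if (in_message_) throw std::logic_error("pipe is in use");
    }

    std::deque<Transform> filters_;
    std::vector<std::string> messages_;
    std::string pending_;
    bool in_message_ = false;
};

// Hypothetical helper: a "filter" that just records its name around the input.
ToyPipe::Transform wrap(const std::string& name) {
    return [name](const std::string& s) { return name + "(" + s + ")"; };
}

std::string toy_pipe_demo() {
    ToyPipe pipe;
    pipe.append(wrap("hex"));
    pipe.prepend(wrap("encrypt"));   // now runs before "hex"
    pipe.start_msg();
    pipe.write("da");
    pipe.write("ta");
    pipe.end_msg();
    assert(pipe.message_count() == 1);
    return pipe.message(0);          // "hex(encrypt(data))"
}
```

Prepending the encryption step places it ahead of the hex step, so the data is ``encrypted'' first and encoded second, which matches the ordering described for the real \type{Pipe}.
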
To write data, call one of the overloads of \function{write}(), which accept any of: a \type{byte[]}/\type{u32bit} pair, a \type{SecureVector}, a \type{std::string}, a \type{DataSource\&}, or a single \type{byte}.

Sometimes, you may want to do only a single write per message. In this case, you can use the \function{process\_msg} series of functions, which start a message, write their argument into the \type{Pipe}, and then end the message. In this case you would not make any explicit calls to \function{start\_msg}/\function{end\_msg}. The version of \function{write} that takes a single \type{byte} is not supported by \function{process\_msg}, but all the other variants are.

\type{Pipe} can also be used with the \verb|>>| operator, and will accept a \type{std::istream} or (on Unix systems with the \verb|fd_unix| module) a Unix file descriptor. In either case, the entire contents of the file will be read into the \type{Pipe}.

\subsubsection{Getting Output from a Pipe}

Retrieving the processed data from a \type{Pipe} is a bit more complicated, for various reasons. In particular, because \type{Pipe} will separate each message into a separate buffer, you have to be able to retrieve data from each message independently. Each of \type{Pipe}'s read functions has a final parameter that specifies what message to read from (as a 32-bit integer). If this parameter is set to \type{Pipe::DEFAULT\_MESSAGE}, it will read the current default message (\type{DEFAULT\_MESSAGE} is also the default value of this parameter). The parameter will not be mentioned in further discussion of the reading API, but it is always there (unless otherwise noted).

Reading is done with a variety of functions. The most basic are \type{u32bit} \function{read}(\type{byte} \arg{out}[], \type{u32bit} \arg{len}) and \type{u32bit} \function{read}(\type{byte\&} \arg{out}).
Each reads into \arg{out} (either up to \arg{len} bytes, or a single byte for the one taking a \type{byte\&}), and returns the total number of bytes read. There is a variant of these functions, all named \function{peek}, which performs the same operations, but does not remove the bytes from the message (reading is a destructive operation with a \type{Pipe}). There are also the functions \type{SecureVector} \function{read\_all}(), and \type{std::string} \function{read\_all\_as\_string}(), which return the entire contents of the message, either as a memory buffer, or a \type{std::string} (which is generally only useful if the \type{Pipe} has encoded the message into a text string, such as when a \type{Base64\_Encoder} is used). To determine how many bytes are left in a message, call \type{u32bit} \function{remaining}() (which can also take an optional message number). Finally, there are some functions for managing the default message number: \type{u32bit} \function{default\_msg}() will return the current default message, \type{u32bit} \function{message\_count}() will return the total number of messages (0...\function{message\_count}()-1), and \function{set\_default\_msg}(\type{u32bit} \arg{msgno}) will set a new default message number (which must be a valid message number for that \type{Pipe}). The ability to set the default message number is particularly important in the case of using the file output operations (\verb|<<| with a \type{std::ostream} or Unix file descriptor), because there is no way to specify it explicitly when using the output operator. \subsection{A Filter Example} Here is some code that takes one or more filenames in \arg{argv} and calculates the result of several hash functions for each file. The complete program can be found as \filename{hasher.cpp} in the Botan distribution. For brevity, error checking has been removed. 
\begin{verbatim}
   string name[3] = { "MD5", "SHA-1", "RIPEMD-160" };

   Botan::Filter* hash[3] = {
      new Botan::Chain(new Botan::Hash_Filter(name[0]),
                       new Botan::Hex_Encoder),
      new Botan::Chain(new Botan::Hash_Filter(name[1]),
                       new Botan::Hex_Encoder),
      new Botan::Chain(new Botan::Hash_Filter(name[2]),
                       new Botan::Hex_Encoder) };

   Botan::Pipe pipe(new Botan::Fork(hash, 3));

   for(u32bit j = 1; argv[j] != 0; j++)
      {
      ifstream file(argv[j]);
      pipe.start_msg();
      file >> pipe;
      pipe.end_msg();
      file.close();
      for(u32bit k = 0; k != 3; k++)
         {
         // each file adds three messages, one per hash
         pipe.set_default_msg(3*(j-1)+k);
         cout << name[k] << "(" << argv[j] << ") = " << pipe << endl;
         }
      }
\end{verbatim}

\subsection{Filter Catalog}

This section contains descriptions of every \type{Filter} included in the portable sections of Botan. \type{Filter}s provided by modules are documented elsewhere.

\subsubsection{Keyed Filters}

A few sections ago, it was mentioned that \type{Pipe} can process multiple messages, treating each of them the same. Well, that was a bit of a lie. There are some algorithms (in particular, block ciphers not in ECB mode, and all stream ciphers) that change their state as data is put through them.

Naturally, you might well want to reset the keys or (in the case of block cipher modes) IVs used by such filters, so multiple messages can be processed using completely different keys, or new IVs, or new keys and IVs, or whatever. And in fact, even for a MAC or an ECB block cipher, you might well want to change the key used from message to message.

Enter \type{Keyed\_Filter}, which acts as an abstract interface for any filter that uses keys: block cipher modes, stream ciphers, MACs, and so on. It has two functions, \function{set\_key} and \function{set\_iv}. Calling \function{set\_key} will, naturally, set (or reset) the key used by the algorithm. Setting the IV only makes sense in certain algorithms -- a call to \function{set\_iv} on an object that doesn't support IVs will be ignored.
You \emph{must} call \function{set\_key} before calling \function{set\_iv}: while not all \type{Keyed\_Filter} objects require this, you should assume it is required anytime you are using a \type{Keyed\_Filter}.

Here's an example:

\begin{verbatim}
   Keyed_Filter *cast, *hmac;
   Pipe pipe(new Base64_Decoder,
             // Note the assignments to the cast and hmac variables
             cast = new CBC_Decryption("CAST-128", "PKCS7", cast_key, iv),
             new Fork(
                0, // Read the section 'Fork' to understand this
                new Chain(
                   hmac = new MAC_Filter("HMAC(SHA-1)", mac_key, 12),
                   new Base64_Encoder
                   )
                )
      );
   pipe.start_msg();
   // [use pipe for a while, decrypt some stuff, derive new keys and IVs]
   pipe.end_msg();

   cast->set_key(cast_key2);
   cast->set_iv(iv2);
   hmac->set_key(mac_key2);

   pipe.start_msg();
   // [use pipe for some other things]
   pipe.end_msg();
\end{verbatim}

There are some requirements to using \type{Keyed\_Filter} that you must follow. If you call \function{set\_key} or \function{set\_iv} on a filter that is owned by a \type{Pipe}, you must do so while the \type{Pipe} is ``unlocked''. This refers to the times when no messages are being processed by \type{Pipe} -- either before \type{Pipe}'s \function{start\_msg} is called, or after \function{end\_msg} is called (and no new call to \function{start\_msg} has happened yet). Doing otherwise will result in undefined behavior, probably silently getting invalid output.

And remember: if you're resetting both values, reset the key \emph{first}.

\subsubsection{Cipher Filters}

Getting a hold of a \type{Filter} implementing a cipher is very easy. Make sure you're including the header \filename{lookup.h}, and then call \function{get\_cipher}. You will pass the return value directly into a \type{Pipe}.
There are a couple of different functions which do varying levels of initialization:

\noindent \function{get\_cipher}(\type{std::string} \arg{cipher\_spec}, \type{SymmetricKey} \arg{key}, \type{InitializationVector} \arg{iv}, \type{Cipher\_Dir} \arg{dir});

\noindent \function{get\_cipher}(\type{std::string} \arg{cipher\_spec}, \type{SymmetricKey} \arg{key}, \type{Cipher\_Dir} \arg{dir});

The version that doesn't take an IV is useful for things that don't use them, like block ciphers in ECB mode, or most stream ciphers. If you specify a \arg{cipher\_spec} that does want an IV, and you use the version that doesn't take one, an exception will be thrown. The \arg{dir} argument can be either \type{ENCRYPTION} or \type{DECRYPTION}.

The \arg{cipher\_spec} is a string that specifies what cipher is to be used. The general syntax for \arg{cipher\_spec} is ``STREAM\_CIPHER'', ``BLOCK\_CIPHER/MODE'', or ``BLOCK\_CIPHER/MODE/PADDING''. In the case of stream ciphers, no mode is necessary, so just the name is sufficient. A block cipher requires a mode of some sort, which can be ``ECB'', ``CBC'', ``CFB(n)'', ``OFB'', ``CTR-BE'', or ``EAX(n)''. The argument to CFB mode is how many bits of feedback should be used. If you just use ``CFB'' with no argument, it will default to using a feedback equal to the block size of the cipher. EAX mode also takes an optional bit argument, which tells EAX how large a tag size to use~--~generally this is the size of the block size of the cipher, which is the default if you don't specify any argument.

In the case of the ECB and CBC modes, a padding method can also be specified. If it is not supplied, ECB defaults to not padding, and CBC defaults to using PKCS \#5/\#7 compatible padding. The padding methods currently available are ``NoPadding'', ``PKCS7'', ``OneAndZeros'', and ``CTS''. CTS padding is currently only available for CBC mode, but the others can also be used in ECB mode.
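
To make the \arg{cipher\_spec} syntax and its padding defaults concrete, here is a minimal sketch of a parser for such strings. \texttt{ToyCipherSpec} and \texttt{parse\_cipher\_spec} are hypothetical names used only for illustration; this is not how \function{get\_cipher} is implemented, and the sketch does not check that the named cipher, mode, or padding method actually exists.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: the three slash-separated parts of a spec.
struct ToyCipherSpec {
    std::string cipher;   // e.g. "AES-128"
    std::string mode;     // e.g. "CBC"; empty for a bare stream cipher name
    std::string padding;  // e.g. "PKCS7"; empty if the mode takes no padding
};

// Splits "CIPHER", "CIPHER/MODE" or "CIPHER/MODE/PADDING" and fills in the
// padding defaults stated in the text: ECB -> NoPadding, CBC -> PKCS7.
ToyCipherSpec parse_cipher_spec(const std::string& spec) {
    std::vector<std::string> parts;
    std::istringstream in(spec);
    for (std::string part; std::getline(in, part, '/'); )
        parts.push_back(part);

    if (parts.empty() || parts.size() > 3 || parts[0].empty())
        throw std::invalid_argument("bad cipher spec: " + spec);

    ToyCipherSpec out;
    out.cipher = parts[0];
    if (parts.size() >= 2)
        out.mode = parts[1];

    if (parts.size() == 3)
        out.padding = parts[2];          // explicit padding wins
    else if (out.mode == "ECB")
        out.padding = "NoPadding";       // documented ECB default
    else if (out.mode == "CBC")
        out.padding = "PKCS7";           // documented CBC default

    return out;
}

// Small self-check of the defaults described above.
void parse_cipher_spec_demo() {
    assert(parse_cipher_spec("AES-128/CBC").padding == "PKCS7");
    assert(parse_cipher_spec("AES-128/CBC/CTS").padding == "CTS");
    assert(parse_cipher_spec("Serpent/ECB").padding == "NoPadding");
}
```

A mode argument such as the feedback size in ``CFB(8)'' is left inside the mode string here; a fuller parser would split it out as well.
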
Some example \arg{cipher\_spec} arguments are: ``AES-128/CBC'', ``Blowfish/CTR-BE'', ``Serpent/XTS'', and ``AES-256/EAX''. ``CTR-BE'' refers to counter mode where the counter is incremented as if it were a big-endian encoded integer. This is compatible with most other implementations, but it is possible some will use the incompatible little endian convention. This version would be denoted as ``CTR-LE'' if it were supported. ``EAX'' is a new cipher mode designed by Wagner, Rogaway, and Bellare. It is an authenticated cipher mode (that is, no separate authentication is needed), has provable security, and is free from patent entanglements. It runs about half as fast as most of the other cipher modes (like CBC, OFB, or CTR), which is not bad considering you don't need to use an authentication code. \subsubsection{Hashes and MACs} Hash functions and MACs don't need anything special when it comes to filters. Both just take their input and produce no output until \function{end\_msg()} is called, at which time they complete the hash or MAC and send that as output. These \type{Filter}s take a string naming the type to be used. If for some reason you name something that doesn't exist, an exception will be thrown. \noindent \function{Hash\_Filter}(\type{std::string} \arg{hash}, \type{u32bit} \arg{outlength}): This type hashes its input with \arg{hash}. When \function{end\_msg} is called on the owning \type{Pipe}, the hash is completed and the digest is sent on to the next thing in the pipe. The argument \arg{outlength} specifies how much of the output of the hash will be passed along to the next filter when \function{end\_msg} is called. By default, it will pass the entire hash. Examples of names for \function{Hash\_Filter} are ``SHA-1'' and ``Whirlpool''. 
\noindent \function{MAC\_Filter}(\type{std::string} \arg{mac}, \type{const SymmetricKey\&} \arg{key}, \type{u32bit} \arg{outlength}):

The constructor for a \type{MAC\_Filter} takes a key, used in calculating the MAC, and a length parameter, which has the same semantics as the one passed to \type{Hash\_Filter}'s constructor. Examples for \arg{mac} are ``HMAC(SHA-1)'', ``CMAC(AES-128)'', and the exceptionally long, strange, and probably useless name ``CMAC(Lion(Tiger(20,3),MARK-4,1024))''.

\subsubsection{PK Filters}

There are four classes in this category, \type{PK\_Encryptor\_Filter}, \type{PK\_Decryptor\_Filter}, \type{PK\_Signer\_Filter}, and \type{PK\_Verifier\_Filter}. Each takes a pointer to an object of the appropriate type (\type{PK\_Encryptor}, \type{PK\_Decryptor}, etc.) that is deleted by the destructor. These classes are found in \filename{pk\_filts.h}.

Three of these, for encryption, decryption, and signing, are much the same in terms of dataflow: each of them buffers its input until the end of the message is marked with a call to the \function{end\_msg} function. Then they encrypt, decrypt, or sign the entire input as a single blob and send the output (the ciphertext, the plaintext, or the signature) into the next filter.

Signature verification works a little differently, because it needs to know what the signature is in order to check it. You can either pass this in along with the constructor, or call the function \function{set\_signature} -- with this second method, you need to keep a pointer to the filter around so you can send it this command. In either case, after \function{end\_msg} is called, it will try to verify the signature (if the signature has not been set by either method, an exception will be thrown here). It will then send a single byte onto the next filter -- a 1 or a 0, which specifies whether the signature verified or not (respectively).
For more information about PK algorithms (including creating the appropriate objects to pass to the constructors), read the section ``Public Key Cryptography'' in this manual. \subsubsection{Encoders} Often you want your data to be in some form of text (for sending over channels that aren't 8-bit clean, printing it, etc). The filters \type{Hex\_Encoder} and \type{Base64\_Encoder} will convert arbitrary binary data into hex or base64 formats. Not surprisingly, you can use \type{Hex\_Decoder} and \type{Base64\_Decoder} to convert it back into its original form. Both of the encoders can take a few options about how the data should be formatted (all of which have defaults). The first is a \type{bool} which says if the encoder should insert line breaks. This defaults to false. Line breaks don't matter either way to the decoder, but it makes the output a bit more appealing to the human eye, and a few transport mechanisms (notably some email systems) limit the maximum line length. The second encoder option is an integer specifying how long such lines will be (obviously this will be ignored if line-breaking isn't being used). The default tends to be in the range of 60-80 characters, but is not specified. If you want a specific value, set it. Otherwise the default should be fine. Lastly, \type{Hex\_Encoder} takes an argument of type \type{Case}, which can be \type{Uppercase} or \type{Lowercase} (default is \type{Uppercase}). This specifies what case the characters A-F should be output as. The base64 encoder has no such option, because it uses both upper and lower case letters for its output. The decoders both take a single option, which tells it how the object should behave in the case of invalid input. The enum (called \type{Decoder\_Checking}) can take on any of three values: \type{NONE}, \type{IGNORE\_WS}, and \type{FULL\_CHECK}. 
With \type{NONE} (the default, for compatibility with previous releases), invalid input (for example, a ``z'' character in supposedly hex input) will be ignored. With \type{IGNORE\_WS}, whitespace will be ignored by the decoder, but receiving other non-valid data will raise an exception. Finally, \type{FULL\_CHECK} will raise an exception for \emph{any} characters not in the encoded character set, including whitespace. You can find the declarations for these types in \filename{hex.h} and \filename{base64.h}. \subsection{Rolling Your Own} The system of filters and pipes was designed in an attempt to make it as simple as possible to write new \type{Filter} objects. There are four functions that need to be implemented by an object deriving from \type{Filter}: \noindent \type{void} \function{write}(\type{byte} \arg{input}[], \type{u32bit} \arg{length}): The \function{write} function is what is called when a filter receives input for it to process. The filter is \emph{not} required to process it right away; many filters buffer their input before producing any output. A filter will usually have \function{write} called many times during its lifetime. \noindent \type{void} \function{send}(\type{byte} \arg{output}[], \type{u32bit} \arg{length}): Eventually, a filter will want to produce some output to send along to the next filter in the pipeline. It does so by calling \function{send} with whatever it wants to send along to the next filter. There is also a version of \function{send} taking a single byte argument, as a convenience. \noindent \type{void} \function{start\_msg()}: This function is optional. Implement it if your \type{Filter} would like to do some processing or setup at the start of each message (for an example, see the Zlib compression module). \noindent \type{void} \function{end\_msg()}: Implementing the \function{end\_msg} function is optional. It is called when it has been requested that filters finish up their computations. 
The filter should finish up with whatever computation it is working on (for example, a compressing filter would flush the compressor and \function{send} the final block), and empty any buffers in preparation for processing a fresh new set of input.

Additionally, if necessary, filters can define a constructor that takes any needed arguments, and a destructor to deal with deallocating memory, closing files, etc.

\section{Public Key Cryptography}

Let's create a 1024-bit RSA private key and encode the public key as a PKCS \#1 file with PEM encoding (which can be understood by many other cryptographic programs), then have the other party load it:

\begin{verbatim}
// everyone does:
AutoSeeded_RNG rng;

// Alice
RSA_PrivateKey priv_rsa(rng, 1024 /* bits */);

std::string alice_pem = X509::PEM_encode(priv_rsa);

// send alice_pem to Bob, who does

// Bob
std::auto_ptr<Public_Key> alice(load_key(alice_pem));

// dynamic_cast yields a null pointer if the key is not an RSA key
RSA_PublicKey* alice_rsa = dynamic_cast<RSA_PublicKey*>(alice.get());

if(alice_rsa)
   {
   /* ... */
   }
\end{verbatim}

\subsection{Creating PK Algorithm Key Objects}

The library has interfaces for encryption, signatures, etc.\ that do not require knowing the exact algorithm in use (for example RSA and Rabin-Williams signatures are handled by the exact same code path).

One place where we \emph{do} need to know exactly what kind of algorithm is in use is when we are creating a key (\emph{But}: read the section ``Importing and Exporting PK Keys'', later in this manual).

There are currently three kinds of public key algorithms in Botan: ones based on integer factorization (RSA and Rabin-Williams), ones based on the discrete logarithm problem in the integers modulo a prime (DSA, Diffie-Hellman, Nyberg-Rueppel, and ElGamal), and ones based on the discrete logarithm problem in an elliptic curve (ECDSA, ECDH, GOST 34.10). The systems based on discrete logarithms (in either regular integers or elliptic curves) use a group (a mathematical term), which can be shared among many keys.
An elliptic curve group is represented by the class \type{EC\_Domain\_Params}, while a modulo-prime group is represented by a \type{DL\_Group}.

There are two ways to create a DL private key (such as \type{DSA\_PrivateKey}). One is to pass in just a \type{DL\_Group} object -- a new key will automatically be generated. The other involves passing in a group to use, along with both the public and private values (private value first).

Since in integer factorization algorithms the modulus used isn't shared by other keys, we don't use this notion. You can create a new key by passing in a \type{u32bit} telling how long (in bits) the key should be, or you can copy a pre-existing key by passing in the appropriate parameters (primes, exponents, etc.). For RSA and Rabin-Williams (the two IF schemes in Botan), the parameters are all \type{BigInt}s: prime 1, prime 2, encryption exponent, decryption exponent, modulus. The last two are optional, since they can easily be derived from the first three.

\subsubsection{Creating a DL\_Group}

There are quite a few ways to get a \type{DL\_Group} object. The best is to use the function \function{get\_dl\_group}, which takes a string naming a group; it will either return that group, if it knows about it, or throw an exception. Names it knows about include ``IETF-n'' where n is 768, 1024, 1536, 2048, 3072, or 4096, and ``DSA-n'', where n is 512, 768, or 1024. The IETF groups are the ones specified for use with IPSec, and the DSA ones are the default DSA parameters specified by Java's JCE. For DSA and Nyberg-Rueppel, you should only use the ``DSA-n'' groups, while Diffie-Hellman and ElGamal can use either type (keep in mind that some applications/standards require DH/ELG to use DSA-style primes, while others require strong prime groups).

You can also generate a new random group. This is not recommended, because it is quite slow, especially for safe primes.
\subsection{Key Checking} Most public key algorithms have limitations or restrictions on their parameters. For example RSA requires an odd exponent, and algorithms based on the discrete logarithm problem need a generator $> 1$. Each low-level public key type has a function named \function{check\_key} that takes a \type{bool}. This function returns a Boolean value that declares whether or not the key is valid (from an algorithmic standpoint). For example, it will check to make sure that the prime parameters of a DSA key are, in fact, prime. It does not have anything to do with the validity of the key for any particular use, nor does it have anything to do with certificates that link a key (which, after all, is just some numbers) with a user or other entity. If \function{check\_key}'s argument is \type{true}, then it does ``strong'' checking, which includes expensive operations like primality checking. Keys are always checked when they are loaded or generated, so typically there is no reason to use this function directly. However, you can disable or reduce the checks for particular cases (public keys, loaded private keys, generated private keys) by setting the right config toggle (see the section on the configuration subsystem for details). \subsection{Getting a PK algorithm object} The key types, like \type{RSA\_PrivateKey}, do not implement any kind of padding or encoding (which is necessary for security). To get an object that knows how to do padding, use the wrapper classes included in \filename{pubkey.h}. These take a key, along with a string that specifies what hashing and encoding method(s) to use. Examples of such strings are ``EME1(SHA-256)'' for OAEP encryption and ``EMSA4(SHA-256)'' for PSS signatures (where the message is hashed using SHA-256). Here are some basic examples (using an RSA key) to give you a feel for the possibilities. 
These examples assume \type{rsakey} is an \type{RSA\_PrivateKey}, since otherwise we would not be able to create a decryption or signature object with it (you can create encryption or signature verification objects with public keys, naturally).

\begin{verbatim}
// PKCS #1 v2.0 / IEEE 1363 compatible encryption
PK_Encryptor_EME rsa_enc_pkcs1_v2(rsakey, "EME1(SHA-1)");

// PKCS #1 v1.5 compatible encryption
PK_Encryptor_EME rsa_enc_pkcs1_v15(rsakey, "PKCS1v15");

// This object can decrypt things encrypted by rsa_
PK_Decryptor_EME rsa_dec_pkcs1_v2(rsakey, "EME1(SHA-1)");

// PKCS #1 v1.5 compatible signatures
PK_Signer rsa_sign_pkcs1_v15(rsakey, "EMSA3(MD5)");
PK_Verifier rsa_verify_pkcs1_v15(rsakey, "EMSA3(MD5)");

// PKCS #1 v2.1 compatible signatures
PK_Signer rsa_sign_pkcs1_v2(rsakey, "EMSA4(SHA-1)");
PK_Verifier rsa_verify_pkcs1_v2(rsakey, "EMSA4(SHA-1)");
\end{verbatim}

\subsection{Encryption}

The \type{PK\_Encryptor} and \type{PK\_Decryptor} classes are the interface for encryption and decryption, respectively.

Calling \function{encrypt} with a \type{byte} array, a length parameter, and an RNG object will return the input encrypted with whatever scheme is being used. Calling the similar \function{decrypt} will perform the inverse operation. You can also do these operations with \type{SecureVector}s. In all cases, the output is returned via a \type{SecureVector}.

If you attempt an operation with a larger size than the key can support (this limit varies based on the algorithm, the key size, and the padding method used, if any), an exception will be thrown. You can call \function{maximum\_input\_size} to find out the maximum size input (in bytes) that you can safely use with any particular key.

Available public key encryption algorithms in Botan are RSA and ElGamal. The encoding methods are EME1, denoted by ``EME1(HASHNAME)'', PKCS \#1 v1.5, called ``PKCS1v15'' or ``EME-PKCS1-v1\_5'', and raw encoding (``Raw'').
For compatibility reasons, PKCS \#1 v1.5 is recommended for use with ElGamal (most other implementations of ElGamal do not support any other encoding format). RSA can also be used with PKCS \#1 encoding, but because of various possible attacks, EME1 is the preferred encoding. EME1 requires the use of a hash function: unless a competent applied cryptographer tells you otherwise, you should use SHA-256 or SHA-512.

Don't use ``Raw'' encoding unless you need it for backward compatibility with old protocols. There are many possible attacks against both ElGamal and RSA when they are used in this way.

\subsection{Signatures}

The signature algorithms look quite a bit like the hash functions. You can repeatedly call \function{update}, giving more and more of a message you wish to sign, and then call \function{signature}, which will return a signature for that message. If you want to do it all in one shot, call \function{sign\_message}, which will just call \function{update} with its argument and then return whatever \function{signature} returns. Generating a signature requires random numbers with some schemes, so \function{signature} and \function{sign\_message} both take a \type{RandomNumberGenerator\&}.

You can validate a signature by updating the verifier class, and finally seeing if the value returned from \function{check\_signature} is true (you pass the supposed signature to the \function{check\_signature} function as a byte array and a length or as a \type{MemoryRegion}). There is another function, \function{verify\_message}, which takes a pair of byte array/length pairs (or a pair of \type{MemoryRegion} objects), the first of which is the message, the second being the (supposed) signature. It returns true if the signature is valid and false otherwise.

Available public key signature algorithms in Botan are RSA, DSA, ECDSA, GOST 34.10, Nyberg-Rueppel, and Rabin-Williams. Signature encoding methods include EMSA1, EMSA2, EMSA3, EMSA4, and Raw.
All of them, except Raw, take a parameter naming a message digest function to hash the message with. The Raw encoding signs the input directly; if the message is too big, the signing operation will fail. Raw is not useful except in very specialized applications.

There are various interactions that make certain encoding schemes and signing algorithms more or less useful.

EMSA2 is the usual method for encoding Rabin-Williams signatures, so for compatibility with other implementations you may have to use that. EMSA4 (also called PSS) also works with Rabin-Williams. EMSA1 and EMSA3 do \emph{not} work with Rabin-Williams.

RSA can be used with any of the available encoding methods. EMSA4 is by far the most secure, but is not (as of now) widely implemented. EMSA3 (also called ``EMSA-PKCS1-v1\_5'') is commonly used with RSA (for example in SSL). EMSA1 signs the message digest directly, without any extra padding or encoding. This may be useful, but is not as secure as either EMSA3 or EMSA4. EMSA2 may be used but is not recommended.

For DSA, ECDSA, GOST 34.10, and Nyberg-Rueppel, you should use EMSA1. None of the other encoding methods are particularly useful for these algorithms.

\subsection{Key Agreement}

You can get hold of a \type{PK\_Key\_Agreement\_Scheme} object by calling \function{get\_pk\_kas} with a key that is of a type that supports key agreement (such as a Diffie-Hellman key stored in a \type{DH\_PrivateKey} object), and the name of a key derivation function. This can be ``Raw'', meaning the output of the primitive itself is returned as the key, or ``KDF1(hash)'' or ``KDF2(hash)'' where ``hash'' is any string you happen to like (hopefully you like strings like ``SHA-256'' or ``RIPEMD-160''), or ``X9.42-PRF(keywrap)'', which uses the PRF specified in ANSI X9.42. It takes the name or OID of the key wrap algorithm that will be used to encrypt a content encryption key.
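
As a toy illustration of the primitive underneath a Diffie-Hellman key agreement, the following sketch uses plain C++ with deliberately tiny numbers. It is not Botan code and is in no way secure; the function name \function{pow\_mod} and the group parameters ($p = 23$, $g = 5$) are assumptions chosen only to keep the arithmetic checkable by hand. It shows that each party, combining its own private value with the other party's public value, arrives at the same shared value.

\begin{verbatim}
#include <cstdint>

// Modular exponentiation by repeated squaring: (base^exp) mod m.
// Hypothetical helper for illustration; only safe for small m,
// since the intermediate products must fit in 64 bits.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m)
   {
   std::uint64_t result = 1 % m;
   base %= m;
   while(exp > 0)
      {
      if(exp & 1)
         result = (result * base) % m;
      base = (base * base) % m;
      exp >>= 1;
      }
   return result;
   }

// Toy group parameters (assumed for this example only)
const std::uint64_t toy_p = 23; // prime modulus
const std::uint64_t toy_g = 5;  // generator

// Public value for a given private value x: g^x mod p
std::uint64_t toy_public_value(std::uint64_t x)
   {
   return pow_mod(toy_g, x, toy_p);
   }

// Shared value from our private value and the peer's public value
std::uint64_t toy_shared_value(std::uint64_t our_private,
                               std::uint64_t their_public)
   {
   return pow_mod(their_public, our_private, toy_p);
   }
\end{verbatim}

With private values 6 and 15, the public values are 8 and 19, and both \texttt{toy\_shared\_value(6, 19)} and \texttt{toy\_shared\_value(15, 8)} evaluate to 2. In a real application this raw shared value would not be used directly; it would be fed through a key derivation function such as the ones named above.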
How key agreement works is that you trade public values with some other party, and then each of you runs a computation with the other's value and your key (this should return the same result to both parties). This computation can be called by using \function{derive\_key} with either a byte array/length pair, or a \type{SecureVector} that holds the public value of the other party. The last argument to either call is a number that specifies how long a key you want.

Depending on the KDF you're using, you \emph{might not} get back a key of the size you requested. In particular ``Raw'' will return a number about the size of the Diffie-Hellman modulus, and KDF1 can only return a key that is the same size as the output of the hash. KDF2, on the other hand, will always give you a key exactly as long as you request, regardless of the underlying hash used with it. The key returned is a \type{SymmetricKey}, ready to pass to a block cipher, MAC, or other symmetric algorithm.

The public value that should be used can be obtained by calling \function{public\_data}, which exists for any key that is associated with a key agreement algorithm. It returns a \type{SecureVector}.

``KDF2(SHA-256)'' is by far the preferred algorithm for key derivation in new applications. The X9.42 algorithm may be useful in some circumstances, but unless you need X9.42 compatibility, KDF2 is easier to use.

There is a Diffie-Hellman example included in the distribution, which you may want to examine.

\subsection{Importing and Exporting PK Keys}

[This section mentions \type{Pipe} and \type{DataSource}, which are not covered until later in the manual. Please read those sections for more about \type{Pipe} and \type{DataSource} and their uses.]

There are many, many different (often conflicting) standards surrounding public key cryptography. There are, thankfully, only two major standards surrounding the representation of a public or private key: X.509 (for public keys), and PKCS \#8 (for private keys).
Other crypto libraries, like OpenSSL and B-SAFE, also support these formats, so you can easily exchange keys with software that doesn't use Botan.

In addition to ``plain'' public keys, Botan also supports X.509 certificates. These are documented in the section ``Certificate Handling'', later in this manual.

\subsubsection{Public Keys}

The interfaces for doing either of these are quite similar. Let's look at the X.509 functions first:

\begin{verbatim}
namespace X509 {
   MemoryVector<byte> BER_encode(const Public_Key& key);
   std::string PEM_encode(const Public_Key& out);

   Public_Key* load_key(DataSource& in);
   Public_Key* load_key(const SecureVector<byte>& buffer);
}
\end{verbatim}

The function \function{X509::BER\_encode} will take any \type{Public\_Key} and return a standard binary structure representing the key which can be read by many other crypto libraries.

The function \function{X509::PEM\_encode} does the same, but additionally formats it into a text format with headers and base64 encoding. Using PEM is \emph{highly} recommended for many reasons, including compatibility with other software, for transmission over 8-bit unclean channels, because it can be identified by a human without special tools, and because it sometimes allows more sane behavior of tools that process the data.

For loading a public key, use one of the variants of \function{load\_key}. This function will return a newly allocated key based on the data from whatever source it is using (assuming, of course, the source is in fact storing a representation of a public key). The encoding used (PEM or BER) need not be specified; the format will be detected automatically. The key is allocated with \function{new}, and should be released with \function{delete} when you are done with it. The first takes a generic \type{DataSource} that you have to create~--~the others are simple wrapper functions that take either a filename or a memory buffer.
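
To make the ``text format with headers and base64 encoding'' mentioned above more concrete, here is a minimal standalone sketch of PEM-style wrapping in plain C++. It is not Botan's implementation: the function names \function{base64\_encode} and \function{pem\_wrap} are hypothetical, and the 64-character line width and the ``PUBLIC KEY'' label used in the usage note are assumptions based on common PEM practice.

\begin{verbatim}
#include <cstddef>
#include <string>
#include <vector>

// Standard base64 alphabet with '=' padding (hypothetical helper)
std::string base64_encode(const std::vector<unsigned char>& in)
   {
   static const char tbl[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
   std::string out;
   std::size_t i = 0;

   // Each full 3-byte group becomes 4 output characters
   for(; i + 2 < in.size(); i += 3)
      {
      const unsigned long v =
         (static_cast<unsigned long>(in[i]) << 16) |
         (static_cast<unsigned long>(in[i+1]) << 8) |
         in[i+2];
      out += tbl[(v >> 18) & 63];
      out += tbl[(v >> 12) & 63];
      out += tbl[(v >> 6) & 63];
      out += tbl[v & 63];
      }

   // One or two leftover bytes are padded out with '='
   if(i < in.size())
      {
      const bool two = (i + 1 < in.size());
      unsigned long v = static_cast<unsigned long>(in[i]) << 16;
      if(two)
         v |= static_cast<unsigned long>(in[i+1]) << 8;
      out += tbl[(v >> 18) & 63];
      out += tbl[(v >> 12) & 63];
      out += two ? tbl[(v >> 6) & 63] : '=';
      out += '=';
      }
   return out;
   }

// Wrap binary (e.g. BER) data in BEGIN/END header lines, with the
// base64 body broken into lines of at most 64 characters (assumed width)
std::string pem_wrap(const std::string& label,
                     const std::vector<unsigned char>& der)
   {
   const std::string b64 = base64_encode(der);
   std::string out = "-----BEGIN " + label + "-----\n";
   for(std::size_t i = 0; i < b64.size(); i += 64)
      out += b64.substr(i, 64) + "\n";
   out += "-----END " + label + "-----\n";
   return out;
   }
\end{verbatim}

For instance, wrapping the three bytes of the ASCII string ``Man'' with the label ``PUBLIC KEY'' yields a header line, the single body line \texttt{TWFu}, and a footer line. Because the result is plain printable text with recognizable markers, a human can identify it without special tools, which is one of the advantages of PEM noted above.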
Here's an example of loading a public key and then encrypting with it (an RNG object \type{rng} is assumed to exist, since \function{encrypt} requires one):

\begin{verbatim}
/* Might be RSA, might be ElGamal, might be ... */
Public_Key* key = X509::load_key("pubkey.asc");

/* This might throw an exception if the key doesn't support
   any encryption operations */
PK_Encryptor_EME encryptor(*key, "EME1(SHA-1)");

SecureVector<byte> ciphertext = encryptor.encrypt(msg, size_of_msg, rng);
\end{verbatim}

\subsubsection{Private Keys}

There are two different options for private key import/export. The first is a plaintext version of the private key. This is supported by the following functions:

\begin{verbatim}
namespace PKCS8 {
   SecureVector<byte> BER_encode(const Private_Key& key);
   std::string PEM_encode(const Private_Key& key);
}
\end{verbatim}

These functions are similar to the X.509 functions described previously. The only difference is that they take a \type{Private\_Key} object instead. In most situations, using these is a bad idea, because anyone can come along and grab the private key without having to know any passwords or other secrets. Unless you have very particular security requirements, always use the versions that encrypt the key based on a passphrase. For importing, the same functions can be used for encrypted and unencrypted keys.

The other way to export a PKCS \#8 key is to first encode it in the same manner as done above, then encrypt it using a passphrase, and store the whole thing into another structure. This method is definitely preferred, since otherwise the private key is unprotected. The algorithms and structures used here are standardized by PKCS \#5 and PKCS \#8, and can be read by many other crypto libraries.
\begin{verbatim}
namespace PKCS8 {
   SecureVector<byte> BER_encode(const Private_Key& key,
                                 RandomNumberGenerator& rng,
                                 const std::string& pass,
                                 const std::string& pbe_algo = "");

   std::string PEM_encode(const Private_Key& key,
                          RandomNumberGenerator& rng,
                          const std::string& pass,
                          const std::string& pbe_algo = "");
}
\end{verbatim}

There are three new arguments needed here to support the encryption process in addition to the private key itself. The first is a \type{RandomNumberGenerator}, which is needed for various purposes internally. The \arg{pass} argument is the passphrase that will be used to encrypt the key. Both of these are required. The final (optional) argument is \arg{pbe\_algo}; this specifies a particular password based encryption (or PBE) algorithm. If you don't specify a PBE, a compiled-in default will be used; this should be fine.

Last but not least, there are some functions that will load (and decrypt, if necessary) a PKCS \#8 private key:

\begin{verbatim}
namespace PKCS8 {
   Private_Key* load_key(
      DataSource& in,
      RandomNumberGenerator& rng,
      std::function<std::pair<bool, std::string> ()> get_passphrase);

   Private_Key* load_key(
      const std::string& filename,
      RandomNumberGenerator& rng,
      std::function<std::pair<bool, std::string> ()> get_passphrase);

   Private_Key* load_key(DataSource& in,
                         RandomNumberGenerator& rng,
                         std::string passphrase = "");

   Private_Key* load_key(const std::string& filename,
                         RandomNumberGenerator& rng,
                         const std::string& passphrase = "");
}
\end{verbatim}

The versions that take \type{std::string} \arg{passphrase}s are primarily for compatibility, but they are useful in limited circumstances. The versions using \type{std::function} callbacks are how \function{load\_key} is implemented, and provide much more flexibility. If you use the versions that take just a single passphrase and the passphrase passed in is not correct, an exception is thrown and that is that. However, if you pass in a callback, then you can keep querying the user until they get it right (or they cancel the action).
The first return value of the callback indicates whether the action should continue; if it is false, \function{load\_key} will bail out. Otherwise, it will use the second return value as the supposed passphrase that was used to encrypt the key.

If you know (or want to assume) the key is not encrypted, just ignore the passphrase/callback entirely, letting the third parameter default to an empty string. The call will fail if the key was encrypted.

All versions need access to a \type{RandomNumberGenerator} in order to perform probabilistic tests on the loaded key material.

After loading a key, you can use \function{dynamic\_cast} to find out what operations it supports, and use it appropriately. Remember to \function{delete} the object once you are done with it.

\subsubsection{Limitations}

As of now Nyberg-Rueppel and Rabin-Williams keys cannot be imported or exported, because they have no official ASN.1 OID or definition. ElGamal keys can (as of Botan 1.3.8) be imported and exported, but the only other implementation that supports the format is Peter Gutmann's Cryptlib. If you can help it, stick to RSA and DSA.

\emph{Note}: Currently NR and RW are given basic ASN.1 key formats (which mirror DSA and RSA, respectively), which means that, if they are assigned an OID, they can be imported and exported just as easily as RSA and DSA. You can assign them an OID by putting a line in a Botan configuration file, calling \function{OIDS::add\_oid}, or editing \filename{src/policy.cpp}. Be warned that it is possible that a future version will use a format that is different from the current one (\ie, a newly standardized format).

\section{Certificate Handling}

A certificate is a binding between some identifying information (called a \emph{subject}) and a public key.
This binding is asserted by a signature on the certificate, which is placed there by some authority (the \emph{issuer}) that at least claims that it knows the subject named in the certificate really ``owns'' the private key corresponding to the public key in the certificate. The major certificate format in use today is X.509v3, designed by ISO and further hacked on by dozens (hundreds?) of other organizations. When working with certificates, the main class to remember is \type{X509\_Certificate}. You can read an object of this type, but you can't create one on the fly; a CA object is necessary for making a new certificate. So for the most part, you only have to worry about reading them in, verifying the signatures, and getting the bits of data in them (most commonly the public key, and the information about the user of that key). An X.509v3 certificate can contain a literally infinite number of items related to all kinds of things. Botan doesn't support a lot of them, because nobody uses them and they're an impossible mess to work with. This section only documents the most commonly used ones of the ones that are supported; for the rest, read \filename{x509cert.h} and \filename{asn1\_obj.h} (which has the definitions of various common ASN.1 constructs used in X.509). \subsection{So what's in an X.509 certificate?} Obviously, you want to be able to get the public key. This is achieved by calling the member function \function{subject\_public\_key}, which will return a \type{Public\_Key*}. As to what to do with this, read about \function{load\_key} in the section ``Importing and Exporting PK Keys''. In the general case, this could be any kind of public key, though 99\% of the time it will be an RSA key. However, Diffie-Hellman and DSA keys are also supported, so be careful about how you treat this. It is also a wise idea to examine the value returned by \function{constraints}, to see what uses the public key is approved for. 
The second major piece of information you'll want is the name/email/etc.\ of the person to whom this certificate is assigned. Here is where things get a little nasty. X.509v3 has two (well, mostly just two $\ldots$) different places where you can stick information about the user: the \emph{subject} field, and an extension called \emph{subjectAlternativeName}. The \emph{subject} field is supposed to include only the following information: country, organization, an organizational sub-unit name, and a so-called common name. The common name is usually the name of the person, or it could be a title associated with a position of some sort in the organization. It may also include fields for state/province and locality. What a locality is, nobody knows, but it's usually given as a city name.

Botan doesn't currently support any of the Unicode variants used in ASN.1 (UTF-8, UCS-2, and UCS-4), any of which could be used for the fields in the DN. This could be problematic, particularly in Asia and other areas where non-ASCII characters are needed for most names. The UTF-8 and UCS-2 string types \emph{are} accepted (in fact, UTF-8 is used when encoding much of the time), but if any of the characters included in the string are not in ISO 8859-1 (\ie 0 \ldots 255), an exception will get thrown. Currently the \type{ASN1\_String} type holds its data as ISO 8859-1 internally (regardless of local character set); this would have to be changed to hold UCS-2 or UCS-4 in order to support Unicode (also, many interfaces in the X.509 code would have to accept or return a \type{std::wstring} instead of a \type{std::string}).

Like the distinguished names, subject alternative names can contain a lot of things that Botan will flat out ignore (most of which you would likely never want to use).
However, there are three very useful pieces of information that this extension might hold: an email address (``person@site1.com''), a DNS name (``somehost.site2.com''), or a URI (``http://www.site3.com'').

So, how to get the information? Call \function{subject\_info} with the name of the piece of information you want, and it will return a \type{std::string} that is either empty (signifying that the certificate doesn't have this information), or has the information requested. There are several names for each possible item, but the most easily readable ones are: ``Name'', ``Country'', ``Organization'', ``Organizational Unit'', ``Locality'', ``State'', ``RFC822'', ``URI'', and ``DNS''.

You can also get information about the issuer of the certificate in the same way, using \function{issuer\_info}.

\subsubsection{X.509v3 Extensions}

X.509v3 specifies a large number of possible extensions. Botan supports some, but by no means all of them. This section lists which ones are supported, and notes areas where there may be problems with the handling.

\begin{list}{$\cdot$} \item Key Usage and Extended Key Usage: No problems known.

\item

\item Basic Constraints: No problems known. The default for a v1/v2 certificate is to assume it's a CA if and only if the option ``x509/default\_to\_ca'' is set. A v3 certificate is marked as a CA if (and only if) the basic constraints extension is present and set for a CA cert.

\item Subject Alternative Names: Only the ``rfc822Name'', ``dNSName'', and ``uniformResourceIdentifier'' fields will be stored; all others are ignored.

\item Issuer Alternative Names: Same restrictions as the Subject Alternative Names extension. New certificates generated by Botan never include the issuer alternative name.

\item Authority Key Identifier: Only the version using KeyIdentifier is supported. If the GeneralNames version is used and the extension is critical, an exception is thrown.
If both the KeyIdentifier and GeneralNames versions are present, then the KeyIdentifier will be used, and the GeneralNames ignored.

\item Subject Key Identifier: No problems known.
\end{list}

\subsubsection{Revocation Lists}

It will occasionally happen that a certificate must be revoked before its expiration date. Examples of this happening include the private key being compromised, or the user to which it has been assigned leaving an organization. Certificate revocation lists are an answer to this problem (though online certificate validation techniques are starting to become somewhat more popular). Every once in a while the CA will release a new CRL, listing all certificates that have been revoked. Also included are various pieces of information like what time a particular certificate was revoked, and for what reason. In most systems, it is wise to support some form of certificate revocation, and CRLs handle this easily.

For most users, processing a CRL is quite easy. All you have to do is call the constructor, which will take a filename (or a \type{DataSource\&}). The CRLs can either be in raw BER/DER, or in PEM format; the constructor will figure out which format without any extra information. For example:

\begin{verbatim}
X509_CRL crl1("crl1.der");

DataSource_Stream in("crl2.pem");
X509_CRL crl2(in);
\end{verbatim}

After that, pass the \type{X509\_CRL} object to an \type{X509\_Store} object with \type{X509\_Code} \function{add\_crl}(\type{X509\_CRL}), and all future verifications will take into account the certificates listed, assuming \function{add\_crl} returns \type{VERIFIED}. If it doesn't return \type{VERIFIED}, then the return value is an error code signifying that the CRL could not be processed due to some problem (which could range from the issuing certificate not being found, to the CRL having some format problem). For more about the \type{X509\_Store} API, read the section later in this chapter.
\subsection{Reading Certificates}

\type{X509\_Certificate} has two constructors, each of which takes a source of data: one takes a filename to read, and the other a \type{DataSource\&}.

\subsection{Storing and Using Certificates}

If you read a certificate, you probably want to verify the signature on it. However, consider that to do so, we may have to verify the signature on the certificate that we used to verify the first certificate, and on and on until we hit the top of the certificate tree somewhere. It would be a mighty huge pain to have to handle all of that manually in every application, so there is something that does it for you: \type{X509\_Store}.

The basic operations are: put certificates and CRLs into it, search for certificates, and attempt to verify certificates. That's about it. In the future, there will be support for online retrieval of certificates and CRLs (\eg with the HTTP cert-store interface currently under consideration by PKIX).

\subsubsection{Adding Certificates}

You can add new certificates to a certificate store using any of these functions:

\function{add\_cert}(\type{const X509\_Certificate\&} \arg{cert}, \type{bool} \arg{trusted} \type{= false})

\function{add\_certs}(\type{DataSource\&} \arg{source})

\function{add\_trusted\_certs}(\type{DataSource\&} \arg{source})

The versions that take a \type{DataSource\&} will add all the certificates that they can find in that source.

All of them add the cert(s) to the store. The `trusted' certificates are the ones that you have some reason to trust are genuine. For example, say your application is working with certificates that are owned by employees of some company, and all of their certificates are signed by the company CA, whose certificate is in turn signed by a commercial root CA. What you would then do is include the certificate of the commercial CA with your application, and read it in as a trusted certificate.
From there, you could verify the company CA's certificate, and then use that to verify the end user's certificates. Only self-signed certificates may be considered trusted.

\subsubsection{Adding CRLs}

\type{X509\_Code} \function{add\_crl}(\type{const X509\_CRL\&} \arg{crl});

This will process the CRL and mark the revoked certificates. This will also work if a revoked certificate is added to the store sometime after the CRL is processed. The function can return an error code (listed later), or will return \type{VERIFIED} if everything completed successfully.

\subsubsection{Storing Certificates}

You can output a set of certificates by calling \function{PEM\_encode}, which will return a \type{std::string} containing each of the certificates in the store, PEM encoded and concatenated. This simple format can easily be read by both Botan and other libraries/applications.

\subsubsection{Searching for Certificates}

You can find certificates in the store with a series of functions contained in the \namespace{X509\_Store\_Search} namespace:

\begin{verbatim}
namespace X509_Store_Search {
std::vector<X509_Certificate> by_email(const X509_Store& store,
                                       const std::string& email_addr);
std::vector<X509_Certificate> by_name(const X509_Store& store,
                                      const std::string& name);
std::vector<X509_Certificate> by_dns(const X509_Store&,
                                     const std::string& dns_name);
}
\end{verbatim}

These functions will return a (possibly empty) vector of certificates from \arg{store} matching your search criteria. The email address and DNS name searches are case-insensitive but are sensitive to extra whitespace and so on. The name search will do case-insensitive substring matching, so, for example, calling \function{X509\_Store\_Search::by\_name}(\arg{your\_store}, ``dob'') will return certificates for ``J.R. 'Bob' Dobbs'' and ``H. Dobbertin'', assuming both of those certificates are in \arg{your\_store}. You could then display the results to a user, and allow them to select the appropriate one.
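
The case-insensitive substring matching used by the name search can be sketched in a few lines of standard C++. The sketch below works on plain strings rather than certificates, and the helper names \function{to\_lower} and \function{names\_matching} are hypothetical; it only illustrates the matching rule, not how \function{by\_name} is implemented, and it assumes ASCII names.

\begin{verbatim}
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Return a lower-cased copy of the input (ASCII only)
std::string to_lower(const std::string& in)
   {
   std::string out = in;
   for(std::size_t i = 0; i != out.size(); ++i)
      out[i] = static_cast<char>(
         std::tolower(static_cast<unsigned char>(out[i])));
   return out;
   }

// Hypothetical stand-in for by_name, searching plain name strings
std::vector<std::string>
names_matching(const std::vector<std::string>& names,
               const std::string& needle)
   {
   const std::string lowered = to_lower(needle);
   std::vector<std::string> matches;

   for(std::size_t i = 0; i != names.size(); ++i)
      {
      // Compare lower-cased copies so that case is ignored
      if(to_lower(names[i]).find(lowered) != std::string::npos)
         matches.push_back(names[i]);
      }
   return matches;
   }
\end{verbatim}

Given the names ``J.R. 'Bob' Dobbs'', ``H. Dobbertin'', and some unrelated third name, searching for ``dob'' returns the first two, in their original order.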
Searching using an email address as the key is usually more effective than the name, since email addresses are rarely shared.

\subsubsection{Certificate Stores}

An object of type \type{Certificate\_Store} is a generalized interface to an external source for certificates (and CRLs). Examples of such a store would be one that looked up the certificates in a SQL database, or by contacting a CGI script running on an HTTP server. There are currently three mechanisms for looking up a certificate, and one for retrieving CRLs. By default, most of these mechanisms will return an empty \type{std::vector} of \type{X509\_Certificate}. This storage mechanism is \emph{only} queried when doing certificate validation: it allows you to distribute only the root key with an application, and let some online method handle getting all the other certificates that are needed to validate an end entity certificate. In particular, the search routines will not attempt to access the external database.

The three certificate lookup methods are \function{by\_SKID} (Subject Key Identifier), \function{by\_name} (the CommonName DN entry), and \function{by\_email} (stored in either the distinguished name, or in a subjectAlternativeName extension). The name and email versions take a \type{std::string}, while the SKID version takes a \type{SecureVector<byte>} containing the subject key identifier in raw binary. You can choose not to implement \function{by\_name} or \function{by\_email}, but \function{by\_SKID} is mandatory to implement, and, currently, is the only version that is used by \type{X509\_Store}.

Finally, there is a method for finding CRLs, called \function{get\_crls\_for}, that takes an \type{X509\_Certificate} object, and returns a \type{std::vector} of \type{X509\_CRL}.
While normally there will be only one CRL, the use of the vector makes it easy to return no CRLs (\eg, if the certificate store doesn't support retrieving them), or return multiple ones (for example, if the certificate store can't determine precisely which key was used to sign the certificate). Implementing the function is optional, and by default will return no CRLs. If it is available, it will be used by \type{X509\_CRL}.

As for using such a store, you have to tell \type{X509\_Store} about it, by calling the \type{X509\_Store} member function

\function{add\_new\_certstore}(\type{Certificate\_Store}* \arg{new\_store})

The argument, \arg{new\_store}, will be deleted by \type{X509\_Store}'s destructor, so make sure to allocate it with \function{new}.

\subsubsection{Verifying Certificates}

There is a single function in \type{X509\_Store} related to verifying a certificate:

\type{X509\_Code} \function{validate\_cert}(\type{const X509\_Certificate\&} \arg{cert}, \type{Cert\_Usage} \arg{usage} = \type{ANY})

This function will return \type{VERIFIED} if the certificate can safely be considered valid for the usage(s) described by \arg{usage}, and an error code if it is not. Naturally, things are a bit more complicated than that. The enum \type{Cert\_Usage} is defined inside the \type{X509\_Store} class; it (currently) can take on any of the values \type{ANY} (any usage is OK), \type{TLS\_SERVER} (for SSL/TLS server authentication), \type{TLS\_CLIENT} (for SSL/TLS client authentication), \type{CODE\_SIGNING}, \type{EMAIL\_PROTECTION} (email encryption, usually this means S/MIME), \type{TIME\_STAMPING} (in theory any time stamp application, usually IETF PKIX's Time Stamp Protocol), or \type{CRL\_SIGNING}. Note that Microsoft's code signing system, certainly the most widely used, uses a completely different (and mostly undocumented) method for marking certificates for code signing.

First, how does it know if a certificate is valid?
A certificate is valid if both of the following hold: a) the signature in the certificate can be verified using the public key in the issuer's certificate, and b) the issuer's certificate is a valid CA certificate. Note that this definition is recursive. We get out of this by ``bottoming out'' when we reach a certificate that we consider trusted. In general this will either be a commercial root CA, or an organization or application specific CA.

There are a few other restrictions (validity periods, key usage restrictions, etc), but the above summarizes the major points of the validation algorithm. In theory, Botan implements the certificate path validation algorithm given in RFC 2459, but in practice it does not (yet), because we don't support the X.509v3 policy or name constraint extensions.

Possible values for \arg{usage} are \type{TLS\_SERVER}, \type{TLS\_CLIENT}, \type{CODE\_SIGNING}, \type{EMAIL\_PROTECTION}, \type{CRL\_SIGNING}, \type{TIME\_STAMPING}, and \type{ANY}. The default \type{ANY} does not mean valid for any use, it means ``is valid for some usage''. This is usually what you want; requiring that a random certificate support a particular usage will likely result in a lot of failures, unless your application is very careful to always issue certificates with the proper extensions, and you never use certificates generated by other apps.

Return values for \function{validate\_cert} (and \function{add\_crl}) include:

\begin{list}{$\cdot$}
\item VERIFIED: The certificate is valid for the specified use.

\item INVALID\_USAGE: The certificate cannot be used for the specified use.

\item CANNOT\_ESTABLISH\_TRUST: The root certificate was not marked as trusted.

\item CERT\_CHAIN\_TOO\_LONG: The certificate chain exceeded the length allowed by a basicConstraints extension.

\item SIGNATURE\_ERROR: An invalid signature was found.

\item POLICY\_ERROR: Some problem with the certificate policies was found.
\item CERT\_FORMAT\_ERROR: Some format problem was found in a certificate.

\item CERT\_ISSUER\_NOT\_FOUND: The issuer of a certificate could not be found.

\item CERT\_NOT\_YET\_VALID: The certificate is not yet valid.

\item CERT\_HAS\_EXPIRED: The certificate has expired.

\item CERT\_IS\_REVOKED: The certificate has been revoked.

\item CRL\_FORMAT\_ERROR: Some format problem was found in a CRL.

\item CRL\_ISSUER\_NOT\_FOUND: The issuer of a CRL could not be found.

\item CRL\_NOT\_YET\_VALID: The CRL is not yet valid.

\item CRL\_HAS\_EXPIRED: The CRL has expired.

\item CA\_CERT\_CANNOT\_SIGN: The CA certificate found does not contain a public key that allows signature verification.

\item CA\_CERT\_NOT\_FOR\_CERT\_ISSUER: The CA cert found is not allowed to issue certificates.

\item CA\_CERT\_NOT\_FOR\_CRL\_ISSUER: The CA cert found is not allowed to issue CRLs.

\item UNKNOWN\_X509\_ERROR: Some other error occurred.
\end{list}

\subsection{Certificate Authorities}

Setting up a CA for X.509 certificates is perhaps the easiest thing to do related to X.509. A CA is represented by the type \type{X509\_CA}, which can be found in \filename{x509\_ca.h}. A CA always needs its own certificate, which can either be a self-signed certificate (see below on how to create one) or one issued by another CA (see the section on PKCS \#10 requests). Creating a CA object is done by the following constructor:

\begin{verbatim}
X509_CA(const X509_Certificate& cert, const Private_Key& key);
\end{verbatim}

The private key is the private key corresponding to the public key in the CA's certificate.

Requests for new certificates are supplied to a CA in the form of PKCS \#10 certificate requests (called a \type{PKCS10\_Request} object in Botan). These are decoded in a similar manner to certificates/CRLs/etc.
A request is vetted by humans (who somehow verify that the name in the request corresponds to the name of the entity who requested it), and then signed by a CA key, generating a new certificate.

\begin{verbatim}
X509_Certificate sign_request(const PKCS10_Request&) const;
\end{verbatim}

\subsubsection{Generating CRLs}

As mentioned previously, the ability to process CRLs is highly important in many PKI systems. In fact, according to strict X.509 rules, you must not validate any certificate if the appropriate CRLs are not available (though hardly any systems are that strict). In any case, a CA should have a valid CRL available at all times.

Of course, you might be wondering what to do if no certificates have been revoked. Never fear; empty CRLs, which revoke nothing at all, can be issued. To generate a new, empty CRL, just call \type{X509\_CRL} \function{X509\_CA::new\_crl}(\type{u32bit}~\arg{seconds}~=~0). If \arg{seconds} is the default 0, then the normal default CRL next update time (the value of the ``x509/crl/next\_update'' option) will be used. If not, then \arg{seconds} specifies how long (in seconds) it will be until the CRL's next update time (after this time, most clients will reject the CRL as too old).

On the other hand, you may have issued a CRL before. In that case, you will want to issue a new CRL that contains all previously revoked certificates, along with any new ones. This is done by calling the \type{X509\_CA} member function \function{update\_crl}(\type{X509\_CRL}~\arg{old\_crl}, \type{std::vector<CRL\_Entry>}~\arg{new\_revoked}, \type{u32bit}~\arg{seconds}~=~0), where \arg{old\_crl} is the last CRL this CA issued, and \arg{new\_revoked} is a list of any newly revoked certificates. The function returns a new \type{X509\_CRL} to make available for clients. The semantics of the \arg{seconds} argument are the same as for \function{new\_crl}.
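
The meaning of the \arg{seconds} argument can be made concrete with a small sketch. It is written against the standard library only; the helper names \function{crl\_next\_update} and \function{crl\_is\_stale} are hypothetical, and \arg{default\_seconds} merely stands in for whatever the ``x509/crl/next\_update'' option is set to. Treating a CRL as too old exactly when the current time is past its next update time is likewise an assumption about typical client behavior.

\begin{verbatim}
#include <ctime>

// Hypothetical helper: a seconds value of 0 selects the configured
// default interval, anything else is used as given
std::time_t crl_next_update(std::time_t now,
                            unsigned long seconds,
                            unsigned long default_seconds)
   {
   const unsigned long interval =
      (seconds == 0) ? default_seconds : seconds;
   return now + static_cast<std::time_t>(interval);
   }

// Hypothetical client side check: the CRL is considered too old
// once its next update time has passed
bool crl_is_stale(std::time_t now, std::time_t next_update)
   {
   return now > next_update;
   }
\end{verbatim}

A CA that issues CRLs on a fixed schedule would pick \arg{seconds} somewhat larger than its issuing interval, so that clients always hold an unexpired CRL while the next one is being published.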
The \type{CRL\_Entry} type is a structure that contains, at a minimum, the serial number of the revoked certificate. As serial numbers are never repeated, the pairing of an issuer and a serial number (should) uniquely identify any certificate. In this case, we represent the serial number as a \type{SecureVector<byte>} called \arg{serial}. There are two additional (optional) values: an enumeration called \type{CRL\_Code} that specifies the reason for revocation (\arg{reason}), and an object that represents the time that the certificate became invalid (if this information is known).

If you wish to remove an old entry from the CRL, insert a new entry for the same cert, with a \arg{reason} code of \type{DELETE\_CRL\_ENTRY}. For example, if a revoked certificate has expired `normally', there is no reason to continue to explicitly revoke it, since clients will reject the cert as expired in any case.

\subsubsection{Self-Signed Certificates}

Generating a new self-signed certificate can often be useful, for example when setting up a new root CA, or for use in email applications. The library provides a utility function for this:

\begin{verbatim}
namespace X509 {
X509_Certificate create_self_signed_cert(const X509_Cert_Options& opts,
                                         const Private_Key& key);
}
\end{verbatim}

Here \arg{key} is the private key you wish to use (the public key, used in the certificate itself, is extracted from the private key), and \arg{opts} is a structure that has various bits of information that will be used in creating the certificate (this structure, and its use, is discussed below). This function is found in the header \filename{x509self.h}. There is an example of using this function in the \filename{self\_sig} example.

\subsubsection{Creating PKCS \#10 Requests}

Also in \filename{x509self.h}, there is a function for generating new PKCS \#10 certificate requests.
\begin{verbatim}
namespace X509 {
PKCS10_Request create_cert_req(const X509_Cert_Options&,
                               const Private_Key&);
}
\end{verbatim}

This function acts quite similarly to \function{create\_self\_signed\_cert}, except it instead returns a PKCS \#10 certificate request. After creating it, one would typically transmit it to a CA, who signs it and returns a freshly minted X.509 certificate. There is an example of using this function in the \filename{pkcs10} example.

\subsubsection{Certificate Options}

What is this \type{X509\_Cert\_Options} thing we've been passing around? It's a class representing a bunch of information that will end up being stored into the certificate. This information comes in 3 major flavors: information about the subject (CA or end-user), the validity period of the certificate, and restrictions on the usage of the certificate.

First and foremost are a number of \type{std::string} members, which contain various bits of information about the user: \arg{common\_name}, \arg{serial\_number}, \arg{country}, \arg{organization}, \arg{org\_unit}, \arg{locality}, \arg{state}, \arg{email}, \arg{dns\_name}, and \arg{uri}. As many of these as possible should be filled in (especially an email address), though the only required ones are \arg{common\_name} and \arg{country}.

There is another value that is only useful when creating a PKCS \#10 request, which is called \arg{challenge}. This is a challenge password, which you can later use to request certificate revocation (\emph{if} the CA supports doing revocations in this manner).

Then there is the validity period; these are set with \function{not\_before} and \function{not\_after}. Both of these functions also take a \type{std::string}, which specifies when the certificate should start being valid, and when it should stop being valid. If you don't set the starting validity period, it will automatically choose the current time.
If you don't set the ending time, it will choose the starting time plus a default time period. The arguments to these functions specify the time in the following format: ``2002/11/27 1:50:14''. The time is in 24-hour format, and the date is encoded as year/month/day. The date must be specified, but you can omit the time or trailing parts of it, for example ``2002/11/27 1:50'' or ``2002/11/27''.

Lastly, you can set constraints on a key. The one you're most likely to want to use is to create (or request) a CA certificate, which can be done by calling the member function \function{CA\_key}. This should only be used when needed.

Other constraints can be set by calling the member functions \function{add\_constraints} and \function{add\_ex\_constraints}. The first takes a \type{Key\_Constraints} value, and replaces any previously set value. If no value is set, then the certificate key is marked as being valid for any usage. You can set it to any of the following (for more than one usage, OR them together): \type{DIGITAL\_SIGNATURE}, \type{NON\_REPUDIATION}, \type{KEY\_ENCIPHERMENT}, \type{DATA\_ENCIPHERMENT}, \type{KEY\_AGREEMENT}, \type{KEY\_CERT\_SIGN}, \type{CRL\_SIGN}, \type{ENCIPHER\_ONLY}, \type{DECIPHER\_ONLY}. Many of these have quite special semantics, so you should either consult the appropriate standards document (such as RFC 3280), or just not call \function{add\_constraints}, in which case the appropriate values will be chosen for you.

The second function, \function{add\_ex\_constraints}, allows you to specify an OID that has some meaning with regards to restricting the key to particular usages. You can, if you wish, specify any OID you like, but there is a set of standard ones that other applications will be able to understand.
These are the ones specified by the PKIX standard, and are named ``PKIX.ServerAuth'' (for TLS server authentication), ``PKIX.ClientAuth'' (for TLS client authentication), ``PKIX.CodeSigning'', ``PKIX.EmailProtection'' (most likely for use with S/MIME), ``PKIX.IPsecUser'', ``PKIX.IPsecTunnel'', ``PKIX.IPsecEndSystem'', and ``PKIX.TimeStamping''. You can call \function{add\_ex\_constraints} any number of times; each new OID will be added to the list to include in the certificate.

\section{The Low-Level Interface}

Botan has two different interfaces. The one documented in this section is meant more for implementing higher-level types (see the section on filters, earlier in this manual) than for use by applications. Using it safely requires a solid knowledge of encryption techniques and best practices, so unless you know, for example, what CBC mode and nonces are, and why PKCS \#1 padding is important, you should avoid this interface in favor of something working at a higher level (such as the CMS interface).

\subsection{Basic Algorithm Abilities}

There are a small handful of functions implemented by most of Botan's algorithm objects. Among these are:

\noindent
\type{std::string} \function{name}():

Returns a human-readable string of the name of this algorithm. Examples of names returned are ``Blowfish'' and ``HMAC(MD5)''. You can turn names back into algorithm objects using the functions in \filename{lookup.h}.

\noindent
\type{void} \function{clear}():

Clear out the algorithm's internal state. A block cipher object will ``forget'' its key, a hash function will ``forget'' any data put into it, etc. The object will look and behave as it did when you initially allocated it.

\noindent
\function{clone}():

This function is central to Botan's name-based interface. The \function{clone} function has many different return types, such as \type{BlockCipher*} and \type{HashFunction*}, depending on what kind of object it is called on.
Note that unlike Java's clone, this returns a new object in a ``pristine'' state; that is, operations done on the initial object before calling \function{clone} do not affect the initial state of the new clone.

Cloned objects can (and should) be deallocated with the C++ \texttt{delete} operator.

\subsection{Keys and IVs}

Both symmetric keys and initialization values can be considered byte (or octet) strings. These are represented by the classes \type{SymmetricKey} and \type{InitializationVector}, which are subclasses of \type{OctetString}.

Since it is often hard to distinguish between a key and an IV, many things (such as key derivation mechanisms) return \type{OctetString} instead of \type{SymmetricKey} to allow its use as a key or an IV.

\noindent
\function{OctetString}(\type{u32bit} \arg{length}):

This constructor creates a new random key of size \arg{length}.

\noindent
\function{OctetString}(\type{std::string} \arg{str}):

The argument \arg{str} is assumed to be a hex string; it is converted to binary and stored. Whitespace is ignored.

\noindent
\function{OctetString}(\type{const byte} \arg{input}[], \type{u32bit} \arg{length}):

This constructor copies its input.

\subsection{Symmetrically Keyed Algorithms}

Block ciphers, stream ciphers, and MACs are all keyed operations; to be useful, they have to be set to use a particular key, which is a randomly chosen string of bits of a specified length. The length required by any particular algorithm may vary, depending on both the algorithm specification and the implementation. You can query any Botan object to find out what key length(s) it supports.

To make this similarity in terms of keying explicit, all algorithms of those types are derived from the \type{SymmetricAlgorithm} base class. This type has three functions:

\noindent
\type{void} \function{set\_key}(\type{const byte} \arg{key}[], \type{u32bit} \arg{length}):

Most algorithms only accept keys of certain lengths.
If you attempt to call \function{set\_key} with a key length that is not supported, the exception \type{Invalid\_Key\_Length} will be thrown. There is also another version of \function{set\_key} that takes a \type{SymmetricKey} as an argument.

\noindent
\type{bool} \function{valid\_keylength}(\type{u32bit} \arg{length}) const:

This function returns true if a key of the given length will be accepted by the cipher.

There are also three constant data members of every \type{SymmetricAlgorithm} object, which specify what limits there are on keys which that object can accept:

MAXIMUM\_KEYLENGTH: The maximum length of a key, in bytes. Usually, this is at most 32 (256 bits), even if the algorithm supports more. In a few rare cases larger keys will be supported.

MINIMUM\_KEYLENGTH: The minimum length of a key, in bytes. This is at least 1.

KEYLENGTH\_MULTIPLE: The length of the key must be a multiple of this value.

In all cases, \function{set\_key} must be called on an object before any data processing (encryption, decryption, etc) is done by that object. If this is not done, the results are undefined; that is to say, Botan reserves the right in this situation to do anything from printing a nasty, insulting message on the screen to dumping core.

\subsection{Block Ciphers}

Block ciphers implement the interface \type{BlockCipher}, found in \filename{base.h}, as well as the \type{SymmetricAlgorithm} interface.

\noindent
\type{void} \function{encrypt}(\type{const byte} \arg{in}[BLOCK\_SIZE], \type{byte} \arg{out}[BLOCK\_SIZE]) const

\noindent
\type{void} \function{encrypt}(\type{byte} \arg{block}[BLOCK\_SIZE]) const

These functions apply the block cipher transformation to \arg{in} and place the result in \arg{out}, or encrypt \arg{block} in place (\arg{in} may be the same as \arg{out}). BLOCK\_SIZE is a constant member of each class, which specifies how much data a block cipher can process at one time.
Note that BLOCK\_SIZE is not a static class member, meaning you can (given a \type{BlockCipher*} named \arg{cipher}), call \verb|cipher->BLOCK_SIZE| to get the block size of that particular object. \type{BlockCipher}s have similar functions \function{decrypt}, which perform the inverse operation.

\begin{verbatim}
AES_128 cipher;
SymmetricKey key(cipher.MAXIMUM_KEYLENGTH); // randomly created
cipher.set_key(key);

byte in[16] = { /* secrets */ };
byte out[16];
cipher.encrypt(in, out);
\end{verbatim}

\subsection{Stream Ciphers}

Stream ciphers are somewhat different from block ciphers, in that encrypting data results in changing the internal state of the cipher. Also, you may encrypt any length of data in one go (in byte amounts).

\noindent
\type{void} \function{encrypt}(\type{const byte} \arg{in}[], \type{byte} \arg{out}[], \type{u32bit} \arg{length})

\noindent
\type{void} \function{encrypt}(\type{byte} \arg{data}[], \type{u32bit} \arg{length}):

These functions encrypt the arbitrary length (well, less than 4 gigabyte long) string \arg{in} and place it into \arg{out}, or encrypt it in place in \arg{data}. The \function{decrypt} functions look just like \function{encrypt}.

Stream ciphers implement the \type{SymmetricAlgorithm} interface.

Some stream ciphers support random access to any point in their cipher stream. For such ciphers, calling \type{void} \function{seek}(\type{u32bit} \arg{byte}) will change the cipher's state so that it is as if the cipher had been keyed as normal, then encrypted \arg{byte}~$-$~1 bytes of data (so the next byte in the cipher stream is byte number \arg{byte}).

\subsection{Hash Functions / Message Authentication Codes}

Hash functions consume their input without producing any output, and produce a result only once all of the input has been supplied. MACs are very similar, but are additionally keyed. Both of these are derived from the base class \type{BufferedComputation}, which has the following functions.
\noindent
\type{void} \function{update}(\type{const byte} \arg{input}[], \type{u32bit} \arg{length})

\noindent
\type{void} \function{update}(\type{byte} \arg{input})

\noindent
\type{void} \function{update}(\type{const std::string \&} \arg{input})

Updates the hash/MAC calculation with \arg{input}.

\noindent
\type{void} \function{final}(\type{byte} \arg{out}[OUTPUT\_LENGTH])

\noindent
\type{SecureVector<byte>} \function{final}():

Complete the hash/MAC calculation and place the result into \arg{out}. OUTPUT\_LENGTH is a public constant in each object that gives the length of the hash in bytes. After you call \function{final}, the hash function is reset to its initial state, so it may be reused immediately.

The second method of using \function{final} is to call it with no arguments at all, as shown in the second prototype. It will return the hash/MAC value in a memory buffer, which will have size OUTPUT\_LENGTH.

There is also a pair of functions called \function{process}. They are a combination of a single \function{update}, and \function{final}. Both versions return the final value, rather than placing it in an array. Calling \function{process} with a single byte value isn't available, mostly because it would rarely be useful.

A MAC can be viewed (in most cases) as a keyed hash function, so classes that are derived from \type{MessageAuthenticationCode} have \function{update} and \function{final} functions just like a \type{HashFunction} (and like a \type{HashFunction}, after \function{final} is called, it can be used to make a new MAC right away; the key is kept around).

A MAC has the \type{SymmetricAlgorithm} interface in addition to the \type{BufferedComputation} interface.

\section{Random Number Generators}

The random number generators provided in Botan are meant for creating keys, IVs, padding, nonces, and anything else that requires `random' data.
It is important to remember that the output of these classes will vary, even if they are supplied with the same seed (\ie, two \type{Randpool} objects with similar initial states will not produce the same output, because the value of high resolution timers is added to the state at various points).

To ensure good quality output, a PRNG needs to be seeded with truly random data (such as that produced by a hardware RNG). Typically, you will use an \type{EntropySource} (see below). To add entropy to a PRNG, you can use \type{void} \function{add\_entropy}(\type{const byte} \arg{data}[], \type{u32bit} \arg{length}) or (better), use the \type{EntropySource} interface.

Once a PRNG has been initialized, you can get a single byte of random data by calling \type{byte} \function{random}(), or get a large block by calling \type{void} \function{randomize}(\type{byte} \arg{data}[], \type{u32bit} \arg{length}), which will put random bytes into each member of the array from indexes 0 $\ldots$ \arg{length}~$-$~1.

You can avoid all the problems inherent in seeding the PRNG by using the globally shared PRNG, described later in this section.

\subsection{Randpool}

\type{Randpool} is the primary PRNG within Botan. In recent versions all uses of it have been wrapped by an implementation of the X9.31 PRNG (see below). If for some reason you should have cause to create a PRNG instead of using the ``global'' one owned by the library, it would be wise to consider doing the same on the grounds of general caution; while \type{Randpool} is designed with known attacks and PRNG weaknesses in mind, it is not a standard/official PRNG. The remainder of this section is a (fairly technical, though high-level) description of the algorithms used in this PRNG. Unless you have a specific interest in this subject, the rest of this section might prove somewhat uninteresting.

\type{Randpool} has an internal state called pool, which is 512 bytes long. This is where entropy is mixed into and extracted from.
There is also a small output buffer (called buffer), which holds the data which has already been generated but has not yet been output.

It is based around a MAC and a block cipher (which are currently HMAC(SHA-256) and AES-256). Where a specific size is mentioned, it should be taken as a multiple of the cipher's block size. For example, if a 256-bit block cipher were used instead of AES, all the sizes internally would double. Every time some new output is needed, we compute the MAC of a counter and a high resolution timer. The resulting MAC is XORed into the output buffer (wrapping as needed), and the output buffer is then encrypted with AES, producing 16 bytes of output.

After 8 blocks (or 128 bytes) have been produced, we mix the pool. To do this, we first rekey both the MAC and the cipher; the new MAC key is the MAC of the current pool under the old MAC key, while the new cipher key is the MAC of the current pool under the just-chosen MAC key. We then encrypt the entire pool in CBC mode, using the current (unused) output buffer as the IV. We then generate a new output buffer, using the mechanism described in the previous paragraph.

To add randomness to the PRNG, we compute the MAC of the input and XOR the output into the start of the pool. Then we remix the pool and produce a new output buffer. The initial MAC operation should make it very hard for chosen inputs to harm the security of \type{Randpool}, and as HMAC should be able to hold roughly 256 bits of state, it is unlikely that we are wasting much input entropy (or, if we are, it doesn't matter, because we have a very abundant supply).

\subsection{ANSI X9.31}

\type{ANSI\_X931\_PRNG} is the standard issue X9.31 Appendix A.2.4 PRNG, though using AES-256 instead of 3DES as the block cipher. This PRNG implementation has been checked against official X9.31 test vectors.

Internally, the PRNG holds a pointer to another PRNG (typically \type{Randpool}).
This internal PRNG generates the key and seed used by the X9.31 algorithm, as well as the date/time vectors. Each time an X9.31 PRNG object receives entropy, it passes it along to the PRNG it is holding, and then pulls out some random bits to generate a new key and seed. This PRNG considers itself seeded as soon as the internal PRNG is seeded.

As of version 1.4.7, the X9.31 PRNG is by default used for all random number generation.

\subsection{Entropy Sources}

An \type{EntropySource} is an abstract representation of some method of gathering ``real'' entropy. This tends to be very system dependent. The \emph{only} way you should use an \type{EntropySource} is to pass it to a PRNG that will extract entropy from it -- never use the output directly for any kind of key or nonce generation!

\type{EntropySource} has a pair of functions for getting entropy from some external source, called \function{fast\_poll} and \function{slow\_poll}. Each is passed a buffer of bytes to write into, and returns how many bytes of entropy were gathered. \type{EntropySource}s are usually used to seed the global PRNG using the functions found in the \namespace{Global\_RNG} namespace.

Note for writers of \type{EntropySource}s: it isn't necessary to use any kind of cryptographic hash on your output. The data produced by an \type{EntropySource} is only used by an application after it has been hashed by the \type{RandomNumberGenerator} that asked for the entropy, thus any hashing you do will be wasteful of both CPU cycles and entropy.

\section{User Interfaces}

Botan has recently changed some infrastructure to better accommodate more complex user interfaces, in particular ones that are based on event loops. The primary motivation was that when doing something like loading a PKCS \#8 encoded private key, a passphrase might be needed, but then again it might not (a PKCS \#8 key doesn't have to be encrypted). Asking for a passphrase to decrypt an unencrypted key is rather pointless.
Not only that, but the way to handle the user typing the wrong passphrase was complicated, undocumented, and inefficient. So now Botan has an object called \type{UI}, which provides a simple interface for the aspects of user interaction the library has to be concerned with. Currently, this means getting a passphrase from the user, and that's it (\type{UI} will probably be extended in the future to support other operations as they are needed). The base \type{UI} class is very stupid, because the library can't directly assume anything about the environment that it's running under (for example, if there will be someone sitting at the terminal, if the application is even \emph{attached} to a terminal, and so on). But since you can subclass \type{UI} to use whatever method happens to be appropriate for your application, this isn't a big deal. \begin{verbatim} std::string get_passphrase(const std::string& what, const std::string& source, UI_Result& result) const; \end{verbatim} The \arg{what} argument specifies what the passphrase is needed for (for example, PKCS \#8 key loading passes \arg{what} as ``PKCS \#8 private key''). This lets you provide the user with some indication of \emph{why} your application is asking for a passphrase; feel free to pass the string through \function{gettext(3)} or moral equivalent for i18n purposes. Similarly, \arg{source} specifies where the data in question came from, if available (for example, a file name). If the source is not available for whatever reason, then \arg{source} will be an empty string; be sure to account for this possibility when writing a \type{UI} subclass. The function returns the passphrase as the return value, and a status code in \arg{result} (either \type{OK} or \type{CANCEL\_ACTION}). If \type{CANCEL\_ACTION} is returned in \arg{result}, then the return value will be ignored, and the caller will take whatever action is necessary (typically, throwing an exception stating that the passphrase couldn't be determined). 
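
As a rough illustration of how such a subclass might look, the sketch below answers passphrase requests from a fixed string. The \type{UI} base class and \type{UI\_Result} enum shown here are simplified stand-ins, declared only so that the example compiles by itself (in particular, the stand-in base is purely abstract, which the real one need not be). \type{Fixed\_UI} and \arg{last\_prompt} are hypothetical names that do not come from the library.

\begin{verbatim}
#include <string>

// Stand-ins for the library's types, declared here only so the sketch
// compiles by itself; the real definitions come from Botan's headers.
enum UI_Result { OK, CANCEL_ACTION };

class UI
   {
   public:
      virtual std::string get_passphrase(const std::string& what,
                                         const std::string& source,
                                         UI_Result& result) const = 0;
      virtual ~UI() {}
   };

// Hypothetical subclass: answers from a fixed string, and cancels
// when it has no passphrase to offer.
class Fixed_UI : public UI
   {
   public:
      explicit Fixed_UI(const std::string& pass) : passphrase(pass) {}

      std::string get_passphrase(const std::string& what,
                                 const std::string& source,
                                 UI_Result& result) const
         {
         // Build the prompt a real UI would display; the source
         // may be an empty string, so only mention it if present.
         last_prompt = "Passphrase for " + what;
         if(!source.empty())
            last_prompt += " (" + source + ")";

         result = passphrase.empty() ? CANCEL_ACTION : OK;
         return passphrase;
         }

      mutable std::string last_prompt; // what would have been shown
   private:
      std::string passphrase;
   };
\end{verbatim}

A subclass like this reports \type{CANCEL\_ACTION} whenever it has nothing to offer, and leaves it to the caller to decide what happens next.
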
In the specific case of PKCS \#8 key decryption, a \type{Decoding\_Error} exception will be thrown; your UI should assume this can happen, and provide appropriate error handling (such as putting up a dialog box informing the user of the situation, and canceling the operation in progress). There is an example \type{UI} that uses GTK+ available on the web site. The \type{GTK\_UI} code is cleanly separated from the rest of the example, so if you happen to be using GTK+, you can copy (and/or adapt) that code for your application. If you write a \type{UI} object for another windowing system (Win32, Qt, wxWidgets, FOX, etc), and would like to make it available to users in general (ideally under a permissive license such as public domain or MIT/BSD), feel free to send in a copy. \section{Botan's Modules} Botan comes with a variety of modules that can be compiled into the system. These will not be available on all installations of the library, but you can check for their availability based on whether or not certain macros are defined. \subsection{Pipe I/O for Unix File Descriptors} This is a minor feature, but it comes in handy sometimes. In all installations of the library, Botan's \type{Pipe} object overloads the \keyword{<<} and \keyword{>>} operators for C++ iostream objects, which is usually more than sufficient for doing I/O. However, there are cases where the iostream hierarchy does not map well to local 'file types', so there is also the ability to do I/O directly with Unix file descriptors. This is most useful when you want to read from or write to something like a TCP or Unix-domain socket, or a pipe, since for simple file access it's usually easier to just use C++'s file streams. If \macro{BOTAN\_EXT\_PIPE\_UNIXFD\_IO} is defined, then you can use the overloaded I/O operators with Unix file descriptors. For an example of this, check out the \filename{hash\_fd} example, included in the Botan distribution. 
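
The macro checks described in this section can be done entirely at compile time. The following is a minimal sketch: only the macro name \macro{BOTAN\_EXT\_PIPE\_UNIXFD\_IO} comes from this manual, the helper function name is hypothetical, and in a real program the macro would be defined (or not) by Botan's own headers rather than by your code.

\begin{verbatim}
// Hypothetical helper: reports whether the Unix file descriptor I/O
// extension was compiled in. The macro is normally set by the
// library's build configuration, which this sketch does not include.
inline bool have_unixfd_pipe_io()
   {
#if defined(BOTAN_EXT_PIPE_UNIXFD_IO)
   return true;   // the file descriptor operators may be used
#else
   return false;  // fall back to iostream-based I/O
#endif
   }
\end{verbatim}

The same pattern applies to the other extension macros mentioned below; code that depends on an optional module can be wrapped in the corresponding \verb|#if defined(...)| block.
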
\subsection{Entropy Sources}

All of these are used by the \function{Global\_RNG::seed} function if they are available. Since this function is called by the \type{LibraryInitializer} class when it is created, it is rare that you will need to deal with any of these classes directly. Even in the case of a long-running server that needs to renew its entropy pool, it is easier to call \function{Global\_RNG::seed} (see the section entitled ``The Global PRNG'' for more details).

\noindent
\type{EGD\_EntropySource}: Query an EGD socket. If the macro \macro{BOTAN\_EXT\_ENTROPY\_SRC\_EGD} is defined, it can be found in \filename{es\_egd.h}. The constructor takes a \type{std::vector} that specifies the paths to look for an EGD socket.

\noindent
\type{Unix\_EntropySource}: This entropy source executes programs common on Unix systems (such as \filename{uptime}, \filename{vmstat}, and \filename{df}) and adds their output to a buffer. It's quite slow due to process overhead, and (roughly) 1 bit of real entropy is in each byte that is output. It is declared in \filename{es\_unix.h}, if \macro{BOTAN\_EXT\_ENTROPY\_SRC\_UNIX} is defined. If you don't have \filename{/dev/urandom} \emph{or} EGD, this is probably the thing to use. For a long-running process on Unix, keep an object of this type around and run fast polls every few minutes.

\noindent
\type{FTW\_EntropySource}: Walk through a filesystem (the root to start searching is passed as a string to the constructor), reading files. This tends to only be useful on things like \filename{/proc} that have a great deal of variability over time, and even then there is only a small amount of entropy gathered: about 1 bit of entropy for every 16 bits of output (and many hundreds of bits are read in order to get that 16 bits). It is declared in \filename{es\_ftw.h}, if \macro{BOTAN\_EXT\_ENTROPY\_SRC\_FTW} is defined. Only use this as a last resort. I don't really trust it, and neither should you.

\noindent
\type{Win32\_CAPI\_EntropySource}: This routine gathers entropy from a Win32 CAPI module. It takes an optional \type{std::string} that will specify what type of CAPI provider to use. The CAPI RNG is usually a default software-based PRNG, but there are a few providers that may use a hardware RNG. By default it will use the first provider listed in the option ``rng/ms\_capi\_prov\_type'' that is available on the machine (currently the providers ``RSA\_FULL'', ``INTEL\_SEC'', ``FORTEZZA'', and ``RNG'' are recognized).

\noindent
\type{BeOS\_EntropySource}: Query system statistics using various BeOS-specific APIs.

\noindent
\type{Pthread\_EntropySource}: Attempt to gather entropy based on jitter between a number of threads competing for a single mutex. This entropy source is \emph{very} slow, and highly questionable in terms of security. However, it provides a worst-case fallback on systems that don't have Unix-like features, but do support POSIX threads. This module is currently unavailable due to problems on some systems.

\subsection{Compressors}

There are two compression algorithms supported by Botan, Zlib and Bzip2 (Gzip and Zip encoding will be supported in future releases). Only lossless compression algorithms are currently supported by Botan, because they tend to be the most useful for cryptography. However, it is very reasonable to consider supporting something like GSM speech encoding (which is lossy), for use in encrypted voice applications.

You should always compress \emph{before} you encrypt, because encryption seeks to hide the redundancy that compression is supposed to try to find and remove.

\subsubsection{Bzip2}

To test for Bzip2, check to see if \macro{BOTAN\_EXT\_COMPRESSOR\_BZIP2} is defined. If so, you can include \filename{bzip2.h}, which will declare a pair of \type{Filter} objects: \type{Bzip2\_Compression} and \type{Bzip2\_Decompression}.
You should be prepared to take an exception when using the decompressing filter, for if the input is not valid Bzip2 data, that is what you will receive. You can specify the desired level of compression to \type{Bzip2\_Compression}'s constructor as an integer between 1 and 9, 1 meaning worst compression, and 9 meaning the best. The default is to use 9, since smaller values take the same amount of time and just use a little less memory.

The Bzip2 module was contributed by Peter J. Jones.

\subsubsection{Zlib}

Zlib compression works much like Bzip2 compression. The only differences in this case are that the macro is \macro{BOTAN\_EXT\_COMPRESSOR\_ZLIB} and the header you need to include is called \filename{botan/zlib.h} (remember that you shouldn't just \verb|#include <zlib.h>|, or you'll get the regular zlib API, which is not what you want). The Botan classes for Zlib compression/decompression are called \type{Zlib\_Compression} and \type{Zlib\_Decompression}.

Like Bzip2, a \type{Zlib\_Decompression} object will throw an exception if invalid (in the sense of not being in the Zlib format) data is passed into it.

In the case of zlib's algorithm, a lower compression level will be faster than a higher one. For this reason, the Zlib compressor will default to using a compression level of 6. This tends to give a good trade off in terms of time spent to compression achieved. There are several factors you need to consider in order to decide if you should use a higher compression level:

\begin{list}{$\cdot$}
\item Better security: the less redundancy in the source text, the harder it is to attack your ciphertext. This is not too much of a concern, because with decent algorithms using sufficiently long keys, it doesn't really matter \emph{that} much (but it certainly can't hurt).
\item Decreasing returns. Some simple experiments by the author showed minimal decreases in the size between level 6 and level 9 compression with large (1 to 3 megabyte) files.
There was some difference, but it wasn't that much. \item CPU time. Level 9 zlib compression is often two to four times as slow as level 6 compression. This can make a substantial difference in the overall runtime of a program. \end{list} While the zlib compression library uses the same compression algorithm as the gzip and zip programs, the format is different. The zlib format is defined in RFC 1950. \subsubsection{Data Sources} A \type{DataSource} is a simple abstraction for a thing that stores bytes. This type is used heavily in the areas of the API related to ASN.1 encoding/decoding. The following types are \type{DataSource}s: \type{Pipe}, \type{SecureQueue}, and a couple of special purpose ones: \type{DataSource\_Memory} and \type{DataSource\_Stream}. You can create a \type{DataSource\_Memory} with an array of bytes and a length field. The object will make a copy of the data, so you don't have to worry about keeping that memory allocated. This is mostly for internal use, but if it comes in handy, feel free to use it. A \type{DataSource\_Stream} is probably more useful than the memory based one. Its constructors take either a \type{std::istream} or a \type{std::string}. If it's a stream, the data source will use the \type{istream} to satisfy read requests (this is particularly useful to use with \type{std::cin}). If the string version is used, it will attempt to open up a file with that name and read from it. \subsubsection{Data Sinks} A \type{DataSink} (in \filename{data\_snk.h}) is a \type{Filter} that takes arbitrary amounts of input, and produces no output. This means it's doing something with the data outside the realm of what \type{Filter}/\type{Pipe} can handle, for example, writing it to a file (which is what the \type{DataSink\_Stream} does). There is no need for \type{DataSink}s that write to a \type{std::string} or memory buffer, because \type{Pipe} can handle that by itself. 

Here's a quick example of using a \type{DataSink}, which encrypts \filename{in.txt} and sends the output to \filename{out.txt}. There is no explicit output operation; the writing of \filename{out.txt} is implicit.

\begin{verbatim}
   // key and iv are assumed to have been set up elsewhere
   DataSource_Stream in("in.txt");
   Pipe pipe(new CBC_Encryption("Blowfish", "PKCS7", key, iv),
             new DataSink_Stream("out.txt"));
   pipe.process_msg(in);
\end{verbatim}

A real advantage of this is that even if ``in.txt'' is large, only as much memory as is needed for internal I/O buffers will be used.

\section{Miscellaneous}

This section has documentation for anything that just didn't fit into any of the major categories. Many of them (Timers, Allocators) will rarely be used in actual application code, but others, like the PBKDF algorithms, have a wide degree of applicability.

\subsection{PBKDF Algorithms}

There are various procedures (usually ad-hoc) for turning a passphrase into a (mostly) arbitrary length key for a symmetric cipher. A general interface for such algorithms is presented in \filename{pbkdf.h}. The main function is \function{derive\_key}, which takes a passphrase, a salt, an iteration count, and the desired length of the output key, and returns a key of that length, deterministically produced from the passphrase and salt. If an algorithm can't produce a key of that size, it will throw an exception (most notably, PKCS \#5's PBKDF1 can only produce strings between 1 and $n$ bytes, where $n$ is the output size of the underlying hash function).

The purpose of the iteration count is to make the algorithm take longer to compute the final key (reducing the speed of brute-force attacks of various kinds). Most standards recommend an iteration count of at least 10000. Currently defined PBKDF algorithms are ``PBKDF1(digest)'', ``PBKDF2(digest)'', and ``OpenPGP-S2K(digest)''; you can retrieve any of these using \function{get\_pbkdf}, found in \filename{lookup.h}.
As of this writing, ``PBKDF2(SHA-256)'' with 10000 iterations and a 16 byte salt is recommended for new applications.

\subsubsection{OpenPGP S2K}

There are some oddities about OpenPGP's S2K algorithms that are documented here. For one thing, it uses the iteration count in a strange manner; instead of specifying how many times to iterate the hash, it tells how many \emph{bytes} should be hashed in total (including the salt). So the exact iteration count will depend on the size of the salt (which is fixed at 8 bytes by the OpenPGP standard, though the implementation will allow any salt size) and the size of the passphrase.

To get what OpenPGP calls ``Simple S2K'', set iterations to 0, and do not specify a salt. To get ``Salted S2K'', again leave the iteration count at 0, but give an 8-byte salt. ``Salted and Iterated S2K'' requires an 8-byte salt and some iteration count (this should be significantly larger than the size of the longest passphrase that might reasonably be used; somewhere from 1024 to 65536 would probably be about right). Using both a reasonably sized salt and a large iteration count is highly recommended to prevent password guessing attempts.

\subsection{Password Hashing}

Storing passwords for user authentication purposes in plaintext is the simplest but least secure method; when an attacker compromises the database in which the passwords are stored, they immediately gain access to all of them. Often passwords are reused among multiple services or machines, meaning once a password to a single service is known an attacker has a substantial head start on attacking other machines.

The general approach is to store, instead of the password, the output of a one way function of the password. Upon receiving an authentication request, the authenticator can recompute the one way function and compare the value just computed with the one that was stored. If they match, then the authentication request succeeds.
But when an attacker gains access to the database, they only have the output of the one way function, not the original password.

Common hash functions such as SHA-256 are one way, but used alone they have problems for this purpose. What an attacker can do, upon gaining access to such a stored password database, is hash common dictionary words and other possible passwords, storing them in a list. Then they can search through the list; if a stored hash and an entry in the list match, then they have found the password. Even worse, this can happen \emph{offline}: an attacker can begin hashing common passwords days, months, or years before ever gaining access to the database. In addition, if two users choose the same password, the one way function output will be the same for both of them, which will be visible upon inspection of the database.

There are two solutions to these problems: salting and iteration. Salting refers to including, along with the password, a randomly chosen value which perturbs the one way function. Salting can reduce the effectiveness of offline dictionary generation (because for each potential password, an attacker would have to compute the one way function output for all possible salts; with a large enough salt, this can make the problem quite difficult). It also prevents the same password from producing the same output, as long as the salts do not collide. With a large salt (say 80 to 128 bits) this will be quite unlikely.

Iteration refers to the general technique of forcing multiple one way function evaluations when computing the output, to slow down the operation. For instance if hashing a single password requires running SHA-256 100,000 times instead of just once, that will slow down user authentication by a factor of 100,000, but user authentication happens quite rarely, and usually there are more expensive operations that need to occur anyway (network and database I/O, etc).
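
The two techniques can be combined in a few lines. The following is only a sketch: \function{toy\_hash} is a non-cryptographic stand-in (64-bit FNV-1a) used to keep the example short, and all of the function names are hypothetical rather than part of Botan. A real application should use a vetted construction built on a real hash function.

\begin{verbatim}
#include <cstddef>
#include <string>

typedef unsigned long long u64;

// Stand-in mixing function (64-bit FNV-1a). It is NOT a cryptographic
// hash; a real system would use something like SHA-256 here.
u64 toy_hash(const std::string& data)
   {
   u64 h = 14695981039346656037ULL;
   for(std::size_t i = 0; i != data.size(); ++i)
      {
      h ^= static_cast<unsigned char>(data[i]);
      h *= 1099511628211ULL;
      }
   return h;
   }

// Salted, iterated one way function: every round re-hashes the salt
// and password together with the previous round's output.
u64 slow_hash(const std::string& password, const std::string& salt,
              unsigned long iterations)
   {
   u64 state = 0;
   for(unsigned long i = 0; i != iterations; ++i)
      {
      std::string block = salt + password;
      for(int b = 0; b != 8; ++b)
         block += static_cast<char>((state >> (8 * b)) & 0xFF);
      state = toy_hash(block);
      }
   return state;
   }

// Authentication: recompute with the stored salt and iteration count,
// then compare against the stored value.
bool check_password(const std::string& attempt, const std::string& salt,
                    unsigned long iterations, u64 stored)
   {
   return slow_hash(attempt, salt, iterations) == stored;
   }
\end{verbatim}

Storing the salt and iteration count alongside the output lets the authenticator repeat the computation; a legitimate login pays for the extra rounds only once per attempt.
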
On the other hand, an attacker who is attempting to break a database full of stolen password hashes will be seriously inconvenienced by a factor of 100,000 slowdown; they will only be able to test at 0.001\% of the rate they could without iterations (or, equivalently, will require 100,000 times as many zombie botnet hosts).

There are many different ways of doing this password hashing operation, with common ones including Unix's crypt (which is based on DES) and OpenBSD's bcrypt (based on Blowfish). Other variants using MD5 or SHA-256 are also in use on various systems.

Botan provides a technique called passhash9, in \filename{passhash9.h}, which is based on PBKDF2. Two functions are provided in this header, \function{generate\_passhash9} and \function{check\_passhash9}. The generate function takes the password to hash, a \type{RandomNumberGenerator}, and a work factor, which tells how many iterations to compute. The default work factor is 10 (which means 100,000 iterations), but any non-zero value is accepted. The check function takes a password and a passhash9 output and checks if the password is the same as the one that was used to generate the passhash9 output, returning a boolean true (same) or false (not same). An example can be found in \filename{doc/examples/passhash.cpp}.

Passhash9 currently uses HMAC(SHA-1) for the underlying PBKDF2 pseudo-random function, but can be extended to use different algorithms in the future if necessary. For instance, a PRF based on Blowfish (a block cipher that requires 4 KiB of RAM for efficient execution) could be used to make hardware-based password cracking more expensive (this was one motivation for Blowfish's use in the bcrypt hashing scheme, in fact).

\subsection{Checksums}

Checksums are very similar to hash functions, and in fact share the same interface.
But there are some significant differences, the major ones being that the output size is very small (usually in the range of 2 to 4 bytes), and that they are not cryptographically secure. But for their intended purpose (error checking), they perform very well. Some examples of checksums included in Botan are the Adler32 and CRC32 checksums.

\subsection{Exceptions}

Sooner or later, something is going to go wrong. Botan's behavior when something unusual occurs, like most C++ software, is to throw an exception. Exceptions in Botan are derived from the \type{Exception} class. You can see most of the major varieties of exceptions used in Botan by looking at \filename{exceptn.h}. The only function you really need to concern yourself with is \type{const char*} \function{what()}. This will return an error message relevant to the error that occurred. For example:

\begin{verbatim}
try {
   // various Botan operations
   }
catch(Botan::Exception& e)
   {
   std::cout << "Botan exception caught: " << e.what() << std::endl;
   // error handling, or just abort
   }
\end{verbatim}

Botan's exceptions are derived from \type{std::exception}, so you don't need to explicitly check for Botan exceptions if you're already catching the ISO standard ones.

\subsection{Threads and Mutexes}

Botan includes a mutex system, which is used internally to lock some shared data structures that must be kept shared for efficiency reasons (mostly, these are in the allocation systems~--~handing out 1000 separate allocators hurts performance and makes caching memory blocks useless). This system is supported by the \texttt{mux\_pthr} module, implementing the \type{Mutex} interface for systems that have POSIX threads.

If your application is using threads, you \emph{must} add the option ``thread\_safe'' to the options string when you create the \type{LibraryInitializer} object. If you specify this option and no mutex type is available, an exception is thrown, since otherwise you would probably be facing a nasty crash.

\subsection{Secure Memory}

A major concern with mixing modern multiuser OSes and cryptographic code is that at any time the program's memory (including secret keys) could be swapped to disk, where it can later be read by an attacker. Botan stores almost everything (and especially anything sensitive) in memory buffers that a) clear out their contents when their destructors are called, and b) have easy plugins for various memory locking functions, such as the \function{mlock}(2) call on many Unix systems.

Two of the allocation methods used (``malloc'' and ``mmap'') don't require any extra privileges on Unix, but locking memory does. At startup, each allocator type will attempt to allocate a few blocks (typically totaling 128k), so if you want, you can run your application \texttt{setuid} \texttt{root}, and then drop privileges immediately after creating your \type{LibraryInitializer}. If you end up using more than what's been allocated, some of your sensitive data might end up being swappable, but that beats running as \texttt{root} all the time.

These classes should also be used within your own code for storing sensitive data. They are only meant for primitive data types (int, long, etc): if you want a container of higher level Botan objects, you can just use a \verb|std::vector|, since these objects know how to clear themselves when they are destroyed. You cannot, however, have a \verb|std::vector| (or any other container) of \type{Pipe}s or \type{Filter}s, because these types have pointers to other \type{Filter}s, and implementing copy constructors for these types would be both hard and quite expensive (vectors of pointers to such objects are fine, though).

These types are not described in any great detail: for more information, consult the definitive sources~--~the header files \filename{secmem.h} and \filename{allocate.h}.

\type{SecureBuffer} is a simple array type, whose size is specified at compile time.
It will automatically convert to a pointer of the appropriate type, and has a number of useful functions, including \function{clear()}, and \type{u32bit} \function{size()}, which returns the length of the array. It is a template that takes as parameters a type, and a constant integer which is how long the array is (for example: \verb|SecureBuffer<byte, 16> key;|).

\type{SecureVector} is a variable length array. Its size can be increased or decreased as need be, and it has a wide variety of functions useful for copying data into its buffer. Like \type{SecureBuffer}, it implements \function{clear} and \function{size}.

\subsection{Allocators}

The containers described above get their memory from allocators. As a user of the library, you can add new allocator methods at run time for use by containers, including the ones used internally by the library. The interface to this is in \filename{allocate.h}. Code needing to allocate or deallocate memory calls \function{get\_allocator}, which returns a pointer to an allocator object. This pointer should not be freed: the caller does not own the allocator (it is shared among multiple allocator users, and uses a mutex to serialize access internally if necessary). It is possible to call \function{get\_allocator} with a specific name to request a particular type of allocator; otherwise, a default allocator type is returned.

At start time, the only allocator known is a \type{Default\_Allocator}, which just allocates memory using \function{malloc}, and \function{memset}s it to 0 when the memory is released. It is known by the name ``malloc''. If you ask for another type of allocator (``locking'' and ``mmap'' are currently used), and it is not available, some other allocator will be returned.

You can add in a new allocator type using \function{add\_allocator\_type}. This function takes a string and a pointer to an allocator.
The string gives this allocator type a name by which it can be requested with \function{get\_allocator}. If an error occurs (such as the name being already registered), this function returns false. It will return true if the allocator was successfully registered. If you ask it to, \type{LibraryInitializer} will do this for you.

Finally, you can set the default allocator type that will be returned by setting the policy setting ``default\_alloc'' to the name of any previously registered allocator.

\subsection{BigInt}

\type{BigInt} is Botan's implementation of a multiple-precision integer. Thanks to C++'s operator overloading features, using \type{BigInt} is often quite similar to using a native integer type. The number of functions related to \type{BigInt} is quite large. You can find most of them in \filename{bigint.h} and \filename{numthry.h}.

Due to the sheer number of functions involved, only a few, which a regular user of the library might have to deal with, are mentioned here. Fully documenting the MPI library would take a significant while, so if you need to use it now, the best way to learn is to look at the headers.

Probably the most important are the encoding/decoding functions, which transform the normal representation of a \type{BigInt} into some other form, such as a decimal string. \type{SecureVector<byte>} \function{BigInt::encode}(\type{BigInt}, \type{Encoding})
\noindent
and \type{BigInt} \function{BigInt::decode}(\type{SecureVector<byte>}, \type{Encoding})

\type{Encoding} is an enum that has values \type{Binary}, \type{Octal}, \type{Decimal}, and \type{Hexadecimal}. The parameter will default to \type{Binary}. These functions are static member functions, so they would be called like this:

\begin{verbatim}
   BigInt n1; // some number
   SecureVector<byte> n1_encoded = BigInt::encode(n1);
   BigInt n2 = BigInt::decode(n1_encoded);
   // now n1 == n2
\end{verbatim}

There are also C++-style I/O operators defined for use with \type{BigInt}.
The input operator understands negative numbers, hexadecimal numbers (marked with a leading ``0x''), and octal numbers (marked with a leading `0'). The `-' must come before the ``0x'' or `0' marker. The output operator will never adorn the output; for example, when printing a hexadecimal number, there will not be a leading ``0x'' (though a leading `-' will be printed if the number is negative). If you want such things, you'll have to do them yourself.

\type{BigInt} has constructors that can create a \type{BigInt} from an unsigned integer or a string. You can also decode a \type{byte}[] / length pair into a \type{BigInt}. There are several other \type{BigInt} constructors, which I would seriously recommend you avoid, as they are only intended for use internally by the library, and may arbitrarily change, or be removed, in a future release.

A random sampling of \type{BigInt} related functions:

\type{u32bit} \function{BigInt::bytes}(): Return the size of this \type{BigInt} in bytes.

\type{BigInt} \function{random\_prime}(\type{u32bit} \arg{b}): Return a prime number \arg{b} bits long.

\type{BigInt} \function{gcd}(\type{BigInt} \arg{x}, \type{BigInt} \arg{y}): Returns the greatest common divisor of \arg{x} and \arg{y}. Uses the binary GCD algorithm.

\type{bool} \function{is\_prime}(\type{BigInt} \arg{x}): Returns true if \arg{x} is a (possible) prime number. Uses the Miller-Rabin probabilistic primality test with fixed bases. For higher assurance, use \function{verify\_prime}, which uses more rounds and randomized 48-bit bases.

\subsubsection{Efficiency Hints}

If you can, always use expressions of the form \verb|a += b| over \verb|a = a + b|. The difference can be \emph{very} substantial, because the first form prevents at least one needless memory allocation, and possibly as many as three.

If you're doing repeated modular exponentiations with the same modulus, create a \type{BarrettReducer} ahead of time.
If the exponent or base is a constant, use the classes in \filename{mod\_exp.h}. This stuff is all handled for you by the normal high-level interfaces, of course. Never use the low-level MPI functions (those that begin with \texttt{bigint\_}). These are completely internal to the library, and may make arbitrarily strange and undocumented assumptions about their inputs, and don't check to see if they are true, on the assumption that only the library itself calls them, and that the library knows what the assumptions are. The interfaces for these functions can change completely without notice. \section{Algorithms} \subsection{Recommended Algorithms} This section is by no means the last word on selecting which algorithms to use. However, Botan includes a sometimes bewildering array of possible algorithms, and unless you're familiar with the latest developments in the field, it can be hard to know what is secure and what is not. The following attributes of the algorithms were evaluated when making this list: security, standardization, patent status, support by other implementations, and efficiency (in roughly that order). It is intended as a set of simple guidelines for developers, and nothing more. It's entirely possible that there are algorithms in Botan that will turn out to be more secure than the ones listed, but the algorithms listed here are (currently) thought to be safe. \begin{list}{$\cdot$} \item Block ciphers: AES or Serpent in CBC, CTR, or XTS mode \item Hash functions: SHA-256, SHA-512 \item MACs: HMAC with any recommended hash function \item Public Key Encryption: RSA with ``EME1(SHA-256)'' \item Public Key Signatures: RSA with EMSA4 and any recommended hash, or DSA or ECDSA with ``EMSA1(SHA-256)'' \item Key Agreement: Diffie-Hellman or ECDH, with ``KDF2(SHA-256)'' \end{list} \subsection{Algorithms Listing} Botan includes a very sizable number of cryptographic algorithms. 
In nearly all cases, you never need to know the header file or type name to use them. However, you do need to know which string (or strings) identify each algorithm. These names conform to those set out by SCAN (Standard Cryptographic Algorithm Naming), a document that specifies how strings are mapped onto algorithm objects and that is useful for a wide variety of crypto APIs (SCAN is oriented towards Java, but Botan and several other non-Java libraries also make at least some use of it). For full details, read the SCAN document, which can be found at \url{http://www.users.zetnet.co.uk/hopwood/crypto/scan/}.

Many of these algorithms can take options (such as the number of rounds in a block cipher, the output size of a hash function, etc.). These are shown in the following list; all of them default to reasonable values. There are algorithm-specific limits on most of them. When you see something like ``HASH'' or ``BLOCK'', that means you should insert the name of some algorithm of that type. There are no defaults for those options.

A few very obscure algorithms are skipped; if you need one of them, you'll know it, and you can look in the appropriate header to see what that class's \function{name} function returns (the names tend to match those in SCAN, if it's defined there).

\begin{list}{$\cdot$}
  \item ROUNDS: The number of rounds in a block cipher.
  \item OUTSZ: The output size of a hash function or MAC
\end{list}

\vskip .05in

\noindent \textbf{Block Ciphers:} ``AES'' (and ``AES-128'', ``AES-192'', and ``AES-256''), ``Blowfish'', ``CAST-128'', ``CAST-256'', ``DES'', ``DESX'', ``TripleDES'', ``GOST-28147-89'', ``IDEA'', ``KASUMI'', ``MARS'', ``MISTY1(ROUNDS)'', ``Noekeon'', ``RC2'', ``RC5(ROUNDS)'', ``RC6'', ``SAFER-SK(ROUNDS)'', ``SEED'', ``Serpent'', ``Skipjack'', ``Square'', ``TEA'', ``Twofish'', ``XTEA''

\noindent \textbf{Stream Ciphers:} ``ARC4'', ``MARK4'', ``Salsa20'', ``Turing'', ``WiderWake4+1-BE''

\noindent \textbf{Hash Functions:} ``HAS-160'', ``GOST-34.11'', ``MD2'', ``MD4'', ``MD5'', ``RIPEMD-128'', ``RIPEMD-160'', ``SHA-160'', ``SHA-256'', ``SHA-384'', ``SHA-512'', ``Skein-512'', ``Tiger(OUTSZ)'', ``Whirlpool''

\noindent \textbf{MACs:} ``HMAC(HASH)'', ``CMAC(BLOCK)'', ``X9.19-MAC''

\section{Support and Further Information}

\subsection{Patents}

Some of the algorithms implemented by Botan may be covered by patents in some locations. Algorithms known to have patent claims on them in the United States and that are not available in a license-free/royalty-free manner include: IDEA, MISTY1, RC5, RC6, and Nyberg-Rueppel.

You must not assume that, just because an algorithm is not listed here, it is not encumbered by patents. If you have any concerns about the patent status of any algorithm you are considering using in an application, please discuss it with your attorney.

\subsection{Support}

Questions or problems you have with Botan can be directed to the development mailing list. Joining this list is highly recommended if you're going to be using Botan, since advance notice of upcoming changes is often sent there. ``Philosophical'' bug reports, announcements of programs using Botan, and anything else having to do with Botan are also welcome.

The lists can be found at \url{http://lists.randombit.net/mailman/listinfo/}.
\subsection{Contact Information}

A PGP key with a fingerprint of \verb|621D AF64 11E1 851C 4CF9 A2E1 6211 EBF1 EFBA DFBC| is used to sign all Botan releases. This key can be found in the file \filename{doc/pgpkeys.asc}; PGP keys for the developers are also stored there.

\vskip 5pt

\noindent Web Site: \url{http://botan.randombit.net}

\subsection{License}

Copyright \copyright{} 2000-2010, Jack Lloyd

Licensed under the same terms as the Botan source

\end{document}