aboutsummaryrefslogtreecommitdiffstats
path: root/doc/internals.tex
diff options
context:
space:
mode:
Diffstat (limited to 'doc/internals.tex')
-rw-r--r--doc/internals.tex163
1 files changed, 163 insertions, 0 deletions
diff --git a/doc/internals.tex b/doc/internals.tex
new file mode 100644
index 000000000..c34b9ec8e
--- /dev/null
+++ b/doc/internals.tex
@@ -0,0 +1,163 @@
+\documentclass{article}
+
+\setlength{\textwidth}{6.75in} % 1 inch side margins
+\setlength{\textheight}{9in} % ~1 inch top and bottom margins
+
+\setlength{\headheight}{0in}
+\setlength{\topmargin}{0in}
+\setlength{\headsep}{0in}
+
+\setlength{\oddsidemargin}{0in}
+\setlength{\evensidemargin}{0in}
+
+\title{Botan Internals}
+\author{Jack Lloyd ([email protected])}
+\date{August 20, 2006}
+
+\newcommand{\filename}[1]{\texttt{#1}}
+\newcommand{\manpage}[2]{\texttt{#1}(#2)}
+
+\newcommand{\function}[1]{\textbf{#1}}
+\newcommand{\type}[1]{\texttt{#1}}
+\renewcommand{\arg}[1]{\textsl{#1}}
+
+\begin{document}
+
+\maketitle
+
+\tableofcontents
+
+\parskip=5pt
+
+\section{Introduction}
+
+This document is intended to document some of the trickier and/or more
+complicated parts of Botan. This is not going to be terribly useful if
+you just want to use the library, but for people wishing to understand
+how it works, or contribute new code to it, it will hopefully prove
+helpful.
+
+I've realized that a lot of things Botan does internally are pretty
+hard to understand, and that a lot of things are only inside my head,
+which is a bad place for them to be (things tend to get lost in there,
+not to mention the possibility that I'll get hit by a truck next
+week).
+
+This document is currently very incomplete. I'll be working on it as I
+have time.
+
+\pagebreak
+
+\section{Filter}
+
+Filter is one of the core abstractions of the library. It represents
+
+\section{Pipe}
+
+Pipe is, conceptually, a tree structure of Filter objects. There is a
+single unique top, and an arbitrary number of leaves (which are
+SecureQueue objects). SecureQueue is a simple Filter that buffers its
+input.
+
+Writing into the pipe writes into the top of the tree. The filter at
+the top of the tree writes its output into the next Filter, and so on
+until eventually data trickles down into the bottommost Filters, where
+the data is stored for later retrieval.
+
+When a new message is started, Pipe searches through the tree of
+Filters and finds places where the \arg{next} field of the Filter is
+NULL. This implies that it was the lowest layer of the Filter tree
+that the user added. It then adds SecureQueue objects onto these
+Filters. These queues are also stored in a \type{std::vector} called
+\arg{messages}. This is how the Pipe knows how to read from them later
+without doing a tree traversal every time.
+
+Pipe will, if asked, destroy the existing tree structure, in order to
+create a new one. However, the queue objects are not deleted, because
+Pipe might need to read from them later.
+
+An optimization in future versions will involve deleting empty queues
+that we ``know'' can't be written to, and then replace their field in
+\arg{messages} with NULL. On reading, Pipe will know that this means
+that the queue is empty, and act as if such a queue was really
+there. This is relatively minor, because in recent versions an empty
+queue only takes up a few dozen bytes (previous to 0.8.4 or so, an
+empty queue still took up 4 kilobytes of memory).
+
+\section{Library Initialization}
+
+A lot of messy corner cases.
+
+\section{Lookup Mechanism}
+
+Most objects know their name, and they know how to create a new copy
+of themselves. We build mapping tables that map from an algorithm name
+into a single instance of that algorithm. The tables themselves can be
+found in \filename{src/lookup.cpp}.
+
+There are a set of functions named \function{add\_algorithm} that can
+be used to populate the tables. We get something out of the table with
+\function{retrieve\_x}, where x is the name of a type
+(\texttt{block\_cipher}, \texttt{hash}, etc). This returns a const
+pointer to the single unique instance of the algorithm that the lookup
+tables know about. If it doesn't know about it, it falls back on
+calling a function called \function{try\_to\_get\_x}. These functions
+live in \filename{src/algolist.cpp}. They are mostly used to handle
+algorithms which need (or at least can have) arguments passed to them,
+like \type{HMAC} and \type{SAFER\_SK}. It will return NULL if it can't
+find the algorithm at all.
+
+When it's asked for an algorithm it doesn't know about (ie, isn't in
+the mapping tables), the retrieval functions will ask the try-to-get
+functions if \emph{they} know about it. If they do, then the object
+returned will be stored into the table for later retrieval.
+
+The functions \function{get\_x} call the retrieval functions. If we
+get back NULL, an exception is thrown. Otherwise it will call the
+\function{clone} method to get a new copy of the algorithm, which it
+returns.
+
+The various functions like \function{output\_length\_of} call the
+retrieval function for each type of object that the parameter in
+question (in this case, \texttt{OUTPUT\_LENGTH}) might be meaningful
+for. If it manages to get back an object, it will return (in this
+case) the \texttt{OUTPUT\_LENGTH} field of the object. No allocations
+are required to call this function: all of its operations work
+directly on the copies living in the lookup tables.
+
+\section{Allocators}
+
+A big (slow) mess.
+
+\section{BigInt}
+
+Read ``Handbook of Applied Cryptography''.
+
+\section{PEM/BER Identification}
+
+We have a specific algorithm for figuring out if something is PEM or
+BER. Previous versions (everything before 1.3.0) requried that the
+caller specify which one it was, and they had to be right. Now we use
+a hueristic (aka, an algorithm that sometimes doesn't work right) to
+figure it out. If the first character is not 0x30 (equal to ASCII
+'0'), then it can't possibly be BER (because everything we care about
+is enclosed in an ASN.1 SEQUENCE, which for BER/DER is encoded as
+beginning with 0x30). Roughly 99.9% of PEM blocks \emph{won't} have a
+random 0 character in front of them, so we are mostly safe (unless
+someone does it on purpose, in which case, please hit them for me).
+But to be sure, if there is a 0, then we search the first \emph{N}
+bytes of the block for the string ``-----BEGIN ``, which marks the
+typical start of a PEM block. The specific \emph{N} depends on the
+variable ``base/pem\_search'', which defaults to 4 kilobytes.
+
+So, you can actually fool it either way: that a PEM file is really
+BER, or that a BER file is actually PEM. To fool it that a BER file is
+PEM, just have the string ``-----BEGIN `` somewhere (I can't imagine
+this string shows up in certificates or CRLs too often, so if it is
+there it means somebody is being a jerk). If a file starts with 0 and
+has at least ``base/pem\_search'' byte more junk in the way, it won't
+notice that its PEM at all. In either case, of course, the loading
+will fail, and you'll get a nice exception saying that the decoding
+failed.
+
+\end{document}