diff options
-rw-r--r-- | doc/internals.tex | 163 | ||||
-rw-r--r-- | doc/misc/internals.tex | 153 |
2 files changed, 163 insertions, 153 deletions
diff --git a/doc/internals.tex b/doc/internals.tex new file mode 100644 index 000000000..c34b9ec8e --- /dev/null +++ b/doc/internals.tex @@ -0,0 +1,163 @@ +\documentclass{article} + +\setlength{\textwidth}{6.75in} % 1 inch side margins +\setlength{\textheight}{9in} % ~1 inch top and bottom margins + +\setlength{\headheight}{0in} +\setlength{\topmargin}{0in} +\setlength{\headsep}{0in} + +\setlength{\oddsidemargin}{0in} +\setlength{\evensidemargin}{0in} + +\title{Botan Internals} +\author{Jack Lloyd ([email protected])} +\date{August 20, 2006} + +\newcommand{\filename}[1]{\texttt{#1}} +\newcommand{\manpage}[2]{\texttt{#1}(#2)} + +\newcommand{\function}[1]{\textbf{#1}} +\newcommand{\type}[1]{\texttt{#1}} +\renewcommand{\arg}[1]{\textsl{#1}} + +\begin{document} + +\maketitle + +\tableofcontents + +\parskip=5pt + +\section{Introduction} + +This document is intended to document some of the trickier and/or more +complicated parts of Botan. This is not going to be terribly useful if +you just want to use the library, but for people wishing to understand +how it works, or contribute new code to it, it will hopefully prove +helpful. + +I've realized that a lot of things Botan does internally are pretty +hard to understand, and that a lot of things are only inside my head, +which is a bad place for them to be (things tend to get lost in there, +not to mention the possibility that I'll get hit by a truck next +week). + +This document is currently very incomplete. I'll be working on it as I +have time. + +\pagebreak + +\section{Filter} + +Filter is one of the core abstractions of the library. It represents + +\section{Pipe} + +Pipe is, conceptually, a tree structure of Filter objects. There is a +single unique top, and an arbitrary number of leaves (which are +SecureQueue objects). SecureQueue is a simple Filter that buffers its +input. + +Writing into the pipe writes into the top of the tree. The filter at +the top of the tree writes its output into the next Filter, and so on +until eventually data trickles down into the bottommost Filters, where +the data is stored for later retrieval. + +When a new message is started, Pipe searches through the tree of +Filters and finds places where the \arg{next} field of the Filter is +NULL. This implies that it was the lowest layer of the Filter tree +that the user added. It then adds SecureQueue objects onto these +Filters. These queues are also stored in a \type{std::vector} called +\arg{messages}. This is how the Pipe knows how to read from them later +without doing a tree traversal every time. + +Pipe will, if asked, destroy the existing tree structure, in order to +create a new one. However, the queue objects are not deleted, because +Pipe might need to read from them later. + +An optimization in future versions will involve deleting empty queues +that we ``know'' can't be written to, and then replace their field in +\arg{messages} with NULL. On reading, Pipe will know that this means +that the queue is empty, and act as if such a queue was really +there. This is relatively minor, because in recent versions an empty +queue only takes up a few dozen bytes (previous to 0.8.4 or so, an +empty queue still took up 4 kilobytes of memory). + +\section{Library Initialization} + +A lot of messy corner cases. + +\section{Lookup Mechanism} + +Most objects know their name, and they know how to create a new copy +of themselves. We build mapping tables that map from an algorithm name +into a single instance of that algorithm. The tables themselves can be +found in \filename{src/lookup.cpp}. + +There are a set of functions named \function{add\_algorithm} that can +be used to populate the tables. We get something out of the table with +\function{retrieve\_x}, where x is the name of a type +(\texttt{block\_cipher}, \texttt{hash}, etc). This returns a const +pointer to the single unique instance of the algorithm that the lookup +tables know about. If it doesn't know about it, it falls back on +calling a function called \function{try\_to\_get\_x}. These functions +live in \filename{src/algolist.cpp}. They are mostly used to handle +algorithms which need (or at least can have) arguments passed to them, +like \type{HMAC} and \type{SAFER\_SK}. It will return NULL if it can't +find the algorithm at all. + +When it's asked for an algorithm it doesn't know about (ie, isn't in +the mapping tables), the retrieval functions will ask the try-to-get +functions if \emph{they} know about it. If they do, then the object +returned will be stored into the table for later retrieval. + +The functions \function{get\_x} call the retrieval functions. If we +get back NULL, an exception is thrown. Otherwise it will call the +\function{clone} method to get a new copy of the algorithm, which it +returns. + +The various functions like \function{output\_length\_of} call the +retrieval function for each type of object that the parameter in +question (in this case, \texttt{OUTPUT\_LENGTH}) might be meaningful +for. If it manages to get back an object, it will return (in this +case) the \texttt{OUTPUT\_LENGTH} field of the object. No allocations +are required to call this function: all of its operations work +directly on the copies living in the lookup tables. + +\section{Allocators} + +A big (slow) mess. + +\section{BigInt} + +Read ``Handbook of Applied Cryptography''. + +\section{PEM/BER Identification} + +We have a specific algorithm for figuring out if something is PEM or +BER. Previous versions (everything before 1.3.0) requried that the +caller specify which one it was, and they had to be right. Now we use +a hueristic (aka, an algorithm that sometimes doesn't work right) to +figure it out. If the first character is not 0x30 (equal to ASCII +'0'), then it can't possibly be BER (because everything we care about +is enclosed in an ASN.1 SEQUENCE, which for BER/DER is encoded as +beginning with 0x30). Roughly 99.9% of PEM blocks \emph{won't} have a +random 0 character in front of them, so we are mostly safe (unless +someone does it on purpose, in which case, please hit them for me). +But to be sure, if there is a 0, then we search the first \emph{N} +bytes of the block for the string ``-----BEGIN ``, which marks the +typical start of a PEM block. The specific \emph{N} depends on the +variable ``base/pem\_search'', which defaults to 4 kilobytes. + +So, you can actually fool it either way: that a PEM file is really +BER, or that a BER file is actually PEM. To fool it that a BER file is +PEM, just have the string ``-----BEGIN `` somewhere (I can't imagine +this string shows up in certificates or CRLs too often, so if it is +there it means somebody is being a jerk). If a file starts with 0 and +has at least ``base/pem\_search'' byte more junk in the way, it won't +notice that its PEM at all. In either case, of course, the loading +will fail, and you'll get a nice exception saying that the decoding +failed. + +\end{document} diff --git a/doc/misc/internals.tex b/doc/misc/internals.tex deleted file mode 100644 index 81eca146a..000000000 --- a/doc/misc/internals.tex +++ /dev/null @@ -1,153 +0,0 @@ -\documentclass{article} - -\setlength{\textwidth}{6.75in} % 1 inch side margins -\setlength{\textheight}{9in} % ~1 inch top and bottom margins - -\setlength{\headheight}{0in} -\setlength{\topmargin}{0in} -\setlength{\headsep}{0in} - -\setlength{\oddsidemargin}{0in} -\setlength{\evensidemargin}{0in} - -\title{Botan Internals} -\author{Jack Lloyd ([email protected])} -\date{July 30, 2002} - -\newcommand{\filename}[1]{\texttt{#1}} -\newcommand{\manpage}[2]{\texttt{#1}(#2)} - -\newcommand{\function}[1]{\textbf{#1}} -\newcommand{\type}[1]{\texttt{#1}} -\renewcommand{\arg}[1]{\textsl{#1}} - -\begin{document} - -\maketitle - -\tableofcontents - -\parskip=5pt - -\section{Introduction} - -This document is intended to document some of the trickier and/or more -complicated parts of Botan. This is not going to be terribly useful if you -just want to use the library, but for people wishing to understand how it -works, or contribute new code to it, it will hopefully prove helpful. - -I've realized that a lot of things Botan does internally are pretty hard to -understand, and that a lot of things are only inside my head, which is a bad -place for them to be (things tend to get lost in there, not to mention the -possibility that I'll get hit by a truck next week). - -This document is currently very incomplete. I'll be working on it as I have -time. - -\pagebreak - -\section{Filter} - -Need something here. - -\section{Pipe} - -Pipe is, conceptually, a tree structure of Filter objects. There is a single -unique top, and an arbitrary number of leaves (which are SecureQueue objects). -SecureQueue is a simple Filter that buffers it's input. - -Writing into the pipe writes into the top of the tree. The filter at the top -of the tree writes it's output into the next Filter, and so on until eventually -data trickles down into the bottommost Filters, where the data is stored for -later retrieval. - -When a new message is started, Pipe searches through the tree of Filters and -finds places where the \arg{next} field of the Filter is NULL. This implies -that it was the lowest layer of the Filter tree that the user added. It then -adds SecureQueue objects onto these Filters. These queues are also stored in a -\type{std::vector} called \arg{messages}. This is how the Pipe knows how to -read from them later without doing a tree traversal every time. - -Pipe will, if asked, destroy the existing tree structure, in order to create a -new one. However, the queue objects are not deleted, because Pipe might need to -read from them later. - -An optimization in future versions will involve deleting empty queues that we -``know'' can't be written to, and then replace their field in \arg{messages} -with NULL. On reading, Pipe will know that this means that the queue is empty, -and act as if such a queue was really there. This is relatively minor, because -in recent versions an empty queue only takes up a few dozen bytes (previous to -0.8.4 or so, an empty queue still took up 4 kilobytes of memory). - -\section{Library Initialization} - -A lot of messy corner cases. - -\section{Lookup Mechanism} - -Most objects know their name, and they know how to create a new copy of -themselves. We build mapping tables that map from an algorithm name into a -single instance of that algorithm. The tables themselves can be found in -\filename{src/lookup.cpp}. - -There are a set of functions named \function{add\_algorithm} that can be used -to populate the tables. We get something out of the table with -\function{retrieve\_x}, where x is the name of a type (\texttt{block\_cipher}, -\texttt{hash}, etc). This returns a const pointer to the single unique instance -of the algorithm that the lookup tables know about. If it doesn't know about -it, it falls back on calling a function called -\function{try\_to\_get\_x}. These functions live in -\filename{src/algolist.cpp}. They are mostly used to handle algorithms which -need (or at least can have) arguments passed to them, like \type{HMAC} and -\type{SAFER\_SK}. It will return NULL if it can't find the algorithm at all. - -When it's asked for an algorithm it doesn't know about (ie, isn't in the -mapping tables), the retrieval functions will ask the try-to-get functions if -\emph{they} know about it. If they do, then the object returned will be stored -into the table for later retrieval. - -The functions \function{get\_x} call the retrieval functions. If we get back -NULL, an exception is thrown. Otherwise it will call the \function{clone} -method to get a new copy of the algorithm, which it returns. - -The various functions like \function{output\_length\_of} call the retrieval -function for each type of object that the parameter in question (in this case, -\texttt{OUTPUT\_LENGTH}) might be meaningful for. If it manages to get back an -object, it will return (in this case) the \texttt{OUTPUT\_LENGTH} field of the -object. No allocations are required to call this function: all of it's -operations work directly on the copies living in the lookup tables. - -\section{Allocators} - -A big (slow) mess. - -\section{BigInt} - -Read ``Handbook of Applied Cryptography''. - -\section{PEM/BER Identification} - -We have a specific algorithm for figuring out if something is PEM or -BER. Previous versions (everything before 1.3.0) requried that the caller -specify which one it was, and they had to be right. Now we use a hueristic -(aka, an algorithm that sometimes doesn't work right) to figure it out. If the -first character is not 0x30 (equal to ASCII '0'), then it can't possibly be BER -(because everything we care about is enclosed in an ASN.1 SEQUENCE, which for -BER/DER is encoded as beginning with 0x30). Roughly 99.9% of PEM blocks -\emph{won't} have a random 0 character in front of them, so we are mostly safe -(unless someone does it on purpose, in which case, please hit them for me). -But to be sure, if there is a 0, then we search the first \emph{N} bytes of the -block for the string ``-----BEGIN ``, which marks the typical start of a PEM -block. The specific \emph{N} depends on the variable ``base/pem\_search'', -which defaults to 4 kilobytes. - -So, you can actually fool it either way: that a PEM file is really BER, or that -a BER file is actually PEM. To fool it that a BER file is PEM, just have the -string ``-----BEGIN `` somewhere (I can't imagine this string shows up in -certificates or CRLs too often, so if it is there it means somebody is being a -jerk). If a file starts with 0 and has at least ``base/pem\_search'' byte more -junk in the way, it won't notice that it's PEM at all. In either case, of -course, the loading will fail, and you'll get a nice exception saying that the -decoding failed. - -\end{document} |