doc/misc/internals.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153

\documentclass{article}

\setlength{\textwidth}{6.75in} % 1 inch side margins
\setlength{\textheight}{9in} % ~1 inch top and bottom margins

\setlength{\headheight}{0in}
\setlength{\topmargin}{0in}
\setlength{\headsep}{0in}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}

\title{Botan Internals}
\author{Jack Lloyd (lloyd@randombit.net)}
\date{July 30, 2002}

\newcommand{\filename}[1]{\texttt{#1}}
\newcommand{\manpage}[2]{\texttt{#1}(#2)}

\newcommand{\function}[1]{\textbf{#1}}
\newcommand{\type}[1]{\texttt{#1}}
\renewcommand{\arg}[1]{\textsl{#1}}

\begin{document}

\maketitle

\tableofcontents

\parskip=5pt

\section{Introduction}

This document is intended to document some of the trickier and/or more
complicated parts of Botan. This is not going to be terribly useful if you
just want to use the library, but for people wishing to understand how it
works, or contribute new code to it, it will hopefully prove helpful.

I've realized that a lot of things Botan does internally are pretty hard to
understand, and that a lot of things are only inside my head, which is a bad
place for them to be (things tend to get lost in there, not to mention the
possibility that I'll get hit by a truck next week).

This document is currently very incomplete. I'll be working on it as I have
time.

\pagebreak

\section{Filter}

Need something here.

\section{Pipe}

Pipe is, conceptually, a tree structure of Filter objects. There is a single
unique top, and an arbitrary number of leaves (which are SecureQueue objects).
SecureQueue is a simple Filter that buffers it's input.

Writing into the pipe writes into the top of the tree. The filter at the top
of the tree writes it's output into the next Filter, and so on until eventually
data trickles down into the bottommost Filters, where the data is stored for
later retrieval.

When a new message is started, Pipe searches through the tree of Filters and
finds places where the \arg{next} field of the Filter is NULL. This implies
that it was the lowest layer of the Filter tree that the user added. It then
adds SecureQueue objects onto these Filters. These queues are also stored in a
\type{std::vector} called \arg{messages}. This is how the Pipe knows how to
read from them later without doing a tree traversal every time.

Pipe will, if asked, destroy the existing tree structure, in order to create a
new one. However, the queue objects are not deleted, because Pipe might need to
read from them later.

An optimization in future versions will involve deleting empty queues that we
``know'' can't be written to, and then replace their field in \arg{messages}
with NULL. On reading, Pipe will know that this means that the queue is empty,
and act as if such a queue was really there. This is relatively minor, because
in recent versions an empty queue only takes up a few dozen bytes (previous to
0.8.4 or so, an empty queue still took up 4 kilobytes of memory).

\section{Library Initialization}

A lot of messy corner cases.

\section{Lookup Mechanism}

Most objects know their name, and they know how to create a new copy of
themselves. We build mapping tables that map from an algorithm name into a
single instance of that algorithm. The tables themselves can be found in
\filename{src/lookup.cpp}.

There are a set of functions named \function{add\_algorithm} that can be used
to populate the tables. We get something out of the table with
\function{retrieve\_x}, where x is the name of a type (\texttt{block\_cipher},
\texttt{hash}, etc). This returns a const pointer to the single unique instance
of the algorithm that the lookup tables know about. If it doesn't know about
it, it falls back on calling a function called
\function{try\_to\_get\_x}. These functions live in
\filename{src/algolist.cpp}. They are mostly used to handle algorithms which
need (or at least can have) arguments passed to them, like \type{HMAC} and
\type{SAFER\_SK}. It will return NULL if it can't find the algorithm at all.

When it's asked for an algorithm it doesn't know about (ie, isn't in the
mapping tables), the retrieval functions will ask the try-to-get functions if
\emph{they} know about it. If they do, then the object returned will be stored
into the table for later retrieval.

The functions \function{get\_x} call the retrieval functions. If we get back
NULL, an exception is thrown. Otherwise it will call the \function{clone}
method to get a new copy of the algorithm, which it returns.

The various functions like \function{output\_length\_of} call the retrieval
function for each type of object that the parameter in question (in this case,
\texttt{OUTPUT\_LENGTH}) might be meaningful for. If it manages to get back an
object, it will return (in this case) the \texttt{OUTPUT\_LENGTH} field of the
object. No allocations are required to call this function: all of it's
operations work directly on the copies living in the lookup tables.

\section{Allocators}

A big (slow) mess.

\section{BigInt}

Read ``Handbook of Applied Cryptography''.

\section{PEM/BER Identification}

We have a specific algorithm for figuring out if something is PEM or
BER. Previous versions (everything before 1.3.0) requried that the caller
specify which one it was, and they had to be right. Now we use a hueristic
(aka, an algorithm that sometimes doesn't work right) to figure it out. If the
first character is not 0x30 (equal to ASCII '0'), then it can't possibly be BER
(because everything we care about is enclosed in an ASN.1 SEQUENCE, which for
BER/DER is encoded as beginning with 0x30). Roughly 99.9% of PEM blocks
\emph{won't} have a random 0 character in front of them, so we are mostly safe
(unless someone does it on purpose, in which case, please hit them for me).
But to be sure, if there is a 0, then we search the first \emph{N} bytes of the
block for the string ``-----BEGIN ``, which marks the typical start of a PEM
block. The specific \emph{N} depends on the variable ``base/pem\_search'',
which defaults to 4 kilobytes.

So, you can actually fool it either way: that a PEM file is really BER, or that
a BER file is actually PEM. To fool it that a BER file is PEM, just have the
string ``-----BEGIN `` somewhere (I can't imagine this string shows up in
certificates or CRLs too often, so if it is there it means somebody is being a
jerk). If a file starts with 0 and has at least ``base/pem\_search'' byte more
junk in the way, it won't notice that it's PEM at all. In either case, of
course, the loading will fail, and you'll get a nice exception saying that the
decoding failed.

\end{document}