Lossless Data Compression
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some lossless data compression algorithms are available in Botan, currently all
via third-party libraries: zlib (including the deflate and gzip formats),
bzip2, and lzma. Support for these must be enabled at build time; you can
check for them using the macros ``BOTAN_HAS_ZLIB``, ``BOTAN_HAS_BZIP2``,
and ``BOTAN_HAS_LZMA``.
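
Such a check can be sketched as follows. ``available_compressors`` is a
hypothetical helper, not a Botan function, and the sketch assumes that the
feature macros are defined by ``botan/build.h``. The ``__has_include`` guard
(C++17) is only there so that the snippet also compiles where the Botan headers
are not on the include path, in which case the returned list is empty.

.. code-block:: cpp

   #include <string>
   #include <vector>

   // Assumption: the BOTAN_HAS_* feature macros come from botan/build.h.
   #if defined(__has_include)
     #if __has_include(<botan/build.h>)
       #include <botan/build.h>
     #endif
   #endif

   // Hypothetical helper (not part of Botan): names of the compression
   // formats that this build of the library was configured with.
   std::vector<std::string> available_compressors() {
      std::vector<std::string> names;
   #if defined(BOTAN_HAS_ZLIB)
      // The zlib module covers the zlib, deflate, and gzip formats.
      names.push_back("zlib");
      names.push_back("deflate");
      names.push_back("gzip");
   #endif
   #if defined(BOTAN_HAS_BZIP2)
      names.push_back("bz2");
   #endif
   #if defined(BOTAN_HAS_LZMA)
      names.push_back("lzma");
   #endif
      return names;
   }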

.. note::
   You should always compress *before* you encrypt, because encryption hides
   the redundancy that compression relies on finding and removing.

Compression is done through the ``Compression_Algorithm`` and
``Decompression_Algorithm`` classes, both defined in `compression.h`.

Compression and decompression both work in three stages: starting a
message (``start``), continuing to process it (``update``), and then
finally completing processing the stream (``finish``).

.. cpp:class:: Compression_Algorithm

  .. cpp:function:: void start(size_t level)

       Initialize the compression engine. This must be done before calling
       ``update`` or ``finish``. The meaning of the `level` parameter varies by
       algorithm, but it generally takes a value between 1 and 9; higher
       values typically give better compression at the cost of more memory
       and/or CPU time consumed by the compression process. The decompressor
       can always handle input produced at any compression level.

  .. cpp:function::  void update(secure_vector<uint8_t>& buf, \
                                 size_t offset = 0, bool flush = false)

       Compress the material in the in/out parameter ``buf``. The leading
       ``offset`` bytes of ``buf`` are ignored and remain untouched; this can be
       useful for ignoring packet headers.  If ``flush`` is true, the
       compression state is flushed, allowing the decompressor to recover the
       entire message up to this point without having to see the rest of the
       compressed stream.

  .. cpp:function:: void finish(secure_vector<uint8_t>& buf, size_t offset = 0)

       Finish compressing a message. The ``buf`` and ``offset`` parameters are
       treated as in ``update``. It is acceptable to call ``start`` followed by
       ``finish`` with the entire message, without any intervening call to
       ``update``.
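
The ``offset`` parameter can be illustrated with a minimal sketch that keeps an
uncompressed header in front of a compressed body. ``compress_after_header`` is
a hypothetical helper, not a Botan function, and the level of 6 is an arbitrary
choice. The ``__has_include`` guard (C++17) only lets the snippet compile where
the Botan headers are missing; in that case, or when the requested algorithm is
not built in, an empty vector is returned.

.. code-block:: cpp

   #include <cstdint>
   #include <memory>
   #include <string>
   #include <vector>

   #if defined(__has_include)
     #if __has_include(<botan/compression.h>)
       #include <botan/compression.h>
       #define EXAMPLE_HAVE_BOTAN_COMPRESSION 1
     #endif
   #endif

   // Hypothetical helper (not part of Botan): returns `header` unchanged,
   // followed by the compressed form of `body`.
   std::vector<uint8_t> compress_after_header(const std::string& type,
                                              const std::vector<uint8_t>& header,
                                              const std::vector<uint8_t>& body) {
   #if defined(EXAMPLE_HAVE_BOTAN_COMPRESSION)
      std::unique_ptr<Botan::Compression_Algorithm> comp(Botan::make_compressor(type));
      if(!comp)
         return {};  // algorithm not available in this build

      // Lay out header and body contiguously in the in/out buffer.
      Botan::secure_vector<uint8_t> buf(header.begin(), header.end());
      buf.insert(buf.end(), body.begin(), body.end());

      comp->start(6);
      // The first header.size() bytes are skipped and left untouched;
      // everything after them is replaced by compressed output.
      comp->finish(buf, header.size());

      return std::vector<uint8_t>(buf.begin(), buf.end());
   #else
      (void)type;
      (void)header;
      (void)body;
      return {};
   #endif
   }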

.. cpp:class:: Decompression_Algorithm

  .. cpp:function:: void start()

       Initialize the decompression engine. This must be done before calling
       ``update`` or ``finish``. No level is provided here; the decompressor
       can accept input generated by any compression parameters.

  .. cpp:function::  void update(secure_vector<uint8_t>& buf, \
                                 size_t offset = 0)

       Decompress the material in the in/out parameter ``buf``. The leading
       ``offset`` bytes of ``buf`` are ignored and remain untouched; this can be
       useful for ignoring packet headers.

       This function may throw if the data seems to be invalid.

  .. cpp:function:: void finish(secure_vector<uint8_t>& buf, size_t offset = 0)

       Finish decompressing a message. The ``buf`` and ``offset`` parameters are
       treated as in ``update``. It is acceptable to call ``start`` followed by
       ``finish`` with the entire message, without any intervening call to
       ``update``.

       This function may throw if the data seems to be invalid.
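
Because both ``update`` and ``finish`` may throw, callers handling untrusted
input will usually want to wrap decompression in a ``try`` block. The sketch
below shows one way to do that. ``decompresses_cleanly`` is a hypothetical
helper, not a Botan function, and it assumes that the exceptions thrown derive
from ``std::exception``. The ``__has_include`` guard (C++17) only lets the
snippet compile where the Botan headers are missing.

.. code-block:: cpp

   #include <cstdint>
   #include <exception>
   #include <memory>
   #include <string>
   #include <vector>

   #if defined(__has_include)
     #if __has_include(<botan/compression.h>)
       #include <botan/compression.h>
       #define EXAMPLE_HAVE_BOTAN_COMPRESSION 1
     #endif
   #endif

   // Hypothetical helper (not part of Botan): true if `data` decompresses
   // without error, false if it is rejected or the algorithm is unavailable.
   bool decompresses_cleanly(const std::string& type,
                             const std::vector<uint8_t>& data) {
   #if defined(EXAMPLE_HAVE_BOTAN_COMPRESSION)
      std::unique_ptr<Botan::Decompression_Algorithm> decomp(Botan::make_decompressor(type));
      if(!decomp)
         return false;  // algorithm not available in this build

      try {
         Botan::secure_vector<uint8_t> buf(data.begin(), data.end());
         decomp->start();
         decomp->finish(buf);  // whole message in one call, no update()
         return true;
      } catch(const std::exception&) {
         return false;  // the input was rejected as invalid
      }
   #else
      (void)type;
      (void)data;
      return false;
   #endif
   }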

The easiest way to get a compressor or decompressor is via the functions

.. cpp:function:: Compression_Algorithm* make_compressor(std::string type)
.. cpp:function:: Decompression_Algorithm* make_decompressor(std::string type)

Supported values for `type` include `zlib` (the zlib format, which wraps
deflate data in a header and an Adler-32 checksum), `deflate` (a raw deflate
stream with no header or checksum), `gzip`, `bz2`, and `lzma`. A null pointer
will be returned if the algorithm is unavailable.
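
Putting the factory functions together with the ``start``, ``update``, and
``finish`` stages, a round trip might look like the sketch below. ``round_trip``
and ``Round_Trip`` are hypothetical names introduced for illustration, and the
level of 6 is an arbitrary choice. The sketch assumes that the caller owns the
returned pointers. The ``__has_include`` guard (C++17) only lets the snippet
compile where the Botan headers are missing.

.. code-block:: cpp

   #include <cstdint>
   #include <memory>
   #include <string>

   #if defined(__has_include)
     #if __has_include(<botan/compression.h>)
       #include <botan/compression.h>
       #define EXAMPLE_HAVE_BOTAN_COMPRESSION 1
     #endif
   #endif

   enum class Round_Trip { Ok, Mismatch, Unavailable };

   // Hypothetical helper (not part of Botan): compress `msg` in two pieces,
   // decompress the result in one call, and compare with the original.
   Round_Trip round_trip(const std::string& type, const std::string& msg) {
   #if defined(EXAMPLE_HAVE_BOTAN_COMPRESSION)
      // Assumption: the caller owns the raw pointers that are returned.
      std::unique_ptr<Botan::Compression_Algorithm> comp(Botan::make_compressor(type));
      std::unique_ptr<Botan::Decompression_Algorithm> decomp(Botan::make_decompressor(type));
      if(!comp || !decomp)
         return Round_Trip::Unavailable;  // null pointer: not built in

      const std::string first = msg.substr(0, msg.size() / 2);
      const std::string second = msg.substr(msg.size() / 2);
      Botan::secure_vector<uint8_t> compressed;

      comp->start(6);

      // update() replaces the buffer contents with whatever output is ready.
      Botan::secure_vector<uint8_t> buf(first.begin(), first.end());
      comp->update(buf);
      compressed.insert(compressed.end(), buf.begin(), buf.end());

      // finish() consumes the last piece and emits the rest of the stream.
      buf.assign(second.begin(), second.end());
      comp->finish(buf);
      compressed.insert(compressed.end(), buf.begin(), buf.end());

      decomp->start();
      decomp->finish(compressed);  // now holds the decompressed bytes

      const std::string recovered(compressed.begin(), compressed.end());
      return recovered == msg ? Round_Trip::Ok : Round_Trip::Mismatch;
   #else
      (void)type;
      (void)msg;
      return Round_Trip::Unavailable;
   #endif
   }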

To use a compression algorithm in a `Pipe`, use the adaptor types
`Compression_Filter` and `Decompression_Filter` from `comp_filter.h`. The
constructors of both filters take a `std::string` argument (passed to
`make_compressor` or `make_decompressor`); the compression filter also takes a
`level` parameter. Finally, both constructors have a parameter `buf_sz` which
specifies the size of the internal buffer that will be used; inputs will be
broken into blocks of this size. The default is 4096.
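
A minimal sketch of the filter in use follows. ``pipe_compress`` is a
hypothetical helper, not a Botan function. The sketch assumes that `Pipe` is
declared in ``botan/pipe.h`` and takes ownership of the filter, and that the
filter constructor throws an exception derived from ``std::exception`` when the
algorithm is unavailable. The ``__has_include`` guard (C++17) only lets the
snippet compile where the Botan headers are missing.

.. code-block:: cpp

   #include <exception>
   #include <string>

   #if defined(__has_include)
     #if __has_include(<botan/pipe.h>) && __has_include(<botan/comp_filter.h>)
       #include <botan/pipe.h>
       #include <botan/comp_filter.h>
       #define EXAMPLE_HAVE_BOTAN_FILTERS 1
     #endif
   #endif

   // Hypothetical helper (not part of Botan): compress `msg` through a Pipe
   // and return the compressed bytes, or an empty string on failure.
   std::string pipe_compress(const std::string& type, const std::string& msg) {
   #if defined(EXAMPLE_HAVE_BOTAN_FILTERS)
      try {
         // Arguments: algorithm name, compression level, internal buffer size.
         Botan::Pipe pipe(new Botan::Compression_Filter(type, 6, 4096));
         pipe.process_msg(msg);
         return pipe.read_all_as_string();
      } catch(const std::exception&) {
         return std::string();  // e.g. the algorithm is not built in
      }
   #else
      (void)type;
      (void)msg;
      return std::string();
   #endif
   }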