From 9426f6d0f4a760c555379c3af642127df7e1456e Mon Sep 17 00:00:00 2001
From: lloyd
Date: Sun, 10 May 2015 02:39:38 +0000
Subject: Update compression docs

---
 doc/manual/compression.rst | 52 ++++++++++++++++++++++++++++++++++++++++++++++
 doc/manual/filters.rst     | 42 -------------------------------------
 2 files changed, 52 insertions(+), 42 deletions(-)
 create mode 100644 doc/manual/compression.rst

diff --git a/doc/manual/compression.rst b/doc/manual/compression.rst
new file mode 100644
index 000000000..c58ba58a6
--- /dev/null
+++ b/doc/manual/compression.rst
@@ -0,0 +1,52 @@
+Lossless Data Compression
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Several lossless data compression algorithms are available in Botan, currently
+all via third-party libraries. These include zlib (including the deflate and
+gzip formats), bzip2, and lzma.
+
+.. note::
+   You should always compress *before* you encrypt, because encryption hides
+   the redundancy that compression tries to find and remove.
+
+All compressors provide the `Transform` interface through a subclass
+`Compression_Transform` (defined in `compression.h`). The compression
+algorithms have some limitations with respect to the standard API; in
+particular, `output_length` simply throws an exception, since the output size
+cannot be determined from the input length alone for such an algorithm.
+
+The transformations work much like any other: calling `update` on a vector
+yields the (de)compressed result, and calling `finish` completes the
+computation. All (de)compression algorithms accept inputs of any size
+(`update_granularity` is 1) and do not require that any final data be held
+back to be passed to `finish`.
+
+`Compression_Transform` additionally provides a function `flush`, which
+processes its input exactly as `update` does but also signals the compressor
+to emit as much output as possible immediately, regardless of the effect on
+compression ratio.
Any compressor or decompressor may
+ignore this signal and treat `flush` as a normal `update`.
+
+The easiest way to get a compressor or decompressor is via these functions:
+
+.. cpp:function:: Compression_Transform* make_compressor(std::string type, size_t level)
+.. cpp:function:: Compression_Transform* make_decompressor(std::string type)
+
+Supported values for `type` include `zlib` (the zlib format of RFC 1950),
+`deflate` (raw deflate data, without the zlib header or checksum), `gzip`,
+`bz2`, and `lzma`. A null pointer is returned if the algorithm is unavailable.
+The meaning of the `level` parameter varies by algorithm, but it generally
+takes a value between 1 and 9, with higher values typically giving better
+compression at the cost of more memory and/or CPU time during compression. A
+decompressor can handle output produced by its compressor at any level.
+
+As with any consumer of complex formats, a decompressor may throw an exception
+(from either `update` or `finish`) if the input is invalid or corrupt.
+
+To use a compression algorithm in a `Pipe`, use the adaptor types
+`Compression_Filter` and `Decompression_Filter` from `comp_filter.h`. The
+constructors of both filters take a `std::string` argument (passed to
+`make_compressor` or `make_decompressor`); the compression filter also takes a
+`level` parameter. Finally, both constructors have a parameter `buf_sz` which
+specifies the size of the internal buffer; inputs are broken into blocks of
+this size. The default is 4096.
diff --git a/doc/manual/filters.rst b/doc/manual/filters.rst
index e8016eac7..bd73739af 100644
--- a/doc/manual/filters.rst
+++ b/doc/manual/filters.rst
@@ -693,48 +693,6 @@ letters for its output.
 You can find the declarations for these types in ``hex_filt.h`` and
 ``b64_filt.h``.
 
-Compressors
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-There are two compression algorithms supported by Botan, zlib and
-bzip2.
Only lossless compression algorithms are currently supported by
-Botan, because they tend to be the most useful for
-cryptography. However, it is very reasonable to consider supporting
-something like GSM speech encoding (which is lossy), for use in
-encrypted voice applications.
-
-You should always compress *before* you encrypt, because encryption seeks
-to hide the redundancy that compression is supposed to try to find and remove.
-
-To test for Bzip2, check to see if ``BOTAN_HAS_COMPRESSOR_BZIP2`` is
-defined. If so, you can include ``botan/bzip2.h``, which will declare
-a pair of ``Filter`` objects: ``Bzip2_Compression`` and
-``Bzip2_Decompression``.
-
-You should be prepared to take an exception when using the
-decompressing filter, for if the input is not valid bzip2 data, that
-is what you will receive. You can specify the desired level of
-compression to ``Bzip2_Compression``'s constructor as an integer
-between 1 and 9, 1 meaning worst compression, and 9 meaning the
-best. The default is to use 9, since small values take the same amount
-of time, just use a little less memory.
-
-Zlib compression works much like Bzip2 compression. The only
-differences in this case are that the macro is
-``BOTAN_HAS_COMPRESSOR_ZLIB``, the header you need to include is
-called ``botan/zlib.h`` (remember that you shouldn't just ``#include
-``, or you'll get the regular zlib API, which is not what you
-want). The Botan classes for zlib compression/decompression are called
-``Zlib_Compression`` and ``Zlib_Decompression``.
-
-Like Bzip2, a ``Zlib_Decompression`` object will throw an exception if
-invalid (in the sense of not being in the Zlib format) data is passed
-into it.
-
-While the zlib compression library uses the same compression algorithm
-as the gzip and zip programs, the format is different. The zlib format
-is defined in RFC 1950.
-
 Writing New Filters
 ---------------------------------
-- 
cgit v1.2.3
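The block splitting controlled by the `buf_sz` parameter of `Compression_Filter` and `Decompression_Filter` can be shown with a small, self-contained C++ sketch. The sketch does not use Botan at all. `feed_in_blocks` and `Update_Fn` are hypothetical names, and the callback only stands in for a transform's `update`. The sketch assumes only what the documentation states: input is broken into blocks of at most `buf_sz` bytes, with a default of 4096. It is not Botan's implementation of the filters.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a transform's update(): it receives one block and
// may rewrite the vector in place with whatever output it produces.
using Update_Fn = std::function<void(std::vector<std::uint8_t>& block)>;

// Illustrative sketch (not a Botan function): splits `length` bytes of `input`
// into consecutive blocks of at most `buf_sz` bytes and passes each block to
// `update`, in order. The 4096 default mirrors the documented buf_sz default.
void feed_in_blocks(const std::uint8_t input[], std::size_t length,
                    const Update_Fn& update, std::size_t buf_sz = 4096)
{
   if(buf_sz == 0)
      throw std::invalid_argument("buf_sz must be positive");

   std::vector<std::uint8_t> block;
   while(length > 0)
   {
      const std::size_t take = std::min(buf_sz, length);
      block.assign(input, input + take);  // copy the next chunk of input
      update(block);                      // hand it to the stand-in transform
      input += take;
      length -= take;
   }
}
```

For example, a 10000-byte input with the default `buf_sz` reaches the callback as blocks of 4096, 4096 and 1808 bytes. A smaller `buf_sz` bounds the size of each intermediate buffer at the cost of more calls.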