Monday, May 14, 2012

Memory buffer management

 One of the objectives for the LZ4 framing format is to instruct the LZ4 decoder how to best handle memory to decode the compressed data.

To illustrate with a simple example, let's study the current framing format of lz4demo.

This framing has no option, and is therefore completely static. At the core of the specification is a "data split" methodology, which automatically split input data (typically files) into multiple blocks of 8MB each. That is to say, if the source data is larger than 8MB, then the first block will only hold the first 8MB of data, and the next block too, up to the point that only one block remains. This last one can have any size between 0 and 8 MB. The last block can also be the first one, if source data is < 8MB.

Moreover, each block is completely independant. Which means each block can be compressed/decompressed in any order, without dependency, except to output data blocks in the same order as input ones. This property is very interesting for multithreading.

Now, this simple scheme comes also with some problems of its own. Let's look into it.

First, since we don't know the size of the source data, the decoder has to allocate 8MB of memory, no matter what. Maybe the file is only 4KB for example ? Well, the decoder don't know it.
To be fair, knowing the maximum compression ratio (255:1), it could guess that, if the compressed size is 2KB, then the decoded size cannot be more than 512KB. But this is awkward and awful trick.

For modern computers, it seems this is a minor issue. However, in many other circumstances, such as embedded and low-power devices, this amount of memory is way too large.

To limit the size of allocated memory, a potential work-around for the decoder is to use a "sliding buffer" methodology.
For example, data would be decoded into blocks of size 256KB, a lot smaller than the full 8MB size. On reaching the end of the small block, decoded data would be written to disk, keeping only the last 64KB ones (copy window size) for reference of the next small block.
This is workable, and would save a lot of memory. But there are a few drawbacks :
  • A specific decoding function is necessary to do this, which does not exist (yet).
    Such function needs to keep track of its "internal state", in order to resume decoding where it left.
    Which means, for example, that we may be in the middle of match copy operation.
    The "exit" and "error" conditions are also different and more numerous.
    As a consequence, the code for such function will be more complex and slower.
  • Adding to the performance issue, the "sliding" mechanism involves repeateadly copying the last 64KB of each small block in preparation for the next one.
  • Although 256KB is a lot smaller than 8MB, it's still too large for some low-memory applications.
    Alas there is a limit to this methodology, which is the size of the "window copy operation" (64KB) plus a reasonable amount of data to decode, typically another 64KB.
    So, the minimum decoding buffer for this methodology is 128KB.
    This is better, but still too large.

As a consequence, it seems clear that there is no such thing as "one-size fits all". The size of blocks needs to be customizable, in order to fit a broader range of situations.

Additionnally, it seems better to help the decoder allocating only what's needed, especially, when it can be smaller than the default block size. Which means that, for small data source (smaller than the block size), it would be interesting to provide the original source size : this will allow the decoder to only allocate what's necessary, instead of the default maximum block size.

1 comment:

  1. Quite interesting. Smart buffering to put it simple.

    The algorithms (sp?) would choose the size of the buffer depending on the size of data. Clever compared to fixed buffer "presets" used on other compression algorithms.

    Not a programmer, but enjoy reading you dwell into these 'programming issues"