Thursday, June 26, 2014

Debunking the LZ4 "20 years old bug" myth

[Edit] : There is a follow-up to this story.

[Edit 2] : The post below was written right after the initial publication. Should you wish a much shorter, and arguably more neutral, summary of the situation, please consider reading the follow-up.

A recent post on a security blog claimed that LZ4 is affected by a subtle bug which could result in remote code execution on basically any machine using the LZ4 algorithm. Given that LZ4 is installed on every modern Linux distribution, critically including Android, and is also part of modern file systems such as ZFS (shipped with FreeBSD and Illumos), widely deployed databases such as MySQL, and big-data nodes powered by Hadoop, that would be a pretty big deal.

The article then does a fairly good job of describing the conditions required to trigger that risk. Unfortunately, these explanations are scattered somewhere between the middle and the end of an overly long technical article, ensuring most readers will stop at the dramatic headline. The present article aims to counterbalance those "high stakes" claims.

First, there is a problem of methodology. The author left a brief note on the LZ4 issue board, and then, without even reading the detailed explanations you'll find below or engaging in any mitigation activity, went ahead and advertised it to the widest possible audience just a few days later, creating exposure risks for users. This falls far short of the professional behavior expected from a security firm.
[Edit] : Later comments from DonB state that this situation is due to him not receiving notifications about the answers provided to him on the board.

Second, the author claims to have found this subtle risk through careful code study on his own. This is not true. The risk was identified by Ludwig Strigeus, the creator of µTorrent, quite some time ago. Ludwig did a very fine job of describing the risk. Instead of trying to make a headline, he proposed a solution for it. After multiple partial fixes, the risk was finally plugged recently (just a few days before DonB's second "disclosure"). Why did it take so long, you may ask?
Well, because there was no real-world risk.

Let's now get into the technical details.
In order to exploit the vulnerability, a number of conditions must be met.
The first, relatively minor one: the target must be a 32-bit environment; 64-bit is totally unaffected. That basically puts most server-side applications outside of the risk zone.

Second, the attacker needs to forge a special compressed block to overflow the 32-bit address space. This is possible ... if the compressed block is something like 16 MB.

There is just a little problem with that: the legacy LZ4 file format is limited to 8 MB blocks. Any value larger than that simply stops the decoding process, and 8 MB is not enough to trigger a problem. The newer streaming format is even stricter, with a hard limit at 4 MB. As a consequence, it's not possible to exploit that vulnerability using the documented LZ4 file/streaming formats.

Well, you say, but what about programs which use their own, undocumented, format ?
Indeed, but the same condition applies: to be exposed to this risk, very large blocks of 16 MB or more must be read by the decoder.
Does that happen?
Let's have a look at several high-profile programs using LZ4. ZFS? Max block size is 128 KB. Lucene? Typical index block size is 16 KB; it could be a bit more, say 64 KB, but that's still far short of the objective. zram maybe? Nope, 4 KB memory segments. The Linux kernel? The boot section has to decompress a multi-megabyte kernel into memory, surely that can reach the limit? Yes, it does, but it uses the LZ4 Legacy File format, which limits each block to 8 MB maximum. OK, maybe AAA games such as Guild Wars 2? Nope, a real-time protocol is the realm of short messages; the protocol itself doesn't allow anything beyond a few KB. And so on, and on.

At the end of the day, none of the known implementations of LZ4 is exposed to this risk.
Basically, most user programs employ LZ4 for small data packets, way below the critical limit. Programs which generate and distribute large compressed blocks (notably the lz4c POSIX compression utility, distributed within Linux distributions) use the documented streaming format, which limits block size to 4 or 8 MB. Also remove from the list programs which never take "externally provided" data as input; they can't be targeted either.

So sorry, but this is not the "new Heartbleed" situation the author seems to dream of.

Nevertheless, it's a good move to close this risk, just in case some future implementation inadvertently wanders into the area of "custom compression format using large blocks of more than 8 MB on a 32-bit system, receiving data from untrusted external sources". Granted, this scenario stands in the low-probability range, but it's nonetheless good to plug it. Finding a solution without undesirable side effects took some time, but that's finally done in the current LZ4 release, available on GitHub and Google Code.

It's one thing to say there is a potential vulnerability that should be fixed, to ensure it does not become exploitable in the future. It's a totally different thing to pretend there is a dangerous RCE (Remote Code Execution) exploit currently active on the Internet, which is scary. DonB's article cleverly conflates the two, implying the second to create a flashy headline, while disseminating some minor disclaimer elements throughout the long article so he can pretend, whenever necessary, to have only said the first. That's clever, but inflating the gravity of an issue to grab some free advertising is not respectable behavior from a security firm.

I'm also bitter about the misappropriation of the bug's discovery. The real identifier is Ludwig Strigeus; let's make that clear. The long list of "credits" at the end of DonB's article is another reason for caution: it happens that I asked some of the influential names listed there, and they told me they had barely heard of the guy. Fame by association? Sure, please thank Linus Torvalds for "coordinating and advising" on the issue.

Finally, I'm also angry because security matters, a lot. Triggering too many alarms to grab a bit of fame is a good way to weaken the power of future alarms. You can guess that when every headline claims a "new Heartbleed" situation, no one will pay attention to the next real one which will matter. And that's dangerous.

Anyway, should you nonetheless feel at risk now, please don't hesitate to update your LZ4 version. It's a good thing to do anyway, and as stated previously, the vulnerability has already been patched.


  1. I did a LZW implementation for Ticketmaster in 1992. I got the algorithm from a magazine pseudocode. I used the same algorithm in TempleOS. You have no idea what you are talking about.

    1. Care to elaborate? What points has the article gotten wrong?

    2. Just a heads up: the guy you are responding to is quite (in)famous on Reddit. He's a really smart guy but he sadly has schizophrenia and thanks to the American medical system being a joke he has no access to proper treatment. You might want to take his response with a pinch of salt.

  2. Unfortunately, the poster above, Terry Davis, suffers from what many people believe to be schizophrenia. He is a person of very high intelligence, but often says very inappropriate things. :(

    1. Yeah, I guess it happens, this is the Internet after all. ;-) No worries

    2. And you are one of those guys with compulsive-obsessive disorder that always, ALWAYS have to point out Davis' condition whenever his name is written somewhere.


    1. For the record: this comment was meant as a reply to Terry A Davis

      The article makes perfect sense :)

  4. Thank you very much for making that clear!

  5. Nice analysis. Thank you!

  6. but but... it's writing to memory here:


    1. The example linked to seems to confuse "kernel space" with "user space". It's not possible to simply call a "kernel function" the way he does in his example, by copy/pasting the kernel code into his sample. Kernel functions can only be called from within the kernel.

      Besides, his example uses the unsafe version "LZ4_decompress_fast" (named lz4_uncompress within the Linux kernel), which is, by definition, unprotected from malicious input. The protected version is "LZ4_decompress_safe" (named lz4_uncompress_unknownoutputsize within the Linux kernel).

      The kernel version of LZ4 is currently used by the boot loader, zRam, SquashFS and BTRFS. None of them provides the decoder with data blocks large enough to risk being exposed. That's why current Linux kernels are safe.

      Now, it could be that, in the future, a yet-unknown program within the Linux kernel does just that: use blocks of 16 MB and beyond. It's unlikely, but it *could* happen. That's why the kernel version must be patched (and is being patched; I'm in contact with Greg for that).

      But there is no risk *right now*. No external program can exploit it using any of the entry points offered by the kernel so far.

    2. Well, there are reports saying libav/ffmpeg could be at risk. They both deal with a ton of custom data formats and can use blocks over 16 MB.

      Basically, you never know how people will use your code. So leaving landmines like this in a library is a really bad idea, especially if you're aware of them. And speaking for myself, I think "unprotected"/non-validated functions like LZ4_decompress_fast just should not exist on the kernel side. This gives user mode a heck of a lot of chances to try to exploit the kernel and elevate rights.

    3. Well, here is your comment, dug back from the grave.

      As far as I know, libav/ffmpeg don't use LZ4, so I can't comment. Furthermore, they seem to use a "custom version" of LZO, so they are basically managing their own risk, which is also limited to their application.

      Regarding LZ4_decompress, the reference implementation clearly promotes the _safe() variant, which is at the top of the interface, while the _fast() variant must be found later on, within the "advanced section".

      It's also my recommendation that LZ4_decompress_safe() should be used under most circumstances.
      There are programmers, though, who are willing to use the _fast variant, because they are working in a closed environment, with no external interaction. It's typically the case when the program compresses and decompresses its own data, directly from memory (without intermediate storage).

    4. Still, it is better safe than sorry, so it would be nice if such generic and widespread things as LZ4 did not cause unexpected surprises, even in "uncommon" scenarios. I think I should agree that fortunately most use cases do not suffer from the bug, so it wouldn't go as badly as it could. However, there could be use cases which are affected, and it is a really good idea to update the LZ4 and LZO libs before blackhats find them and start exploiting them to gain unauthorised access, etc.

      As for _fast, got it. Though from my experience, a "closed" environment usually happens to be not "as closed as expected", leading to funny attack vectors. But I would agree on that. And after re-checking the Linux kernel, I can tell they were smart enough not to use _fast in their hazardous areas.

      P.S. And please accept my apologies for over-reacting about the comment removal. Since it turned out to be spam protection, it was really wrong of me to get aggravated about it.

    5. While closed environments are often not as closed as expected, C provides much easier ways of shooting yourself in the foot than explicitly choosing an unsafe decompressor.

    6. C programmers are usually aware of common pitfalls. And searching for an unknown bug in an unknown program isn't very easy. OTOH, a decompressor already does half of the job an attacker needs (it accesses memory under the control of external data, in predictable ways). So trying to fool the decompressor looks like a promising idea if you're about to break into a "closed" system. But I would agree it's not the compression library author's fault if someone uses unsafe-but-faster functions and then fails to ensure the data is valid.

  7. Interesting. The LZ4 author seems to be a coward and deletes posts admitting that libav/ffmpeg may be affected by this vulnerability due to custom formats. Should I say it is an irresponsible attitude to downplay security issues? And it is also quite irresponsible to fix security problems one year after becoming aware of them.

    1. Apparently, a commenter is courageous enough to send a libellous comment in Anonymous mode.

      I don't "hide" posts; they are all displayed here automatically, without my approval. Now, if you make a habit of sending comments in anonymous mode, it may happen that a comment gets automatically stored in the "spam" folder for reasons outside of my control. It then requires me to do a bit of research to find it and restore it.

    2. Hmm, if we're talking about "courage", it's simple: I just do not have any of the mentioned "profiles". Hence, posting as an anonymous user was just the most convenient and fast option. And if it was anti-spam measures that removed my comment, then I'm really sorry for being too harsh about it.