[llvm-dev] Overlapping memcpy

Mon Dec 7 10:19:50 PST 2015

> On Dec 7, 2015, at 12:30 PM, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 7 Dec 2015, at 16:39, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> On Mon, Dec 07, 2015 at 07:33:29AM +0100, Maciej Adamczyk via llvm-dev wrote:
>>> Hello.
>>> My friend's data compressor has a problem. During the decompression stage, on
>>> some corrupted files, it may issue an overlapping memcpy.
>>> He has two easy solutions for that:
>>> * switch to memmove
>>> * add a branch to detect such case
>>> However, he's not happy with either of them, as they slow the decompression
>>> down to handle a case that almost no one will ever hit.
>> 
>> While I don't think any of this is really LLVM specific, the second is
>> certainly the correct approach if the file format explicitly disallows
>> such overlapping ranges. LZMA streams, for example, are perfectly well
>> defined for that case, and for certain overlapping patterns it even makes
>> sense to say "copy 256 bytes starting from offset -8". If you
>> know that this condition is invalid for well-formed input, mark the
>> condition as predicted false, and the compiler will try to emit a branch
>> that the CPU's branch predictor handles well.
> 
> [ continuing off topic ] The lack of such error checking is one of the big reasons that libraries like libjpeg, libpng, and so on have been a huge source of vulnerabilities in web browsers for the last couple of decades.  It sounds like your friend has already added a security hole to his library; please discourage him from adding any more.

+1.  Why would one care about the performance of decompressing a corrupted stream? As soon as a stream is identified as corrupted, you want to bail out of decompression without writing additional data. Otherwise you are almost surely introducing a security vulnerability.