[llvm] r263742 - Bitcode: Error out instead of crashing on corrupt metadata

Thu Mar 17 15:58:50 PDT 2016

Rafael Espíndola <rafael.espindola at gmail.com> writes:
> On 17 March 2016 at 13:12, Justin Bogner via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
>> Author: bogner
>> Date: Thu Mar 17 15:12:06 2016
>> New Revision: 263742
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=263742&view=rev
>> Log:
>> Bitcode: Error out instead of crashing on corrupt metadata
>>
>> I hit a crash in the bitcode reader on some corrupt input where an
>> MDString had somehow been attached to an instruction instead of an
>> MDNode. This input is pretty bogus, but we shouldn't be crashing on bad
>> input here.
>>
>> This change adds error handling in all of the places where we
>> currently have unchecked casts from Metadata to MDNode, which means
>> we'll error out instead of crashing for that sort of input.
>>
>> Unfortunately, I don't have tests. Hitting this requires flipping bits
>> in the input bitcode, and committing corrupt binary files to catch
>> these cases is a bit too opaque and unmaintainable.
>
> IMHO not having tests is even more unmaintainable.

I could commit an 800 byte binary blob where I generated some IR then
manually flipped a bit in a hex editor to hit one of the five error
checks I added here. I could conceivably figure out how to generate
similar tests for the other four checks with enough time and effort. Of
course, since the bitcode's invalid it won't be meaningfully upgradeable
and will probably end up hitting some other error path very soon after
and no longer test this at all. When it does start erroring for whatever
reason someone will find that it isn't doing anything interesting and
probably delete it. IMO, This kind of test is useless.

The case where I hit this in the real world involved a truncated bitcode
file from an out of tree front end. This also wouldn't be terribly
useful to commit.

I think if we want to test this kind of change meaningfully we probably
need to expose the BitcodeReader type to unit tests, then call
individual parse functions with blobs of data that are short enough for
a human to understand. This could end up being messy anyway though, I'm
not really sure.