[PATCH] Bitcode: Error out instead of crashing on corrupt metadata

Thu Mar 17 11:31:03 PDT 2016

Filipe Cabecinhas <me at filcab.net> writes:
> The best option is to start from a valid .bc file that contains the
> elements you need.
> Most records in it will be encoded with abbreviations. Pause the reader a
> little before it reads the record you want, then look at the abbrev table
> and the abbrev being read, make sure they match, and note the offset in
> bits.
> Then hex edit the file to change the abbrev at that point (the
> instruction's MDNode) so that it points to the MDString instead.
>
> Let me know if you need further help. I'll try to reproduce that problem,
> but probably only tomorrow, if I can.
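
For illustration, the hex edit described above amounts to overwriting a few bits at a known bit offset. The following is only a minimal sketch, assuming the bitstream numbers bits LSB-first within each byte (little-endian words). The name patchBits and its parameters are hypothetical and not part of LLVM; the offset, field width, and replacement value all have to come from inspecting the reader as described.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): overwrite `width` bits starting at
// stream position `bitOffset` with the low `width` bits of `value`.
// Assumes stream bit p lives in byte p / 8 at bit p % 8 (LSB-first).
// Returns false if the range does not fit in the buffer.
bool patchBits(std::vector<std::uint8_t> &buf, std::size_t bitOffset,
               unsigned width, std::uint64_t value) {
  if (width > 64 || bitOffset > buf.size() * 8 ||
      width > buf.size() * 8 - bitOffset)
    return false;
  for (unsigned i = 0; i < width; ++i) {
    std::size_t bit = bitOffset + i;
    std::uint8_t mask = static_cast<std::uint8_t>(1u << (bit % 8));
    if ((value >> i) & 1)
      buf[bit / 8] = static_cast<std::uint8_t>(buf[bit / 8] | mask);
    else
      buf[bit / 8] = static_cast<std::uint8_t>(buf[bit / 8] & ~mask);
  }
  return true;
}
```

One would read the .bc file into the buffer, call patchBits with the bit offset noted while the reader was paused, and write the buffer back out.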

This works, but I end up with fairly large binary files, and I basically
need one per call site fixed in the patch to get proper coverage. This kind
of testing doesn't feel all that maintainable.

>   Filipe
> On Tuesday, 15 March 2016, Justin Bogner via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> I hit a crash in the bitcode reader on some corrupt input where an
>> MDString had somehow been attached to an instruction instead of an
>> MDNode. This input is pretty bogus, but we shouldn't be crashing on bad
>> input here.
>>
>> The attached patch adds error handling in all of the places where we
>> currently have unchecked casts from Metadata to MDNode, which means
>> we'll error out instead of crashing for that sort of input.
>>
>> Unfortunately, I can't figure out a way to write a test to hit these
>> corner cases. Tips on generating bogus bitcode would be welcome.
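
To make the fix described in the original message concrete: the idea is to replace an unchecked downcast with a checked one and report an error when the kind does not match. The sketch below uses hypothetical stand-in types, not LLVM's real Metadata classes, and the function names and error text are made up for illustration. It is not the code from the attached patch.

```cpp
#include <string>

// Hypothetical stand-ins for the metadata hierarchy; not the real LLVM types.
struct Metadata {
  enum Kind { MDStringKind, MDNodeKind };
  explicit Metadata(Kind K) : TheKind(K) {}
  virtual ~Metadata() = default;
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};
struct MDString : Metadata {
  MDString() : Metadata(MDStringKind) {}
};
struct MDNode : Metadata {
  MDNode() : Metadata(MDNodeKind) {}
};

// Checked downcast: yields nullptr for null or non-MDNode input instead of
// assuming the kind, which is what an unchecked cast would do.
MDNode *getAsMDNode(Metadata *MD) {
  if (!MD || MD->getKind() != Metadata::MDNodeKind)
    return nullptr;
  return static_cast<MDNode *>(MD);
}

// Hypothetical reader step: returns true on success. On corrupt input it
// fills Err and returns false rather than crashing.
bool attachMetadata(Metadata *MD, std::string &Err) {
  MDNode *Node = getAsMDNode(MD);
  if (!Node) {
    Err = "invalid metadata attachment: expected MDNode";
    return false;
  }
  (void)Node; // A real reader would attach Node to the instruction here.
  return true;
}
```

With this shape, passing an MDString where an MDNode is expected produces an error the caller can propagate, which is the behavior the patch is after at each of the call sites.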