[LLVMdev] Error handling in LLVMObject library

Mon Jun 1 18:16:51 PDT 2015

With comdats parts of the file might never be read.

There are also multiple levels of cache and while all the relocations of a
file will fit in ram, it is probably still more efficient to validate them
as they are read.

But I think (typing on a phone) that relocations are another case where the
best is to have a more specific api: the only way the relocation is invalid
from the perspective of the api we provide is if the symbol index is
invalid. We can just return symbol_end for that and avoid the error_code
and the ErrorOr.

Cheers,
Rafael
On Jun 1, 2015 6:19 PM, "Lang Hames" <lhames at gmail.com> wrote:

> Out of interest, what parts of an object file commonly don't get mapped in
> by the linker? I'd have assumed linkers would usually end up touching just
> about everything.
>
> Even assuming that whole file validation is too coarse for performance
> reasons, I'd prefer something coarser than what we currently have. For
> example, right now every access to a relocation has to deal with an error
> return, but I assume it's rare to parse a single relocation in isolation.
> Why don't we validate all relocations for each section up front, then have
> a simpler API for accessing them?
>
> I don't have especially strong feelings about this, just a nagging sense
> that libObject's error handling is unnecessarily fine grained.
>
> FWIW, of the other two proposals I prefer the ErrorOr approach.
>
> Cheers,
> Lang.
>
> On Fri, May 29, 2015 at 6:45 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <vonosmas at gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Having proper error handling in LLVM's Object parsing library is a nice
>>> thing by itself, but it would additionally allow us to find bugs by fuzzing
>>> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
>>> input validation is essential.
>>>
>>> This is a generic discussion of state of affairs. I want to do some
>>> progress in fuzzing before we finish it (especially if we decide to make a
>>> significant intrusive changes), you may scroll down for my plan.
>>>
>>> The code in lib/Object calls report_fatal_error() far too often, both
>>> when we're (a) just constructing the specific implementation of ObjectFile,
>>> and (b) when we access its contents and find out the file is broken and
>>> can't be parsed properly.
>>>
>>> We should just go and fix (a): ObjectFile factory methods return
>>> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
>>> appropriately.
>>>
>>> (b) is harder. The current approach is to use std::error_code as a
>>> return type, and store the result in by-reference argument, for instance:
>>>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>>>
>>> I wanted to follow this approach in a proposed large MachO API change
>>> (http://reviews.llvm.org/D10111
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>),
>>> but it raised discussion on whether this approach is right.
>>> Moving this discussion here. I see the following options:
>>>
>>> 1. Use the current approach:
>>>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>>>
>>> Pros:
>>>   * don't need to change a large number of (often virtual) API functions
>>>   * has a nice error handling pattern used in LLVM tools:
>>>   uint64_t Addr;
>>>   if (error(getSymbolAddress(Symbol, Addr)))
>>>     return;  // or continue, or do anything else.
>>>
>>> Cons:
>>>   * return value can just be silently ignored. Adding warn_unused_result
>>> attribute on per-function basis is ugly
>>>   * adds extra overhead for cases when we're certain the result would be
>>> valid.
>>>
>>> 2. Switch to ErrorOr wrapper:
>>>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>>>
>>> Pros:
>>>   * handling the error is now mandatory and explicit.
>>>   * callers can explicitly skip error handling if they know the result
>>> would be valid:
>>>     uint64_t Addr = getSymbolAddress(Symbol).get();
>>>   and it would fail the assert if they are wrong.
>>>
>>> Cons:
>>>   * need to change lots of old code, or live with two versions of
>>> functions
>>>   * error handling boilerplate in regular code on call site is ugly:
>>>   auto AddrOrErr = getSymbolAddress(Symbol);
>>>   if (AddrOrErr.hasError())
>>>     return;  // or continue, or whatever
>>>   uint64_t Addr = AddrOrErr.get();
>>>   (can probably be improved with a macro)
>>>   * adds extra overhead for cases when we're certain the result would be
>>> valid.
>>>
>>> On IRC discussion Lang suggested
>>> 3. Check the whole object file contents in constructor or validate()
>>> method, and get rid
>>> of all error codes in regular accessors.
>>>
>>
>> Unfortunately this option isn't possible with the current libObject.
>> libObject is specifically designed to avoid paging in or allocating any
>> memory beyond simply mmap'ing the file. This makes it annoying to use, but
>> also allows it to be used for demanding use cases like a linker. For many
>> things, I've always wished we had a "libEasyObject" that avoided this
>> annoyance, but it seems like a lot of work compared to the benefit.
>>
>> -- Sean Silva
>>
>>
>>>
>>> Pros:
>>>   * the interface is much cleaner
>>>   * no extra overhead for trusted (e.g. JIT) object files.
>>>
>>> Cons:
>>>   * significant change, fundamentally shifting the way object files are
>>> handled
>>>   * verifier function should now absolutely everything about the object
>>> file, and anticipate all possible use cases. Especially hard, assuming that
>>> ObjectFile interface allows user to pass any garbage as input arguments
>>> (e.g. as DataRefImpl in the example above).
>>>   * verifier can be slow, and might be an overkill if we strongly desire
>>> to parse some bits of object file lazily.
>>>
>>> ================
>>>
>>> Instead of http://reviews.llvm.org/D10111
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>,
>>> I'm going to proceed with minimal incremental changes, that would allow
>>> fuzzer to move forward. Namely, I want to keep the changes to headers as
>>> small as possible, changing functions one by one, and preferring to use
>>> ErrorOr<> construct (option 2 listed above). An additional benefit of this
>>> is that each small incremental change would be accompanied by the test case
>>> generated by fuzzer, that exposed this problem.
>>>
>>> Let me know if you think it's a good or terrible idea.
>>>
>>> --
>>> Alexey Samsonov
>>> vonosmas at gmail.com
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150601/dd40fac9/attachment.html>