[LLVMdev] Error handling in LLVMObject library

Rafael EspĂ­ndola rafael.espindola at gmail.com
Mon Jun 1 18:50:11 PDT 2015


Just from the description I think you can report an error the first time
and assert the following ones.

Cheers,
Rafael
On Jun 1, 2015 9:42 PM, "Rui Ueyama" <ruiu at google.com> wrote:

> There are some occasions in which you read the same data many times from
> the same object file. In the new COFF linker, we read relocations tables
> twice  -- one to eliminate unreferenced comdat sections and the other to
> apply relocations. I'm planning to do comdat merging by contents,  and I'm
> going to read relocation tables many more times to see if relocations are
> also the same. After comdat dead-stripping, I'm sure there's no error in
> the relocation tables.
>
> Am I expected to use libObject like that? Or do you recommend caching
> results?
>
> On Mon, Jun 1, 2015 at 6:16 PM, Rafael EspĂ­ndola <
> rafael.espindola at gmail.com> wrote:
>
>> With comdats parts of the file might never be read.
>>
>> There are also multiple levels of cache and while all the relocations of
>> a file will fit in ram, it is probably still more efficient to validate
>> them as they are read.
>>
>> But I think (typing on a phone) that relocations are another case where
>> the best is to have a more specific api: the only way the relocation is
>> invalid from the perspective of the api we provide is if the symbol index
>> is invalid. We can just return symbol_end for that and avoid the error_code
>> and the ErrorOr.
>>
>> Cheers,
>> Rafael
>> On Jun 1, 2015 6:19 PM, "Lang Hames" <lhames at gmail.com> wrote:
>>
>>> Out of interest, what parts of an object file commonly don't get mapped
>>> in by the linker? I'd have assumed linkers would usually end up touching
>>> just about everything.
>>>
>>> Even assuming that whole file validation is too coarse for performance
>>> reasons, I'd prefer something coarser than what we currently have. For
>>> example, right now every access to a relocation has to deal with an error
>>> return, but I assume it's rare to parse a single relocation in isolation.
>>> Why don't we validate all relocations for each section up front, then have
>>> a simpler API for accessing them?
>>>
>>> I don't have especially strong feelings about this, just a nagging sense
>>> that libObject's error handling is unnecessarily fine grained.
>>>
>>> FWIW, of the other two proposals I prefer the ErrorOr approach.
>>>
>>> Cheers,
>>> Lang.
>>>
>>> On Fri, May 29, 2015 at 6:45 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <vonosmas at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Having proper error handling in LLVM's Object parsing library is a
>>>>> nice thing by itself, but it would additionally allow us to find bugs by
>>>>> fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the
>>>>> clean input validation is essential.
>>>>>
>>>>> This is a generic discussion of state of affairs. I want to do some
>>>>> progress in fuzzing before we finish it (especially if we decide to make a
>>>>> significant intrusive changes), you may scroll down for my plan.
>>>>>
>>>>> The code in lib/Object calls report_fatal_error() far too often, both
>>>>> when we're (a) just constructing the specific implementation of ObjectFile,
>>>>> and (b) when we access its contents and find out the file is broken and
>>>>> can't be parsed properly.
>>>>>
>>>>> We should just go and fix (a): ObjectFile factory methods return
>>>>> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
>>>>> appropriately.
>>>>>
>>>>> (b) is harder. The current approach is to use std::error_code as a
>>>>> return type, and store the result in by-reference argument, for instance:
>>>>>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>>>>>
>>>>> I wanted to follow this approach in a proposed large MachO API change
>>>>> (http://reviews.llvm.org/D10111
>>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>),
>>>>> but it raised discussion on whether this approach is right.
>>>>> Moving this discussion here. I see the following options:
>>>>>
>>>>> 1. Use the current approach:
>>>>>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>>>>>
>>>>> Pros:
>>>>>   * don't need to change a large number of (often virtual) API
>>>>> functions
>>>>>   * has a nice error handling pattern used in LLVM tools:
>>>>>   uint64_t Addr;
>>>>>   if (error(getSymbolAddress(Symbol, Addr)))
>>>>>     return;  // or continue, or do anything else.
>>>>>
>>>>> Cons:
>>>>>   * return value can just be silently ignored. Adding
>>>>> warn_unused_result attribute on per-function basis is ugly
>>>>>   * adds extra overhead for cases when we're certain the result would
>>>>> be valid.
>>>>>
>>>>> 2. Switch to ErrorOr wrapper:
>>>>>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>>>>>
>>>>> Pros:
>>>>>   * handling the error is now mandatory and explicit.
>>>>>   * callers can explicitly skip error handling if they know the result
>>>>> would be valid:
>>>>>     uint64_t Addr = getSymbolAddress(Symbol).get();
>>>>>   and it would fail the assert if they are wrong.
>>>>>
>>>>> Cons:
>>>>>   * need to change lots of old code, or live with two versions of
>>>>> functions
>>>>>   * error handling boilerplate in regular code on call site is ugly:
>>>>>   auto AddrOrErr = getSymbolAddress(Symbol);
>>>>>   if (AddrOrErr.hasError())
>>>>>     return;  // or continue, or whatever
>>>>>   uint64_t Addr = AddrOrErr.get();
>>>>>   (can probably be improved with a macro)
>>>>>   * adds extra overhead for cases when we're certain the result would
>>>>> be valid.
>>>>>
>>>>> On IRC discussion Lang suggested
>>>>> 3. Check the whole object file contents in constructor or validate()
>>>>> method, and get rid
>>>>> of all error codes in regular accessors.
>>>>>
>>>>
>>>> Unfortunately this option isn't possible with the current libObject.
>>>> libObject is specifically designed to avoid paging in or allocating any
>>>> memory beyond simply mmap'ing the file. This makes it annoying to use, but
>>>> also allows it to be used for demanding use cases like a linker. For many
>>>> things, I've always wished we had a "libEasyObject" that avoided this
>>>> annoyance, but it seems like a lot of work compared to the benefit.
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>> Pros:
>>>>>   * the interface is much cleaner
>>>>>   * no extra overhead for trusted (e.g. JIT) object files.
>>>>>
>>>>> Cons:
>>>>>   * significant change, fundamentally shifting the way object files
>>>>> are handled
>>>>>   * verifier function should now absolutely everything about the
>>>>> object file, and anticipate all possible use cases. Especially hard,
>>>>> assuming that ObjectFile interface allows user to pass any garbage as input
>>>>> arguments (e.g. as DataRefImpl in the example above).
>>>>>   * verifier can be slow, and might be an overkill if we strongly
>>>>> desire to parse some bits of object file lazily.
>>>>>
>>>>> ================
>>>>>
>>>>> Instead of http://reviews.llvm.org/D10111
>>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>,
>>>>> I'm going to proceed with minimal incremental changes, that would allow
>>>>> fuzzer to move forward. Namely, I want to keep the changes to headers as
>>>>> small as possible, changing functions one by one, and preferring to use
>>>>> ErrorOr<> construct (option 2 listed above). An additional benefit of this
>>>>> is that each small incremental change would be accompanied by the test case
>>>>> generated by fuzzer, that exposed this problem.
>>>>>
>>>>> Let me know if you think it's a good or terrible idea.
>>>>>
>>>>> --
>>>>> Alexey Samsonov
>>>>> vonosmas at gmail.com
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150601/40dd1a84/attachment.html>


More information about the llvm-dev mailing list