[LLVMdev] Error handling in LLVMObject library

David Blaikie dblaikie at gmail.com
Mon Jun 1 10:51:39 PDT 2015


On Mon, Jun 1, 2015 at 10:44 AM, Alexey Samsonov <vonosmas at gmail.com> wrote:

> OK, looks like there's a strong consensus that ErrorOr<> is the way to go.
> I'd probably
> abstain from doing this in one huge change, as it would be too large to
> digest and review,
> and not totally mechanical. I will probably proceed with smaller changes.
> accompanied with
> test cases generated by fuzzer.
>
> On Sun, May 31, 2015 at 5:01 PM, Rafael EspĂ­ndola <
> rafael.espindola at gmail.com> wrote:
>
>> On 29 May 2015 at 19:06, Alexey Samsonov <vonosmas at gmail.com> wrote:
>> > Hi everyone,
>> >
>> > Having proper error handling in LLVM's Object parsing library is a nice
>> > thing by itself, but it would additionally allow us to find bugs by
>> fuzzing
>> > (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
>> > input validation is essential.
>> >
>> > This is a generic discussion of state of affairs. I want to do some
>> progress
>> > in fuzzing before we finish it (especially if we decide to make a
>> > significant intrusive changes), you may scroll down for my plan.
>> >
>> > The code in lib/Object calls report_fatal_error() far too often, both
>> when
>> > we're (a) just constructing the specific implementation of ObjectFile,
>> and
>> > (b) when we access its contents and find out the file is broken and
>> can't be
>> > parsed properly.
>> >
>> > We should just go and fix (a): ObjectFile factory methods return
>> > ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
>> > appropriately.
>> >
>> > (b) is harder. The current approach is to use std::error_code as a
>> return
>> > type, and store the result in by-reference argument, for instance:
>> >   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>> >
>> > I wanted to follow this approach in a proposed large MachO API change
>> > (http://reviews.llvm.org/D10111), but it raised discussion on whether
>> this
>> > approach is right.
>> > Moving this discussion here. I see the following options:
>> >
>> > 1. Use the current approach:
>> >   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>> >
>> > Pros:
>> >   * don't need to change a large number of (often virtual) API functions
>> >   * has a nice error handling pattern used in LLVM tools:
>> >   uint64_t Addr;
>> >   if (error(getSymbolAddress(Symbol, Addr)))
>> >     return;  // or continue, or do anything else.
>> >
>> > Cons:
>> >   * return value can just be silently ignored. Adding warn_unused_result
>> > attribute on per-function basis is ugly
>> >   * adds extra overhead for cases when we're certain the result would be
>> > valid.
>>
>> I agree, this is really bad.
>>
>> > 2. Switch to ErrorOr wrapper:
>> >   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>> >
>> > Pros:
>> >   * handling the error is now mandatory and explicit.
>> >   * callers can explicitly skip error handling if they know the result
>> would
>> > be valid:
>> >     uint64_t Addr = getSymbolAddress(Symbol).get();
>> >   and it would fail the assert if they are wrong.
>> >
>> > Cons:
>> >   * need to change lots of old code, or live with two versions of
>> functions
>> >   * error handling boilerplate in regular code on call site is ugly:
>> >   auto AddrOrErr = getSymbolAddress(Symbol);
>> >   if (AddrOrErr.hasError())
>> >     return;  // or continue, or whatever
>> >   uint64_t Addr = AddrOrErr.get();
>> >   (can probably be improved with a macro)
>> >   * adds extra overhead for cases when we're certain the result would be
>> > valid.
>>
>> I think this is a strict improvement, and would not object to doing it
>> in one big step if you wanted.
>>
>> One further improvement is that it looks like the API was written
>> without thinking much about what is an error and what functions can
>> fail.
>>
>> For the above function, the ErrorOr style is the best we can do since
>> to get the address of a symbol on a ELF .o file we have to find the
>> section and that index can be wrong (not that we actually report that
>> at the moment :-( ).
>>
>> But for alignment we can use just use "uint32_t
>> getSymbolAlignment(DataRefImpl Symb) const" (see r238700).
>>
>> Going further, it might even be better to replace getSymbolAddress
>> with a getSymbolValue that cannot fail and have the caller handle the
>> fact that the value is sometimes a virtual address and sometimes
>> section relative (it is likely more efficient too).
>>
>>
>> > On IRC discussion Lang suggested
>> > 3. Check the whole object file contents in constructor or validate()
>> method,
>> > and get rid
>> > of all error codes in regular accessors.
>> >
>> > Pros:
>> >   * the interface is much cleaner
>> >   * no extra overhead for trusted (e.g. JIT) object files.
>> >
>> > Cons:
>> >   * significant change, fundamentally shifting the way object files are
>> > handled
>> >   * verifier function should now absolutely everything about the object
>> > file, and anticipate all possible use cases. Especially hard, assuming
>> that
>> > ObjectFile interface allows user to pass any garbage as input arguments
>> > (e.g. as DataRefImpl in the example above).
>> >   * verifier can be slow, and might be an overkill if we strongly
>> desire to
>> > parse some bits of object file lazily.
>>
>>
>> Please don't do this. At the library level we cannot know what parts
>> of the file a client would actually use.
>
>
>>
>>
>> > ================
>> >
>> > Instead of http://reviews.llvm.org/D10111, I'm going to proceed with
>> minimal
>> > incremental changes, that would allow fuzzer to move forward. Namely, I
>> want
>> > to keep the changes to headers as small as possible, changing functions
>> one
>> > by one, and preferring to use ErrorOr<> construct (option 2 listed
>> above).
>> > An additional benefit of this is that each small incremental change
>> would be
>> > accompanied by the test case generated by fuzzer, that exposed this
>> problem.
>> >
>> > Let me know if you think it's a good or terrible idea.
>>
>> IMHO the most important thing is to make sure that every error path
>> you add is tested. With tests we can refactor. For example, see the
>> refactoring in (r238024).
>>
>
> This revision uses report_fatal_error() instead of returning the error
> code. The only benefit of this
> is providing a reasonable error message, but this limits the ability to
> use LLVMObject as a library
> for parsing/querying invalid object files.
>
> Can we sacrifice readable error messages if we doesn't want the library to
> crash? Or we should
> add extra object_error enum for every type of error?
>

It seems reasonable to add an enumerator to the error category enumeration
for any specific error path that's useful as a specific diagnostic. At
least cases where we have observably different diagnostics already with
test cases could be maintained - for other things we could be vague until
we have a particular need to be less so...

(but I don't work with this stuff, so owners/users may have more informed
opinions on the matter than I do)

- David


>
>
>>
>> Cheers,
>> Rafael
>>
>
>
>
> --
> Alexey Samsonov
> vonosmas at gmail.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150601/5fe101eb/attachment.html>


More information about the llvm-dev mailing list