[LLVMdev] Error handling in LLVMObject library

Alexey Samsonov vonosmas at gmail.com
Mon Jun 1 10:44:53 PDT 2015


OK, looks like there's a strong consensus that ErrorOr<> is the way to go.
I'd probably
abstain from doing this in one huge change, as it would be too large to
digest and review,
and not totally mechanical. I will probably proceed with smaller changes.
accompanied with
test cases generated by fuzzer.

On Sun, May 31, 2015 at 5:01 PM, Rafael EspĂ­ndola <
rafael.espindola at gmail.com> wrote:

> On 29 May 2015 at 19:06, Alexey Samsonov <vonosmas at gmail.com> wrote:
> > Hi everyone,
> >
> > Having proper error handling in LLVM's Object parsing library is a nice
> > thing by itself, but it would additionally allow us to find bugs by
> fuzzing
> > (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
> > input validation is essential.
> >
> > This is a generic discussion of state of affairs. I want to do some
> progress
> > in fuzzing before we finish it (especially if we decide to make a
> > significant intrusive changes), you may scroll down for my plan.
> >
> > The code in lib/Object calls report_fatal_error() far too often, both
> when
> > we're (a) just constructing the specific implementation of ObjectFile,
> and
> > (b) when we access its contents and find out the file is broken and
> can't be
> > parsed properly.
> >
> > We should just go and fix (a): ObjectFile factory methods return
> > ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
> > appropriately.
> >
> > (b) is harder. The current approach is to use std::error_code as a return
> > type, and store the result in by-reference argument, for instance:
> >   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
> >
> > I wanted to follow this approach in a proposed large MachO API change
> > (http://reviews.llvm.org/D10111), but it raised discussion on whether
> this
> > approach is right.
> > Moving this discussion here. I see the following options:
> >
> > 1. Use the current approach:
> >   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
> >
> > Pros:
> >   * don't need to change a large number of (often virtual) API functions
> >   * has a nice error handling pattern used in LLVM tools:
> >   uint64_t Addr;
> >   if (error(getSymbolAddress(Symbol, Addr)))
> >     return;  // or continue, or do anything else.
> >
> > Cons:
> >   * return value can just be silently ignored. Adding warn_unused_result
> > attribute on per-function basis is ugly
> >   * adds extra overhead for cases when we're certain the result would be
> > valid.
>
> I agree, this is really bad.
>
> > 2. Switch to ErrorOr wrapper:
> >   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
> >
> > Pros:
> >   * handling the error is now mandatory and explicit.
> >   * callers can explicitly skip error handling if they know the result
> would
> > be valid:
> >     uint64_t Addr = getSymbolAddress(Symbol).get();
> >   and it would fail the assert if they are wrong.
> >
> > Cons:
> >   * need to change lots of old code, or live with two versions of
> functions
> >   * error handling boilerplate in regular code on call site is ugly:
> >   auto AddrOrErr = getSymbolAddress(Symbol);
> >   if (AddrOrErr.hasError())
> >     return;  // or continue, or whatever
> >   uint64_t Addr = AddrOrErr.get();
> >   (can probably be improved with a macro)
> >   * adds extra overhead for cases when we're certain the result would be
> > valid.
>
> I think this is a strict improvement, and would not object to doing it
> in one big step if you wanted.
>
> One further improvement is that it looks like the API was written
> without thinking much about what is an error and what functions can
> fail.
>
> For the above function, the ErrorOr style is the best we can do since
> to get the address of a symbol on a ELF .o file we have to find the
> section and that index can be wrong (not that we actually report that
> at the moment :-( ).
>
> But for alignment we can use just use "uint32_t
> getSymbolAlignment(DataRefImpl Symb) const" (see r238700).
>
> Going further, it might even be better to replace getSymbolAddress
> with a getSymbolValue that cannot fail and have the caller handle the
> fact that the value is sometimes a virtual address and sometimes
> section relative (it is likely more efficient too).
>
>
> > On IRC discussion Lang suggested
> > 3. Check the whole object file contents in constructor or validate()
> method,
> > and get rid
> > of all error codes in regular accessors.
> >
> > Pros:
> >   * the interface is much cleaner
> >   * no extra overhead for trusted (e.g. JIT) object files.
> >
> > Cons:
> >   * significant change, fundamentally shifting the way object files are
> > handled
> >   * verifier function should now absolutely everything about the object
> > file, and anticipate all possible use cases. Especially hard, assuming
> that
> > ObjectFile interface allows user to pass any garbage as input arguments
> > (e.g. as DataRefImpl in the example above).
> >   * verifier can be slow, and might be an overkill if we strongly desire
> to
> > parse some bits of object file lazily.
>
>
> Please don't do this. At the library level we cannot know what parts
> of the file a client would actually use.


>
>
> > ================
> >
> > Instead of http://reviews.llvm.org/D10111, I'm going to proceed with
> minimal
> > incremental changes, that would allow fuzzer to move forward. Namely, I
> want
> > to keep the changes to headers as small as possible, changing functions
> one
> > by one, and preferring to use ErrorOr<> construct (option 2 listed
> above).
> > An additional benefit of this is that each small incremental change
> would be
> > accompanied by the test case generated by fuzzer, that exposed this
> problem.
> >
> > Let me know if you think it's a good or terrible idea.
>
> IMHO the most important thing is to make sure that every error path
> you add is tested. With tests we can refactor. For example, see the
> refactoring in (r238024).
>

This revision uses report_fatal_error() instead of returning the error
code. The only benefit of this
is providing a reasonable error message, but this limits the ability to use
LLVMObject as a library
for parsing/querying invalid object files.

Can we sacrifice readable error messages if we doesn't want the library to
crash? Or we should
add extra object_error enum for every type of error?


>
> Cheers,
> Rafael
>



-- 
Alexey Samsonov
vonosmas at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150601/ff9c5ec5/attachment.html>


More information about the llvm-dev mailing list