[LLVMdev] Error handling in LLVMObject library

Fri May 29 18:57:28 PDT 2015

On Friday, May 29, 2015, Alexey Samsonov <vonosmas at gmail.com> wrote:

> Hi everyone,
>
> Having proper error handling in LLVM's Object parsing library is a nice
> thing by itself, but it would additionally allow us to find bugs by fuzzing
> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
> input validation is essential.
>
> This is a generic discussion of state of affairs. I want to do some
> progress in fuzzing before we finish it (especially if we decide to make a
> significant intrusive changes), you may scroll down for my plan.
>
> The code in lib/Object calls report_fatal_error() far too often, both when
> we're (a) just constructing the specific implementation of ObjectFile, and
> (b) when we access its contents and find out the file is broken and can't
> be parsed properly.
>
> We should just go and fix (a): ObjectFile factory methods return
> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
> appropriately.
>
> (b) is harder. The current approach is to use std::error_code as a return
> type, and store the result in by-reference argument, for instance:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> I wanted to follow this approach in a proposed large MachO API change
> (http://reviews.llvm.org/D10111
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>),
> but it raised discussion on whether this approach is right.
> Moving this discussion here. I see the following options:
>
> 1. Use the current approach:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> Pros:
>   * don't need to change a large number of (often virtual) API functions
>   * has a nice error handling pattern used in LLVM tools:
>   uint64_t Addr;
>   if (error(getSymbolAddress(Symbol, Addr)))
>     return;  // or continue, or do anything else.
>
> Cons:
>   * return value can just be silently ignored. Adding warn_unused_result
> attribute on per-function basis is ugly
>   * adds extra overhead for cases when we're certain the result would be
> valid.
>
> 2. Switch to ErrorOr wrapper:
>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>
> Pros:
>   * handling the error is now mandatory and explicit.
>   * callers can explicitly skip error handling if they know the result
> would be valid:
>     uint64_t Addr = getSymbolAddress(Symbol).get();
>   and it would fail the assert if they are wrong.
>
> Cons:
>   * need to change lots of old code, or live with two versions of functions
>   * error handling boilerplate in regular code on call site is ugly:
>   auto AddrOrErr = getSymbolAddress(Symbol);
>   if (AddrOrErr.hasError())
>     return;  // or continue, or whatever
>   uint64_t Addr = AddrOrErr.get();
>   (can probably be improved with a macro)
>   * adds extra overhead for cases when we're certain the result would be
> valid.
>

The "error handling boilerplate" here isn't actually all that bad, and
ErrorOr can generally just be used like an API that returns a pointer and
null on err, but with better errors, ie:

  auto Addr = getSymbolAddress(...)
  if (auto EC = Addr.getError()) {
    // handle the error
  }
  // use *Addr via dereference

This is the same number of lines of code a (1), but harder to misuse. There
are situations where it's not quite this nice, like ErrorOr<unique_ptr>,
but that's worth solving in its own right if this is an issue.

I haven't looked too much at the code in question, but I'm pretty confident
that this is the right way to go about it.

On IRC discussion Lang suggested
> 3. Check the whole object file contents in constructor or validate()
> method, and get rid
> of all error codes in regular accessors.
>
> Pros:
>   * the interface is much cleaner
>   * no extra overhead for trusted (e.g. JIT) object files.
>
> Cons:
>   * significant change, fundamentally shifting the way object files are
> handled
>   * verifier function should now absolutely everything about the object
> file, and anticipate all possible use cases. Especially hard, assuming that
> ObjectFile interface allows user to pass any garbage as input arguments
> (e.g. as DataRefImpl in the example above).
>   * verifier can be slow, and might be an overkill if we strongly desire
> to parse some bits of object file lazily.
>
> ================
>
> Instead of http://reviews.llvm.org/D10111
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>,
> I'm going to proceed with minimal incremental changes, that would allow
> fuzzer to move forward. Namely, I want to keep the changes to headers as
> small as possible, changing functions one by one, and preferring to use
> ErrorOr<> construct (option 2 listed above). An additional benefit of this
> is that each small incremental change would be accompanied by the test case
> generated by fuzzer, that exposed this problem.
>
> Let me know if you think it's a good or terrible idea.
>
> --
> Alexey Samsonov
> vonosmas at gmail.com <javascript:_e(%7B%7D,'cvml','vonosmas at gmail.com');>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150529/30aa8588/attachment.html>