<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jun 1, 2015 at 10:44 AM, Alexey Samsonov <span dir="ltr"><<a href="mailto:vonosmas@gmail.com" target="_blank">vonosmas@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">OK, looks like there's a strong consensus that ErrorOr<> is the way to go. I'd probably<div>abstain from doing this in one huge change, as it would be too large to digest and review,</div><div>and not totally mechanical. I will probably proceed with smaller changes. accompanied with</div><div>test cases generated by fuzzer.<br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Sun, May 31, 2015 at 5:01 PM, Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>On 29 May 2015 at 19:06, Alexey Samsonov <<a href="mailto:vonosmas@gmail.com" target="_blank">vonosmas@gmail.com</a>> wrote:<br>

> Hi everyone,<br>

><br>

> Having proper error handling in LLVM's Object parsing library is a nice<br>

> thing by itself, but it would additionally allow us to find bugs by fuzzing<br>

> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean<br>

> input validation is essential.<br>

><br>

> This is a generic discussion of state of affairs. I want to do some progress<br>

> in fuzzing before we finish it (especially if we decide to make a<br>

> significant intrusive changes), you may scroll down for my plan.<br>

><br>

> The code in lib/Object calls report_fatal_error() far too often, both when<br>

> we're (a) just constructing the specific implementation of ObjectFile, and<br>

> (b) when we access its contents and find out the file is broken and can't be<br>

> parsed properly.<br>

><br>

> We should just go and fix (a): ObjectFile factory methods return<br>

> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error<br>

> appropriately.<br>

><br>

> (b) is harder. The current approach is to use std::error_code as a return<br>

> type, and store the result in by-reference argument, for instance:<br>

>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);<br>

><br>

> I wanted to follow this approach in a proposed large MachO API change<br>

> (<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=9TFs7K2z3Dmjxmi6GM-lHwqz1_akAqqQUpjTdMZc92Y&s=miywZwpUCUl158BoEY9imL43j8BoJH58M3A5vvNtI9c&e=" target="_blank">http://reviews.llvm.org/D10111</a>), but it raised discussion on whether this<br>

> approach is right.<br>

> Moving this discussion here. I see the following options:<br>

><br>

> 1. Use the current approach:<br>

>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);<br>

><br>

> Pros:<br>

>   * don't need to change a large number of (often virtual) API functions<br>

>   * has a nice error handling pattern used in LLVM tools:<br>

>   uint64_t Addr;<br>

>   if (error(getSymbolAddress(Symbol, Addr)))<br>

>     return;  // or continue, or do anything else.<br>

><br>

> Cons:<br>

>   * return value can just be silently ignored. Adding warn_unused_result<br>

> attribute on per-function basis is ugly<br>

>   * adds extra overhead for cases when we're certain the result would be<br>

> valid.<br>

<br>

</div></div>I agree, this is really bad.<br>

<span><br>

> 2. Switch to ErrorOr wrapper:<br>

>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);<br>

><br>

> Pros:<br>

>   * handling the error is now mandatory and explicit.<br>

>   * callers can explicitly skip error handling if they know the result would<br>

> be valid:<br>

>     uint64_t Addr = getSymbolAddress(Symbol).get();<br>

>   and it would fail the assert if they are wrong.<br>

><br>

> Cons:<br>

>   * need to change lots of old code, or live with two versions of functions<br>

>   * error handling boilerplate in regular code on call site is ugly:<br>

>   auto AddrOrErr = getSymbolAddress(Symbol);<br>

>   if (AddrOrErr.hasError())<br>

>     return;  // or continue, or whatever<br>

>   uint64_t Addr = AddrOrErr.get();<br>

>   (can probably be improved with a macro)<br>

>   * adds extra overhead for cases when we're certain the result would be<br>

> valid.<br>

<br>

</span>I think this is a strict improvement, and would not object to doing it<br>

in one big step if you wanted.<br>

<br>

One further improvement is that it looks like the API was written<br>

without thinking much about what is an error and what functions can<br>

fail.<br>

<br>

For the above function, the ErrorOr style is the best we can do since<br>

to get the address of a symbol on a ELF .o file we have to find the<br>

section and that index can be wrong (not that we actually report that<br>

at the moment :-( ).<br>

<br>

But for alignment we can use just use "uint32_t<br>

getSymbolAlignment(DataRefImpl Symb) const" (see r238700).<br>

<br>

Going further, it might even be better to replace getSymbolAddress<br>

with a getSymbolValue that cannot fail and have the caller handle the<br>

fact that the value is sometimes a virtual address and sometimes<br>

section relative (it is likely more efficient too).<br>

<span><br>

<br>

> On IRC discussion Lang suggested<br>

> 3. Check the whole object file contents in constructor or validate() method,<br>

> and get rid<br>

> of all error codes in regular accessors.<br>

><br>

> Pros:<br>

>   * the interface is much cleaner<br>

>   * no extra overhead for trusted (e.g. JIT) object files.<br>

><br>

> Cons:<br>

>   * significant change, fundamentally shifting the way object files are<br>

> handled<br>

>   * verifier function should now absolutely everything about the object<br>

> file, and anticipate all possible use cases. Especially hard, assuming that<br>

> ObjectFile interface allows user to pass any garbage as input arguments<br>

> (e.g. as DataRefImpl in the example above).<br>

>   * verifier can be slow, and might be an overkill if we strongly desire to<br>

> parse some bits of object file lazily.<br>

<br>

<br>

</span>Please don't do this. At the library level we cannot know what parts<br>

of the file a client would actually use.</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span><br>

<br>

<br>

> ================<br>

><br>

> Instead of <a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=9TFs7K2z3Dmjxmi6GM-lHwqz1_akAqqQUpjTdMZc92Y&s=miywZwpUCUl158BoEY9imL43j8BoJH58M3A5vvNtI9c&e=" target="_blank">http://reviews.llvm.org/D10111</a>, I'm going to proceed with minimal<br>

> incremental changes, that would allow fuzzer to move forward. Namely, I want<br>

> to keep the changes to headers as small as possible, changing functions one<br>

> by one, and preferring to use ErrorOr<> construct (option 2 listed above).<br>

> An additional benefit of this is that each small incremental change would be<br>

> accompanied by the test case generated by fuzzer, that exposed this problem.<br>

><br>

> Let me know if you think it's a good or terrible idea.<br>

<br>

</span>IMHO the most important thing is to make sure that every error path<br>

you add is tested. With tests we can refactor. For example, see the<br>

refactoring in (r238024).<br></blockquote><div><br></div></div></div><div>This revision uses report_fatal_error() instead of returning the error code. The only benefit of this</div><div>is providing a reasonable error message, but this limits the ability to use LLVMObject as a library</div><div>for parsing/querying invalid object files.</div><div><br></div><div>Can we sacrifice readable error messages if we doesn't want the library to crash? Or we should</div><div>add extra object_error enum for every type of error?</div></div></div></div></div></blockquote><div><br>It seems reasonable to add an enumerator to the error category enumeration for any specific error path that's useful as a specific diagnostic. At least cases where we have observably different diagnostics already with test cases could be maintained - for other things we could be vague until we have a particular need to be less so... <br><br>(but I don't work with this stuff, so owners/users may have more informed opinions on the matter than I do)<br><br>- David<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Cheers,<br>

Rafael<span class="HOEnZb"><font color="#888888"><br>

</font></span></blockquote></div><span class="HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div><div dir="ltr">Alexey Samsonov<br><a href="mailto:vonosmas@gmail.com" target="_blank">vonosmas@gmail.com</a></div></div>

</font></span></div></div></div>

</blockquote></div><br></div></div>