<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">I don't much about this area in the code so I can't say much about the current state of things. But in my experience of writing parsers there is a variant 4 that works nicely:</div><div class=""><br class=""></div><div class="">4: Record error and return dummy value:</div><div class="">uint64_t getSymbolAddress(...) {</div><div class=""> if (somethingfailed) {</div><div class=""> errors.push_back(ERRORCODE); // maybe also a message</div><div class=""> return ~0; // just return an invalid/error value.</div><div class=""> }</div><div class="">}</div><div class=""><br class=""></div><div class="">Now you don't have to check for the return value of getSymbolAddress() everywhere but just at the place where you need a legal SymbolAddress value which is typically far less places as most of the data is just stored away in class fields.</div><div class=""><br class=""></div><div class="">All you have to do know is add an</div><div class="">if (!error.empty()) {</div><div class=""> ...;</div><div class="">}</div><div class="">at the highest level parsing function...</div><div class=""><br class=""></div><div class="">- Matthias</div><br class=""><div><blockquote type="cite" class=""><div class="">On May 29, 2015, at 4:06 PM, Alexey Samsonov <<a href="mailto:vonosmas@gmail.com" class="">vonosmas@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi everyone,<div class=""><br class=""></div><div class="">Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.</div><div class=""><br class=""></div><div class="">This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.</div><div class=""><br class=""></div><div class="">The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.</div><div class=""><br class=""></div><div class="">We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.</div><div class=""><br class=""></div><div class="">(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:</div><div class=""> std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);</div><div class=""><br class=""></div><div class="">I wanted to follow this approach in a proposed large MachO API change</div><div class="">(<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFAg&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=yEhBej5sXLFdjanWE0enE5lvyfZZODAf5yFEo5njUDM&s=hQhMgPbQr0stXk6KfGtSTJ_uHZi6SZPQBKimDc0Kyt8&e=" class="">http://reviews.llvm.org/D10111</a>), but it raised discussion on whether this approach is right.</div><div class="">Moving this discussion here. I see the following options:</div><div class=""><br class=""></div><div class="">1. Use the current approach:</div><div class=""> std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);</div><div class=""><br class=""></div><div class="">Pros:</div><div class=""> * don't need to change a large number of (often virtual) API functions</div><div class=""> * has a nice error handling pattern used in LLVM tools:</div><div class=""> uint64_t Addr;</div><div class=""> if (error(getSymbolAddress(Symbol, Addr)))</div><div class=""> return; // or continue, or do anything else.</div><div class=""><br class=""></div><div class="">Cons:</div><div class=""> * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly</div><div class=""> * adds extra overhead for cases when we're certain the result would be valid.</div><div class=""><br class=""></div><div class="">2. Switch to ErrorOr wrapper:</div><div class=""> ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);</div><div class=""><br class=""></div><div class="">Pros:</div><div class=""> * handling the error is now mandatory and explicit.</div><div class=""> * callers can explicitly skip error handling if they know the result would be valid:</div><div class=""> uint64_t Addr = getSymbolAddress(Symbol).get();</div><div class=""> and it would fail the assert if they are wrong.</div><div class=""><br class=""></div><div class="">Cons:</div><div class=""> * need to change lots of old code, or live with two versions of functions</div><div class=""> * error handling boilerplate in regular code on call site is ugly:</div><div class=""> auto AddrOrErr = getSymbolAddress(Symbol);</div><div class=""> if (AddrOrErr.hasError())</div><div class=""> return; // or continue, or whatever</div><div class=""> uint64_t Addr = AddrOrErr.get();</div><div class=""> (can probably be improved with a macro)</div><div class=""> * adds extra overhead for cases when we're certain the result would be valid.</div><div class=""><br class=""></div><div class="">On IRC discussion Lang suggested</div><div class="">3. Check the whole object file contents in constructor or validate() method, and get rid</div><div class="">of all error codes in regular accessors.</div><div class=""><br class=""></div><div class="">Pros:</div><div class=""> * the interface is much cleaner</div><div class=""> * no extra overhead for trusted (e.g. JIT) object files.</div><div class=""><br class=""></div><div class="">Cons:</div><div class=""> * significant change, fundamentally shifting the way object files are handled</div><div class=""> * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).</div><div class=""> * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.</div><div class=""><br class=""></div><div class="">================</div><div class=""><br class=""></div><div class="">Instead of <a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFAg&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=yEhBej5sXLFdjanWE0enE5lvyfZZODAf5yFEo5njUDM&s=hQhMgPbQr0stXk6KfGtSTJ_uHZi6SZPQBKimDc0Kyt8&e=" class="">http://reviews.llvm.org/D10111</a>, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.</div><div class=""><br class=""></div><div class="">Let me know if you think it's a good or terrible idea.</div><div class=""><br class=""></div><div class="">-- <br class=""></div><div class=""><div class=""><div class=""><div class="gmail_signature"><div dir="ltr" class="">Alexey Samsonov<br class=""><a href="mailto:vonosmas@gmail.com" target="_blank" class="">vonosmas@gmail.com</a></div></div>
</div></div></div></div>
</div></blockquote></div><br class=""></body></html>