[PATCH] D152159: [AppleAccelTable][NFC] Refactor iterator class

Mon Jun 5 14:57:12 PDT 2023

dblaikie added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp:279
+        FormValue.extractValue(Table.AccelSection, Offset, Table.FormParams);
+    assert(ExtractWorked && "Failed to extract value from hash data entry");
+  }
----------------
fdeazeve wrote:
> dblaikie wrote:
> > fdeazeve wrote:
> > > aprantl wrote:
> > > > What does it mean for `ExtractWorked` to be false?
> > > > If this can be triggered by malformed input, even an asserts-enabled parser needs to handle it gracefully. Asserts should only be used for internal consistency checks that may never fail.
> > > >If this can be triggered by malformed input, even an asserts-enabled parser needs to handle it gracefully. 
> > > 
> > > That's what I mean in the commit message with:
> > > 
> > > > We maintain the already existing assumption that failures never happen, but now
> > > > make it explicit with an assert.
> > > 
> > > Today, if we fail to extract, we're going to start producing completely incorrect data and be silent about it. Imagine one of the iteration fails: the offset doesn't get updated and the next iteration is going to re-use the offset of the previous iteration.
> > > 
> > > IMO we're not handling it gracefully today, that's why I added the assert. But I don't feel strongly about this (just a bit :) )
> > > 
> > > I think the only way to fix this correctly is to remove the iterator abstraction and use callbacks instead. Non-end iterators should never fail the dereference operator, but data extracts are inherently a "fail-able" abstraction.
> > I believe @lhames and I spoke about fallible iterators previously & he's used some iterator abstractions with fallibility (which can be nicer than a callback - for instance, allowing early exit or other novel iteration sequences that a callback API might not provide in a general way) - by having the iterator terminate early (when it hits an error, it becomes == end) and having a member you have to inspect after the fact to query for the error state/if it's failed.
> >  and having a member you have to inspect after the fact to query for the error state/if it's failed.
> 
> That's an interesting idea, but how would that work with range-based for loops, as they don't expose the iterator itself to the programmer?
> 
> Are you aware of examples of such iterators inside LLVM?
> 
> > having the iterator terminate early 
> 
> Note how this also forces the iterator to know how to set itself to the end iterator, which is ok but requires more bookkeeping.
> 
> 
> One other possibility we could try is to have the iterator's `operator *` return an optional. However, the current implementation of these iterators (both before and after this patch) avoid memory operations by keeping an `Entry` inside the iterator; the optional would inevitably have copy that. Unless we use pointers as optionals? (i.e. `operator*` returns an `Entry*`) 
> 
> While what I'm saying applies only for apple_names, the standardized `debug_names` implementation seems to follow a similar pattern, though the code there is even more complex and I haven't yet been able to digest it.
> 
> Another thing we could do is to eagerly read all the data (except maybe for the strings) from the section and build a proper data structure with the FormValues and hashes and offsets already parsed. So we never have a chance to error again after the constructor is finished. That said, I suspect the original implementors didn't do this for performance / memory reasons? It's interesting to consider how some queries from LLDB force us to search/materialize all of the table anyway. For example, by search by regex (gotta read all the strings) or by DIE whose offset is inside a given range (gotta read all the HashData to find the DIE offset)
Oh, actually - maybe we settled on a different idiom - an out parameter from the range accessor, that's compatible with range-for loops, it's even documented: https://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges

& yeah, certainly one of Lang's goals with libObject was to make it as lazy as possible so as not to do any extra work/hurt performance but also to be as usable as possible on potentially invalid object files - be able to parse all the parts that can be parsed, without tripping over on the first failure and being unable to continue.

The LLVM libDebugInfoDWARF code would probably benefit from similar goals - same idea that it might be used to investigate invalid inputs, and being able to keep going as much as possible/not trip over unrelated errors would be a valuable feature.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152159/new/

https://reviews.llvm.org/D152159