[PATCH] [PR20893] Change SelectionDAG's DbgValueMap to have set semantics (NFC)

Thu Sep 25 10:27:26 PDT 2014

On Mon, Sep 22, 2014 at 4:46 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> > On Sep 22, 2014, at 4:33 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> > On Mon, Sep 22, 2014 at 4:27 PM, Adrian Prantl <aprantl at apple.com>
> wrote:
> >
> > > On Sep 22, 2014, at 3:28 PM, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >
> > >
> > > On Mon, Sep 22, 2014 at 3:14 PM, Adrian Prantl <aprantl at apple.com>
> wrote:
> > >
> > > > On Sep 22, 2014, at 2:18 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> > > >
> > > > as for testing - assuming these duplicates get deduplicated
> elsewhere (could you explain where?)
> > > There are two kinds of duplicates that appear in this case:
> > > 1) identical SDDbgNode* (same pointer value)
> > > 2) an equivalent SDDbgNode (distinct pointers, but otherwise a copy)
> > > The first kind is automatically deduplicated because the Invalid flag
> is set after processing it the first time.
> > > The second one will result in two duplicate DBG_VALUE intrinsics which
> will get coalesced by the backend.
> > >
> > > OK - perhaps you could explain how these two equivalent but distinct
> SDDbgNodes come into existence?
> > >
> > >
> > > > then I doubt there's much to test - this is then just a perf
> optimization, not a change in any semantics.
> > > I was mostly posting this here seeking absolution for not committing
> this with a testcase :-)
> > > > Maybe dumping MI could be testable, but I'm not sure what the
> dumping options (& whether they're available in non-asserts builds, etc)
> are like.
> > > There is -print-after-all and the likes, but the real problem is that
> the testcase is too large to serve as a useful regression test.
> > >
> > > You can't cause two of these equivalent but distinct objects to be
> created with less than several thousand lines?
> > >
> > > That is a bit surprising/strange.
> >
> > You’re probably aware that constructing test cases from scratch is
> pretty hard to do manually when debug info is involved. Employing bugpoint,
> delta and a python script that eliminates unused MDNodes, I was able to
> reduce 100k+ lines of C++
> >
> > Right - usually I reduce the input source as far as possible. creduce (
> http://embed.cs.utah.edu/creduce/ ) usually gets me down to a couple of
> hundred lines. Though I realize something like this might not reduce so
> easily (one way to workaround this is sometimes to add more robust
> assertions (eg: asserting that you don't add the same element twice would
> give you a reproduction of a case where that second element is added),
> causing the failure to be more reliable/happen earlier - thus the reduction
> is faster and narrower).
> >
> > Were you reducing the OOM directly? (ie: the test you gave delta
> involved "check that the program still OOMs"?) Yeah, that's no fun. I've
> reduced an unreliable repro a few weeks back which took a couple of days to
> get down to something more manageable.
>
> No, what I did was this: First I ran bugpoint on an LLVM that had an
> assertion firing when a double insertion happened.

Ah - I generally use delta (or more accurately/effectively, CReduce) on the
original source code rather than bugpoint - given the gnarly issues
involved with mutating IR leading to debug info metadata invariant
violations, etc.

> After bugpoint was done I ran delta on the resulting IR, with two versions
> LLVM, one with the assertion, and one with my patch applied. I would have
> delta only accept revisions that were the assertion fired, and the patch
> made the program compile fine (this is essential when reducing metadata,
> otherwise it’s too easy to come up with an input that will trigger the
> error, but won’t compile once the bug is fixed).
>

And after all that you still had a several thousand line reproduction? :/

Is there any way you could provide a source code sample and I can see if I
can creduce down to something more manageable?

> >
> > If it's just about the OOM, and the OOM is because of these duplicates -
> I'd consider putting in an assertion that would fire when the duplicates
> appear, and just reducing that instead - it should be much faster, more
> reliable, and smaller.
> That’s precisely what I did (in my first step). In the second step I ran
> delta on the IR (as outlined above). In the third step I ran a python
> script that would compute the transitive closure of all MDNodes referenced
> by the IR file (sans MDnodes) and strip all metadata nodes that were no
> longer needed.
> > There's no need to create or commit a test case that used to OOM - we
> don't need anything that big in the regression suite and the same codepath
> can be exercised by a much smaller example that just shows even a single
> case of the duplication that's of interest.
>
> Sure! That’s what I hoped the outcome of this review would be :-)
>

While I'm totally with you that committing an example that used to actually
OOM the compiler isn't helpful, adding an assertion to help reduce down a
test case that exercises the set-like behavior seems relevant and should be
far smaller/more reasonable to commit. I'd like to have something in-tree
that has the potential to highlight the difference in behavior (even if
it's just a difference in performance) between a set and a vector.

> >
> > I'm still not clear on what this duplication represents/why it happens -
> could you explain it?
>
> As far as I understand it so far, SelectionDAG is combining two SDNodes
> and then merging their DbgValues via DAG.TransferDbgValues(). Both SDNodes
> to be combined have an SDDbgValue with identical contents, both get
> appended to the resulting SDNode’s DbgValueMap entry.
>

Hopefully this'll be easier for me to look at/understand with a minimized
example.

>
> >
> > to 2375 lines of IR (2139 of which are metadata!), but I failed to go
> beyond that (see also the referenced PR). The function in the reduced
> testcase contains a single dbg.value intrinsic and a really long chain of
> GEP instructions.
> >
> > -- adrian
> >
> > >
> > >
> > > thanks,
> > > adrian
> > > >
> > > > ================
> > > > Comment at: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:6180
> > > > @@ +6179,3 @@
> > > > +      if (*Val == *V) return;
> > > > +    Vals.push_back(V);
> > > > +  }
> > > > ----------------
> > > > Could 'Vals' be an actual set of SDDbgValue*s?
> > > >
> > > > Are SDDbgValue*s inherently uniqued such that pointer identity would
> be sufficient? (or do you need that equality comparison added above? Which
> could hopefully be adjusted to < comparison (or hashing) for a set-like
> structure)
>

Question here still /\, (as an aside, if you need a convenient
implementation of < you can use std::tie(...) < std::tie(...) to use
tuple's tie to implement tie of some arbitrary things - though it'll still
require some tricks due to the union/variant aspect of this structure)

> > > >
> > > > http://reviews.llvm.org/D5451
> > > >
> > > >
> > >
> > >
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/aa865057/attachment.html>