<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 22, 2014 at 4:27 PM, Adrian Prantl <span dir="ltr"><<a href="mailto:aprantl@apple.com" target="_blank">aprantl@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><br>

> On Sep 22, 2014, at 3:28 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br>

><br>

><br>

><br>

> On Mon, Sep 22, 2014 at 3:14 PM, Adrian Prantl <<a href="mailto:aprantl@apple.com">aprantl@apple.com</a>> wrote:<br>

><br>

> > On Sep 22, 2014, at 2:18 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br>

> ><br>

> > as for testing - assuming these duplicates get deduplicated elsewhere (could you explain where?)<br>

> There are two kinds of duplicates that appear in this case:<br>

> 1) identical SDDbgNode* (same pointer value)<br>

> 2) an equivalent SDDbgNode (distinct pointers, but otherwise a copy)<br>

> The first kind is automatically deduplicated because the Invalid flag is set after processing it the first time.<br>

> The second one will result in two duplicate DBG_VALUE intrinsics which will get coalesced by the backend.<br>

><br>

> OK - perhaps you could explain how these two equivalent but distinct SDDbgNodes come into existence?<br>

><br>

><br>

> > then I doubt there's much to test - this is then just a perf optimization, not a change in any semantics.<br>

> I was mostly posting this here seeking absolution for not committing this with a testcase :-)<br>

> > Maybe dumping MI could be testable, but I'm not sure what the dumping options (& whether they're available in non-asserts builds, etc) are like.<br>

> There is -print-after-all and the likes, but the real problem is that the testcase is too large to serve as a useful regression test.<br>

><br>

> You can't cause two of these equivalent but distinct objects to be created with less than several thousand lines?<br>

><br>

> That is a bit surprising/strange.<br>

<br>

</span>You’re probably aware that constructing test cases from scratch is pretty hard to do manually when debug info is involved. Employing bugpoint, delta and a python script that eliminates unused MDNodes, I was able to reduce 100k+ lines of C++</blockquote><div><br></div><div>Right - usually I reduce the input source as far as possible. creduce ( <a href="http://embed.cs.utah.edu/creduce/">http://embed.cs.utah.edu/creduce/</a> ) usually gets me down to a couple of hundred lines. Though I realize something like this might not reduce so easily (one way to workaround this is sometimes to add more robust assertions (eg: asserting that you don't add the same element twice would give you a reproduction of a case where that second element is added), causing the failure to be more reliable/happen earlier - thus the reduction is faster and narrower).</div><div> <br>Were you reducing the OOM directly? (ie: the test you gave delta involved "check that the program still OOMs"?) Yeah, that's no fun. I've reduced an unreliable repro a few weeks back which took a couple of days to get down to something more manageable.<br><br>If it's just about the OOM, and the OOM is because of these duplicates - I'd consider putting in an assertion that would fire when the duplicates appear, and just reducing that instead - it should be much faster, more reliable, and smaller. There's no need to create or commit a test case that used to OOM - we don't need anything that big in the regression suite and the same codepath can be exercised by a much smaller example that just shows even a single case of the duplication that's of interest.<br><br>I'm still not clear on what this duplication represents/why it happens - could you explain it?<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> to 2375 lines of IR (2139 of which are metadata!), but I failed to go beyond that (see also the referenced PR). The function in the reduced testcase contains a single dbg.value intrinsic and a really long chain of GEP instructions.<br>

<span class=""><font color="#888888"><br>

-- adrian<br>

</font></span><div class=""><div class="h5"><br>

><br>

><br>

> thanks,<br>

> adrian<br>

> ><br>

> > ================<br>

> > Comment at: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:6180<br>

> > @@ +6179,3 @@<br>

> > +      if (*Val == *V) return;<br>

> > +    Vals.push_back(V);<br>

> > +  }<br>

> > ----------------<br>

> > Could 'Vals' be an actual set of SDDbgValue*s?<br>

> ><br>

> > Are SDDbgValue*s inherently uniqued such that pointer identity would be sufficient? (or do you need that equality comparison added above? Which could hopefully be adjusted to < comparison (or hashing) for a set-like structure)<br>

> ><br>

> > <a href="http://reviews.llvm.org/D5451" target="_blank">http://reviews.llvm.org/D5451</a><br>

> ><br>

> ><br>

><br>

><br>

<br>

</div></div></blockquote></div><br></div></div>