<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Hi Richard,</div><div><br></div><div>Thanks for making great progress on this! I agree with Richard Smith's suggestion about not using an std::map, but is there a reason to not just use a DenseMap to see if there are any enum collisions? Fundamentally your algorithm is O(n*log n), not O(n). Since you are converted everything to an int64_t, could you not just use a DenseMap<int64_t, EnumConstantDecl*>? You could then iterate over the EnumConstantDecls in one linear pass, converting them to int64_t's on the fly for comparison purposes. A few advantages I see are:</div><div><br></div><div>(1) No sorting, which is O(n log n)</div><div><br></div><div>(2) Diagnostics are based on declaration order without having to result to a stable_sort(), which is even more costly. If you want to report all duplicates as notes, the mapped value could easily be made into a PointerIntPair, where the "int" corresponds to an index into another vector where you collect the duplicates. Since the number of cases where duplicates will exist is few, you're optimizing for the common case.</div><div><br></div><div>With the current approach, I'm not certain, but something about the following looks curious:</div><div><br></div><div><blockquote type="cite"><div>+ // Find the first duplicate enum.</div><div>+ llvm::SmallVector<EnumConstantDecl*, 10>::iterator Iter = I + 1;</div><div>+ for (; Iter != E && llvm::APSInt::isSameValue(Value, (*Iter)->getInitVal());</div><div>+ ++Iter) {</div><div>+ if (ValidDuplicateEnum(*Iter, Enum))</div><div>+ break;</div><div>+ }</div><div>+</div><div><br></div></blockquote></div><div><br></div><div>If the array is sorted by constant value, I don't believe there is any need to keep marching through the rest of the items if the next item isn't a match. I may be mistaken, but I think this leaves your implementation still being quadratic in the worst case. Further, the following:</div><div><br></div><div><blockquote type="cite"><div>+</div><div>+ if (Iter == E)</div><div>+ break;</div><div>+</div></blockquote><br></div><div>appears to break out of the entire comparison entirely. Isn't it possible for Iter == E if no match was found in the preceding loop? For example, from your test cases, I'd expect to see a warning here:</div><div><br></div><div><blockquote type="cite"><div>+</div><div>+enum {</div><div>+ I1 = -1,</div><div>+ I2,</div><div>+ I3,</div><div>+ I4 = -2,</div><div>+ I5,</div><div>+ I6</div><div>+};</div></blockquote><br></div><div>Is there a reason why I1 and I5 are not reported as duplicates? (or I3 and I6)?</div><div><br></div><div>At a high level, I honestly find this logic to be more complicated than I would have expected. The sorting seems unnecessary, and will report diagnostics in an unnatural order (first based on enum constant value, then on declaration order). A straight linear pass seems more naturally to me, and DenseMap is very efficient. If you are worried about the malloc() traffic, you could always keep a DenseMap around in Sema just for this warning, and clear it out whenever entering CheckForDuplicateEnumValues(). You then pay the malloc() cost when you need a bigger map, which will quickly amortize the cost.</div><div><br></div><div>I honestly could be missing something here, so I apologize if I just am not reading the code correctly.</div><div><br></div><div>Cheers,</div><div>Ted</div><br><div><div>On Aug 13, 2012, at 7:05 PM, Richard Trieu <<a href="mailto:rtrieu@google.com">rtrieu@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Several changes to this patch. Richard Smith's suggestions has improved performance. Those changes are, only check values 64-bits or smaller and directly compare the values instead of going through the functions and using a sorted vector instead of a map to store the values. Other changes were to exclude a few more cases, and move the test from SemaCXX to Sema and then have the RUN line test in both C and C++.<div>
<br></div><div>The improved performance has dropped the difference down to under 4% difference for files purely made of a single enum from a previous ~10%. For preprocessed Clang files, the difference was under the noise on my system.<br>
<br><div class="gmail_quote">On Fri, Jul 27, 2012 at 3:51 PM, Richard Trieu <span dir="ltr"><<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Running Clang over itself, using preprocessed files and -fsyntax-only</div>28.86 to 29.08 seconds - Control<div>28.89 to 29.49 seconds - Clang with -Wduplicate-enum<br>
<br>This was run without Richard Smith's suggestions. I will test those out to see if there's any impact from them.<div><br>
<br><div class="gmail_quote">On Wed, Jul 25, 2012 at 2:37 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">Ok. That's still a scary number. Do you have numbers for realistic examples? For example, we know Clang has some particularly large enums. This micro benchmark is useful, but it may be overly pessimistic.<div>
<div><br><div><div>On Jul 25, 2012, at 2:34 PM, Richard Trieu <<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>> wrote:</div><br><blockquote type="cite">Yes, that is the slowdown for the entire -fsyntax-only time for a source file with only an enum in it.<br>
<br><div class="gmail_quote">On Wed, Jul 25, 2012 at 1:48 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Richard,<div><br></div><div>If I am reading that right, the 6-10% slowdown is for the entire -fsyntax-only time? If so, that's definitely cost prohibitive.</div>
<div><br></div><div>Ted</div><div><br><div><div><div>On Jul 19, 2012, at 8:25 PM, Richard Trieu <<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>> wrote:</div><br></div>
<blockquote type="cite"><div><div class="gmail_quote">On Wed, Jul 18, 2012 at 9:14 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><div>On Jul 18, 2012, at 6:34 PM, Richard Trieu <<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>> wrote:</div><br><blockquote type="cite">
<span style="font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;display:inline!important;float:none">A set could work for detecting the values, but both EnumConstantDecls are needed for the diagnostic, not just the values. Possibly a map from APSInt->EnumConstantDecl* would work. But either way, I would be dealing with getting APSInts to play nice with each other.</span></blockquote>
</div><br></div><div>That seems reasonable to me. The primary performance issue I see is the quadratic algorithmic complexity. If the APSInt comparisons are an issue, we can see if we can find ways to optimize that further.</div>
</div></blockquote></div><br><div>I created two more variations on and measured some timings. Both used a map, one with a custom compare function and one that extended the APSInt value before insertion. The APSInt extension had the better time, so I'll be giving the number for that one.</div>
<div><br></div><div>At 10,000 elements, there was a 6-10% slow down. This amounts to .01-.03 seconds difference on .13-.27 second runtime.</div><div><br></div><div>At 100,000 elements, 8-12% slow down. .2-.3 seconds on 1.34 to 2.66 second run time.</div>
<div><br></div><div>At 1,000,000 elements, 7-14% slow down. Around 2 second difference for runs of 13.6 to 26.7 seconds.</div><div><br></div><div>A new patch has been attached which has the APSInt bit extension before adding to the map.</div>
</div><span><duplicate-enum-bit-extension.patch></span></blockquote></div><br></div></div>
</blockquote></div><br>
</blockquote></div><br></div></div></div>
</blockquote></div><br></div></div>
</blockquote></div><br></div>
<span><duplicate-enum-vector.patch></span></blockquote></div><br></body></html>