Looks like I need to clarify a few points about my code. First, at most one diagnostic will be emitted for each enum constant. If there are five constants with the same value, one warning and four notes will be emitted.<br>
<br><div class="gmail_quote">On Mon, Aug 13, 2012 at 10:22 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>Hi Richard,</div><div><br></div><div>Thanks for making great progress on this! I agree with Richard Smith's suggestion about not using an std::map, but is there a reason to not just use a DenseMap to see if there are any enum collisions? Fundamentally your algorithm is O(n*log n), not O(n). Since you are converted everything to an int64_t, could you not just use a DenseMap<int64_t, EnumConstantDecl*>? You could then iterate over the EnumConstantDecls in one linear pass, converting them to int64_t's on the fly for comparison purposes. A few advantages I see are:</div>
<div><br></div><div>(1) No sorting, which is O(n log n)</div><div><br></div><div>(2) Diagnostics are based on declaration order without having to result to a stable_sort(), which is even more costly. If you want to report all duplicates as notes, the mapped value could easily be made into a PointerIntPair, where the "int" corresponds to an index into another vector where you collect the duplicates. Since the number of cases where duplicates will exist is few, you're optimizing for the common case.</div>
<div><br></div></div></blockquote><div><br></div><div>I think it's best to back up a little in the loop.</div><div><br></div><div> + llvm::APSInt Value = (*I)->getInitVal();</div><div><br></div><div>At this point, a valid enum is found for the warning, which is stored in I. Now, from here to the end of this iteration of the for-loop, process every enum which has the same Value.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div></div><div>With the current approach, I'm not certain, but something about the following looks curious:</div>
<div><br></div><div><blockquote type="cite"><div>+ // Find the first duplicate enum.</div><div>+ llvm::SmallVector<EnumConstantDecl*, 10>::iterator Iter = I + 1;</div><div>+ for (; Iter != E && llvm::APSInt::isSameValue(Value, (*Iter)->getInitVal());</div>
<div>+ ++Iter) {</div><div>+ if (ValidDuplicateEnum(*Iter, Enum))</div><div>+ break;</div><div>+ }</div><div>+</div><div><br></div></blockquote></div></div></blockquote><div>This loop is similar to the next loop. Before emitting the warning, check that there exists a enum to put into a loop. If an enum constant is found, break, emitting the warning, then start the loop again, emitting notes.</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div></div></blockquote></div><div><br></div><div>If the array is sorted by constant value, I don't believe there is any need to keep marching through the rest of the items if the next item isn't a match. I may be mistaken, but I think this leaves your implementation still being quadratic in the worst case. </div>
</div></blockquote><div>All the enums of the same value is handled at one go. At the end, I is set to Iter so that the next iteration will skip all same value enums and starts at the next value. This keeps the vector transverse at linear time.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Further, the following:</div><div><br></div><div><blockquote type="cite">
<div>+</div><div>+ if (Iter == E)</div><div>+ break;</div><div>+</div></blockquote></div></div></blockquote><div>From the previous loop, there's three ways to exit the loop.</div><div>1) Iter == E is true, every enum constant from I to the end have the same value, but there's not a valid enum for the diagnostic. We can just quit checking at this point.</div>
<div>2) isSameValue is false, Iter is a different value than I, so I becomes Iter for the next iteration of the outer for loop.</div><div>3) ValidDuplicateEnum is true, emit the warning, start the next loop and start emitting notes</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div></div><div>appears to break out of the entire comparison entirely. Isn't it possible for Iter == E if no match was found in the preceding loop? For example, from your test cases, I'd expect to see a warning here:</div>
<div><br></div><div><blockquote type="cite"><div>+</div><div>+enum {</div><div>+ I1 = -1,</div><div>+ I2,</div><div>+ I3,</div><div>+ I4 = -2,</div><div>+ I5,</div><div>+ I6</div><div>+};</div></blockquote><br></div>
<div>Is there a reason why I1 and I5 are not reported as duplicates? (or I3 and I6)?</div></div></blockquote><div><br></div><div>Anonymous enums are no longer warned on. Some code likes to dump all the constants into a single anonymous enum. Looks like I forgot to add this feature to the list of changes.</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>At a high level, I honestly find this logic to be more complicated than I would have expected. The sorting seems unnecessary, and will report diagnostics in an unnatural order (first based on enum constant value, then on declaration order). A straight linear pass seems more naturally to me, and DenseMap is very efficient.</div>
</div></blockquote><div>Is there a comparison between the different containers in LLVM and the STL containers? Also, will the DenseMap fix the unnatural order of the diagnostics?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div> If you are worried about the malloc() traffic, you could always keep a DenseMap around in Sema just for this warning, and clear it out whenever entering CheckForDuplicateEnumValues(). You then pay the malloc() cost when you need a bigger map, which will quickly amortize the cost.</div>
</div></blockquote><div>Seems a little strange to keep around a data structure in Sema for a non-default diagnostic.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><br></div><div>I honestly could be missing something here, so I apologize if I just am not reading the code correctly.</div></div></blockquote><div>I think I have addressed any misunderstandings inline. I'll have a look at DenseMap and see if it works better. </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>Cheers,</div><div>Ted</div></div></blockquote></div><br>