<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Hi Richard,</div><div><br></div><div>If I read these numbers correctly, the hash table algorithm (with O(n) performance) takes about 1.6-2% percent more than the control for runs 1-3, and hardly anything noticeable for the clang code base. Were runs 1-3 used in your earlier measurements, where the sorting-based approach took about ~4% longer (or is that not the correct number)? Just wanted an idea of where we are compared to the earlier measurements.</div><div><br></div><div>Ted</div><br><div><div>On Aug 28, 2012, at 3:53 PM, Richard Trieu <<a href="mailto:rtrieu@google.com">rtrieu@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Timing information:<div>Three clangs were run, clang with no changes (control), duplicate enum with PointerUnion (most recent patch), duplicate enum with DenseMap without PointerUnion (next most recent patch). Each run with -fsyntax-only and -Wduplicate-enum for modified clangs. Runs 1, 2, and 3 are files with only enums. Run 4 is a preprocessed Clang.</div>
<div><br></div><div>Key:</div><div>name: Average (Min-Max)</div><div><br></div><div>Run1:</div><div>Control: 13.763 (13.66-14.14)</div><div>PointerUnion: 14.046 (13.94-14.16)</div><div>DenseMap: 14.304 (14.24-14.39)</div>
<div><br></div><div><div>Run2:</div><div>Control: 20.189 (20.1-20.31)</div><div>PointerUnion: 20.514 (20.37-20.6)</div><div>DenseMap: 20.635 (20.56-20.7)</div><div><br></div><div>Run3:</div><div>Control: 26.715 (26.66-26.8)</div>
<div>PointerUnion: 26.928 (26.8-27.12)</div><div>DenseMap: 27.13 (27.05-27.22)</div><div><br></div><div><div>Run4:</div><div>Control: 29.686 (28.98-30.39)</div><div>PointerUnion: 29.706 (28.73-30.69)</div><div>DenseMap: 29.952 (29.3-30.63)</div>
</div><div><br></div><div class="gmail_quote">On Tue, Aug 28, 2012 at 2:12 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">How fast? (particularly compared to the first implementation, and relative to -fsyntax-only)<div><div class="h5"><br><div><div>On Aug 28, 2012, at 11:48 AM, Richard Trieu <<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>> wrote:</div>
<br><blockquote type="cite">New patch with PointerUnion and DenseMap is slightly faster than the previous DenseMap patch.<br><br><div class="gmail_quote">On Mon, Aug 27, 2012 at 9:51 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Thanks! Quick question before I review it in more details: what is the performance characteristics of this patch compared to the others?<div>
<br><div><div><div><div>On Aug 27, 2012, at 11:35 AM, Richard Trieu <<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>> wrote:</div><br></div><blockquote type="cite"><div>
<div>Incorporated most of the suggestions into this patch. Still using a double pass over the constants for the reasons outlined below.<br><br><div class="gmail_quote">On Fri, Aug 17, 2012 at 10:00 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">BTW, I wrote this two days ago. For some reason my mail client didn't send it out until now. My apologies for the delay.</div>
</blockquote><div>No worries. Had some issues that cropped up on template diffing that took my time. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">
<div><br><div><div><div>On Aug 15, 2012, at 10:11 PM, Ted Kremenek <<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>> wrote:</div><br></div><blockquote type="cite">
<div><div style="word-wrap:break-word"><div><div>On Aug 15, 2012, at 6:12 PM, Richard Trieu <<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>> wrote:</div><br><blockquote type="cite">
<div class="gmail_quote">On Tue, Aug 14, 2012 at 9:48 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div>On Aug 14, 2012, at 2:32 PM, Richard Trieu <<a href="mailto:rtrieu@google.com" target="_blank">rtrieu@google.com</a>> wrote:</div><br><blockquote type="cite">
<blockquote class="gmail_quote" style="font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">At a high level, I honestly find this logic to be more complicated than I would have expected. The sorting seems unnecessary, and will report diagnostics in an unnatural order (first based on enum constant value, then on declaration order). A straight linear pass seems more naturally to me, and DenseMap is very efficient.</div>
</blockquote><div style="font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
Is there a comparison between the different containers in LLVM and the STL containers?</div></blockquote><br></div><div>This is a reasonable place to start:</div><div><br></div><div> <a href="http://llvm.org/docs/ProgrammersManual.html#ds_map" target="_blank">http://llvm.org/docs/ProgrammersManual.html#ds_map</a></div>
<div><br></div><div>The key with DenseMap is that it is probed hashtable. There is one big allocation for the entire table, instead of a bunch of buckets. When applicable, it can be very fast, and feels like the right data structure to use here.</div>
</div></blockquote></div><br><div>Duplicate enum detection, now with DenseMap. The DenseMap maps a int64_t to a vector pointer. 0 and 1 were special keys for the DenseMap, so two separate pointers special cased for them. The vectors pointers are stored in another vector in declaration order. One pass is made over the enums to find ones without initializers. These are used to create vectors. A second pass through the enums populates the vectors. Finally, a pass over the vector of vectors is used to generate all the warnings and notes.</div>
<div><br></div><div>Run time is fairly consistent with the sorted vector implementation, which is max %3 difference against control.</div>
<span><duplicate-enum-densemap.patch></span></blockquote></div><br><div>Thanks for working on this. My main concern is this patch now has a lot of unnecessary malloc() traffic, which will certainly slow it down. Comments inline:</div>
<div><br></div><div><blockquote type="cite"><div>+</div><div>+static int64_t GetInt64(const llvm::APSInt& Val) {</div><div>+ return Val.isSigned() ? Val.getSExtValue() : Val.getZExtValue();</div><div>+}</div><div>+</div>
<div>+struct DenseMapInfoint64_t {</div><div>+ static int64_t getEmptyKey() { return 0; }</div><div>+ static int64_t getTombstoneKey() { return 1; }</div><div>+ static unsigned getHashValue(const int64_t Val) {</div><div>
+ return (unsigned)(Val * 37);</div><div>+ }</div><div>+ static bool isEqual(const int64_t& LHS, const int64_t& RHS) {</div><div>+ return LHS == RHS;</div><div>+ }</div><div>+};</div></blockquote><div><br>
</div><div>This trait class doesn't look like it was actually used. The DenseMap below just uses the default trait for int64_t.</div><div><br></div><div>I also still think we can so something a bit smarter here. What I think we need to distinguish between is whether or not a constant has appeared more than once. We're saving a bit of memory on the keys, but spending that savings elsewhere when we allocate the vectors unconditionally for each constant.</div>
<br><blockquote type="cite"><div>+</div><div>+// Emits a warning when an element is implicitly set a value that</div><div>+// a previous element has already been set to.</div><div>+static void CheckForDuplicateEnumValues(Sema &S, Decl **Elements,</div>
<div>+ unsigned NumElements, EnumDecl *Enum,</div><div>+ QualType EnumType) {</div><div>+ if (S.Diags.getDiagnosticLevel(diag::warn_duplicate_enum_values,</div>
<div>+ Enum->getLocation()) ==</div><div>+ DiagnosticsEngine::Ignored)</div><div>+ return;</div><div>+ // Avoid anonymous enums</div><div>+ if (!Enum->getIdentifier())</div>
<div>+ return;</div><div>+</div><div>+ // Only check for small enums.</div><div>+ if (Enum->getNumPositiveBits() > 63 || Enum->getNumNegativeBits() > 64)</div><div>+ return;</div><div>+</div><div>+ typedef llvm::SmallVector<EnumConstantDecl*, 4> SameValueVector;</div>
<div>+ typedef llvm::DenseMap<int64_t, SameValueVector*> ValueToVectorMap;</div><div>+ typedef llvm::SmallVector<SameValueVector*, 10> DoubleVector;</div><div>+ ValueToVectorMap EnumMap;</div><div>+ DoubleVector EnumVector;</div>
<div>+ SameValueVector *ZeroVector = 0, *OneVector = 0;</div></blockquote><div><br></div><div>It took me a while to understand what this was doing, so I feel it could really benefit from a comment. This also appears to result in a ton of malloc traffic below. Here's my suggestion:</div>
<div><br></div><div> typedef llvm::SmallVector<EnumConstantDecl*, 3> ECDVector;</div><div> typedef llvm::SmallVector<ECDVector *, 3> DuplicatesVector;</div><div><br></div><div> typedef llvm::PointerUnion<EnumConstantDecl*, ECDVector *> DeclOrVector;</div>
<div> typedef llvm::DenseMap<int64_t, DeclOrVector> ValueToVectorMap;</div><div><br></div><div> DuplicatesVector DupVector;</div><div> ValueToVectorMap EnumMap;</div><div><br></div><div>The trick here is that the DenseMap maps from a constant to the first EnumConstantDecl it encounters. Only if we encounter a second EnumConstantDecl with the same enum value do we pay the cost of allocating another vector. This will drastically optimize in the common case, as calling malloc() is really slow. Right now the code appears to be doing a malloc() for every enum constant, which is going to really penalize us here.</div>
<br><blockquote type="cite"><div>+</div><div>+ for (unsigned i = 0; i < NumElements; ++i) {</div><div>+ EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);</div><div>+ if (!ECD) {</div><div>+ for (DoubleVector::iterator I = EnumVector.begin(), E = EnumVector.end();</div>
<div>+ I != E; ++I)</div><div>+ delete *I;</div><div>+ return;</div><div>+ }</div></blockquote><div><br></div><div>I don't quite understand this loop through DoubleVector here, but it looks like logic in case we want to return early and cleanup. Is there a case where the EnumConstantDecl can be null?</div>
</div></div></div></blockquote></div></div></div></blockquote><div>According to ActOnEnumBody, EnumConstantDecl is null if a diagnostic has previously been emitted for the constant. Since the enum is possibly ill-formed, skip checking it.</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><blockquote type="cite"><div style="word-wrap:break-word">
<div><br><blockquote type="cite"><div>+</div><div>+ if (ECD->getInitExpr())</div><div>+ continue;</div><div>+</div><div>+ int64_t Val = GetInt64(ECD->getInitVal());</div><div>+</div></blockquote><div><br>
</div><div>Looks good.</div><br><blockquote type="cite"><div>+ if (Val == 0) {</div><div>+ if (ZeroVector) continue;</div><div>+ ZeroVector = new SameValueVector();</div><div>+ ZeroVector->push_back(ECD);</div>
<div>+ EnumVector.push_back(ZeroVector);</div><div>+ } else if (Val == 1) {</div><div>+ if (OneVector) continue;</div><div>+ OneVector = new SameValueVector();</div><div>+ OneVector->push_back(ECD);</div>
<div>+ EnumVector.push_back(OneVector);</div><div>+ } else {</div><div>+ if (EnumMap.find(Val) != EnumMap.end())</div><div>+ continue;</div><div>+ SameValueVector *ValueVector = new SameValueVector();</div>
<div>+ ValueVector->push_back(ECD);</div><div>+ EnumVector.push_back(ValueVector);</div><div>+ EnumMap.insert(std::make_pair(Val, ValueVector));</div></blockquote><div><br></div><div>The "find()" followed by the "insert()" is wasteful. It results in two lookups to the hash table when we could have just used one. More on that later.</div>
<br><blockquote type="cite"><div>+ }</div><div>+ }</div></blockquote><div><br></div><div>IMO, this looks like a lot of complexity just to handle the fact that 0 and 1 are special values for the DenseMap. I don't really see this as the right tradeoff; the code is more complicated with marginal impact on memory usage or performance.</div>
<div><br></div><div>If you humor me for a bit, consider using something else for the key, e.g.:</div><div><br></div><div>struct DupKey {</div><div> int64_t val;</div><div> bool isTombstoneOrEmptyKey;</div><div>};</div>
<div>
<br></div><div>The idea is if 'isTombStoneOrEmptyKey' is true, we can use val = 0 or val = 1 to represent empty keys or tombstone entries. Otherwise, it's an int64_t, with the full range of values. We can define a DenseMap trait to do the right thing. Yes, this costs a tiny bit more in storage, but it allows the data structure to handle the complete set of values in your domain, instead of resorting to complicating the core algorithm. What I see here now is the same code essentially duplicated twice, which makes it harder to read and more error prone.</div>
<div><br></div><div>If we use DupKey as our key for the DenseMap, we can instead do something like this:</div><div><br></div><div> DeclOrVector &entry = EnumMap[Val]; // Use default construction of 'entry'.</div>
<div> // Is the first time we encountered this constant?</div><div> if (entry.isNull()) {</div><div> entry = ECD;</div><div> continue;</div><div> }</div><div> // Is this the second time we encountered this constant? If so,</div>
<div> // push the previous decl encountered and the one just encountered</div><div> // to a vector of duplicates.</div><div> if (EnumConstantDecl *D = entry.dyn_cast<EnumConstantDecl*>()) {</div><div> ECDVector *Vec = new ECDVector();</div>
<div> Vec->push_back(D);</div><div> Vec->push_back(ECD);</div><div> </div><div> // Update the entry to refer to the duplicates.</div><div> entry = Vec;</div><div><br></div><div> // Store the duplicates in a vector we can consult later for</div>
<div> // quick emission of diagnostics.</div><div> DupVector.push_back(Vec);</div><div><br></div><div> // On to the next constant.</div><div> continue;</div><div> }</div><div> // Is this the third (or greater) time we encountered the constant? If so,</div>
<div> // continue to add it to the existing vector.</div><div> ECDVector *Vec = entry.get<ECDVector*>();</div><div> Vec->push_back(ECD);</div><div><br></div><div><br></div><div>With this code, we only allocate memory (beyond the DenseMap) when we encounter a duplicate that would be worth reporting. In the common case, this savings in malloc traffic should be noticeable.</div>
<div><br></div><div>Notice also that I used:</div><div><br></div><div> DeclOrVector &entry = EnumMap[Val]; // Use default construction of 'entry'.</div><div><br></div><div>This results in a single lookup in the hashtable. Since we plan on adding a value for a key no matter what, by using this idiom we allow the DenseMap to default construct an entry if it doesn't exist. This results in a single hashtable lookup, from which we can modify the value in place. This is obviously faster than doing a hashtable lookup twice. </div>
<br><blockquote type="cite"><div>+</div><div>+ for (unsigned i = 0; i < NumElements; ++i) {</div><div>+ EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);</div><div>+ if (!ValidDuplicateEnum(ECD, Enum))</div>
<div>+ continue;</div><div>+</div><div>+ int64_t Val = GetInt64(ECD->getInitVal());</div><div>+</div><div>+ if (Val == 0) {</div><div>+ if (!ZeroVector || *ZeroVector->begin() == ECD)</div><div>+ continue;</div>
<div>+ ZeroVector->push_back(ECD);</div><div>+ } else if (Val == 1) {</div><div>+ if (!OneVector || *OneVector->begin() == ECD)</div><div>+ continue;</div><div>+ OneVector->push_back(ECD);</div>
<div>+ } else {</div><div>+ ValueToVectorMap::iterator I = EnumMap.find(Val);</div><div>+ if (I == EnumMap.end())</div><div>+ continue;</div><div>+ SameValueVector *V = I->second;</div><div>+ if (*V->begin() == ECD)</div>
<div>+ continue;</div><div>+ V->push_back(ECD);</div><div>+ }</div><div>+ }</div></blockquote><div><br></div><div>This second loop looks unnecessary. I think we can do everything we need to count duplicates with one loop. Of course the ValidDuplicateEnum() would need to be hoisted to the first loop.</div>
</div></div></blockquote></div></blockquote><div>Using two traverses allows two things to happen. One, the first element in the ECDVector will not have an initializer and will work with the warning. Otherwise, the vector needs to be searched for a proper enum constant to use. Two, it prevents unneeded creation of ECDVectors. If we have enum A { A1 = 2, A2 = 2, A3 = 1, A4 = 1, A5}; vectors for values 1 and 2 are created using a single pass while only a vector for 2 will be created using a double pass.</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div><div style="word-wrap:break-word">
<div><br><blockquote type="cite"><div>+</div><div>+ for (DoubleVector::iterator DoubleVectorIter = EnumVector.begin(),</div><div>+ DoubleVectorEnd = EnumVector.end();</div><div>+ DoubleVectorIter != DoubleVectorEnd; ++DoubleVectorIter) {</div>
<div>+ SameValueVector *V = *DoubleVectorIter;</div><div>+ if (V->size() == 1)</div><div>+ continue;</div><div>+</div><div>+ SameValueVector::iterator I = V->begin();</div><div>+ S.Diag((*I)->getLocation(), diag::warn_duplicate_enum_values)</div>
<div>+ << (*I)->getName() << (*I)->getInitVal().toString(10)</div><div>+ << (*I)->getSourceRange();</div><div>+ ++I;</div><div>+ for (SameValueVector::iterator E = V->end(); I != E; ++I)</div>
<div>+ S.Diag((*I)->getLocation(), diag::note_duplicate_element)</div><div>+ << (*I)->getName() << (*I)->getInitVal().toString(10)</div><div>+ << (*I)->getSourceRange();</div>
<div>+ delete V;</div><div>+ }</div></blockquote><div><br></div><div><br></div><div>This is more or less the same, essentially it becomes:</div><div><br></div><div>for (DuplicateVector::iterator I = DupVector.begin(), E = DupVector.end(); I != E; ++I) {</div>
<div> ECDVector *Vec = *I;</div><div> // do the diagnostic logic ...</div><div> delete *I;</div><div>}</div><div><br></div><div>Note that with my suggestions the vector has size on order of the number of duplicate constants, not the number of total constants. If there are no duplicates, no work is required (including free'ing memory).</div>
<br><blockquote type="cite"><div>+}</div><div>+</div><div> void Sema::ActOnEnumBody(SourceLocation EnumLoc, SourceLocation LBraceLoc,</div><div> SourceLocation RBraceLoc, Decl *EnumDeclX,</div><div>
Decl **Elements, unsigned NumElements,</div><div>@@ -10709,6 +10868,7 @@</div><div> DeclsInPrototypeScope.push_back(Enum);</div><div> </div><div> CheckForUniqueEnumValues(*this, Elements, NumElements, Enum, EnumType);</div>
<div>+ CheckForDuplicateEnumValues(*this, Elements, NumElements, Enum, EnumType);</div><div> }</div><div> </div><div> Decl *Sema::ActOnFileScopeAsmDecl(Expr *expr,</div></blockquote><br></div><div>I know this may all be nit-picky, but I really think trying to reduce the malloc() traffic is worth looking at to get a real understanding of the performance improvement that can be found here.</div>
<div><br></div><div>Thanks for forging ahead on this.</div></div></div><div>_______________________________________________<br>cfe-commits mailing list<br><a href="mailto:cfe-commits@cs.uiuc.edu" target="_blank">cfe-commits@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a><br></div></blockquote></div><br></div></blockquote></div><br>
</div></div><span><duplicate-enum-densemap2.patch></span></blockquote></div><br></div></div></div>
</blockquote></div><br>
</blockquote></div><br></div></div></div>
</blockquote></div><br></div>
</blockquote></div><br></body></html>