<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Firstly, thanks for looking at this.  I took a look a couple of years ago but quickly fell in to patch hell with way too many patches all over the compiler!<div class=""><br class=""></div><div class="">I did manage to find a few of my changes, and I ultimately focused on specific APIs which looked expensive.  For example, AttributeSet::addAttributes, or even just code which looks and feels like its examining internal state, like AttributeSet::getNumSlots().<br class=""><div><blockquote type="cite" class=""><div class="">On Mar 20, 2017, at 9:48 AM, Reid Kleckner via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">LLVM's Attribute APIs need an overhaul.</div><div class=""><br class=""></div><div class="">Current problems</div><div class="">================</div><div class=""><br class=""></div><div class="">First, testing for an attribute on an Argument is slow.</div><div class="">llvm::AttributeSet::getAttributes(int) consumed 2% of cycles while optimizing</div><div class="">llc during LTO. Our mid-level optimizations are constantly asking if a given</div><div class="">argument has some attribute (nonnull, dereferencable, etc), and this is</div><div class="">currently linear in the size of the function prototype.  This should be constant</div><div class="">time.</div></div></div></blockquote>My guess is that arguments having attributes is still a relatively new thing, so they have just outgrown the implementation.  Back when this was written i don’t think we were nearly as aggressive about adding or querying things like non null on arguments so the cost of the implementation wasn’t really clear.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">Adding and removing individual attributes is also inefficient. Every single</div><div class="">attribute addition or removal convenience method, for example, inserts a new</div><div class="">AttributeSet into a FoldingSet. AttributeFuncs::mergeAttributesForInlining is</div><div class="">written in terms of these APIs, and is therefore quite inefficient. This also</div><div class="">shows up during LTO. I don't think it's practical to remove these inherently</div><div class="">expensive APIs, but if we make AttributeBuilder easier to use, it will be a lot</div><div class="">easier to fix these problems as they come up.</div></div></div></blockquote>I think the worst offender is AttributeSet::addAttributes where we have to build intermediate sets just to add them to a parent set.</div><div><br class=""></div><div>If you completely flatten the list so that return, function, and arguments all have their own, as you suggest later, then I think you’ll just happen to fix most of the slow construction APIs.  As you’ve mentioned AttrBuilder, that probably will handle most of the remaining construction cases efficiently.</div><div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">Lastly, the Attribute APIs are hard to use because they do too much information</div><div class="">hiding. In particular, the choice to make AttributeSetNode an internal</div><div class="">implementation detail of lib/IR is problematic. This type describes all of the</div><div class="">attributes on an individual argument, return value, or function, which IPO</div><div class="">transforms often want. Today the getFnAttributes, getRetAttributes, and</div><div class="">getParamAttributes APIs find the relevant AttributeSetNode* and wrap it in a</div><div class="">new, uniqued AttributeListImpl. This requires callers to keep around the index</div><div class="">of the extracted attributes so they can look through the wrapper list. If we</div><div class="">make AttributeSetNode public, we can simplify a lot of IPO transform code.</div></div></div></blockquote>Do you have a particular IPO spot where this looks bad today?  I’d be interested in taking a look to see whether flattening them out just works here too?<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">Naming</div><div class="">======</div><div class=""><br class=""></div><div class="">The naming of today's APIs is confusing. I'll try to explain what the current</div><div class="">important classes are.</div><div class=""><br class=""></div><div class="">- AttributeSet: This is a uniqued, ordered list of sets of attributes, and is</div><div class="">  associated with a function or call prototype. It is stored on Function,</div><div class="">  CallInst, and InvokeInst. It is a smart pointer value type that wraps</div><div class="">  AttributeSetImpl, which contains the actual storage.</div><div class=""><br class=""></div><div class="">- AttributeSetImpl: The private implementation of AttributeSet. Owned by the</div><div class="">  LLVMContext. Today this is a vector of pairs of attribute indices and</div><div class="">  AttributeSetNode pointers.</div><div class=""><br class=""></div><div class="">- AttributeSetNode: This is an ordered, uniqued set of Attributes that might</div><div class="">  apply to a single function, callee, return value, parameter, or argument.  It</div><div class="">  uses TrailingObjects to store the attributes, and until Jan 2016, tested for</div><div class="">  attribute presence by a linear scan. Matthias Braum added a bitset to speed up</div><div class="">  tests in r259251.</div><div class=""><br class=""></div><div class="">- AttributeBuilder: A mutable representation of an AttributeSetNode. Used for</div><div class="">  efficiently building a collection of attributes before freezing it into an</div><div class="">  AttributeSetNode.</div><div class=""><br class=""></div><div class="">- Attribute: Pointer wrapping an AttributeImpl.</div><div class=""><br class=""></div><div class="">- AttributeImpl: Polymorphic base class of StringAttributeImpl,</div><div class="">  EnumAttributeImpl, and IntAttributeImpl. Enums have the attribute kind,</div><div class="">  integers have a uint64_t value, and strings have two StringRefs for the kind</div><div class="">  and value.</div><div class=""><br class=""></div><div class="">AttributeSet doesn't seem like a good name to me. In the past it was called</div><div class="">AttrListPtr and PAListPtr. Today's AttributeSetImpl was called</div><div class="">"ParameterAttributeList", which is why we have "PAL" local variables.  I'd like</div><div class="">to rename AttributeSet to just "AttributeList".  It's a list of sets of</div><div class="">attributes that is parallel to some function prototype. I already have a patch</div><div class="">out for this here: <a href="https://reviews.llvm.org/D31102" class="">https://reviews.llvm.org/D31102</a></div></div></div></blockquote>Thats where PAL came from!  I always wondered.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">The natural second thing to do is to rename AttributeSetNode to AttributeSet, or</div><div class="">create a pointer wrapper type called AttributeSet and rename today's</div><div class="">AttributeSetNode to AttributeSetImpl. It is inherently an ordered set of</div><div class="">attributes. I also propose to make this type public, as described earlier.</div></div></div></blockquote>Whether as an intermediate step, or the final naming, this all seems fine to me.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">Optimizations</div><div class="">=============</div><div class=""><br class=""></div><div class="">Testing for presence of an attribute on a parameter should be fast. Today it is</div><div class="">linear in the size of the prototype. I propose that we change AttributeList to</div><div class="">make attributes randomly accessible by slot index, so that it stores a trailing</div><div class="">object array of AttributeSetNode*, or some equivalent type that makes it</div><div class="">efficient to test for attribute presence. I'll do some benchmarking to show that</div><div class="">this change doesn't measurably impact LTO memory usage.</div></div></div></blockquote>At the very least here, it seems like we should have Argument’s store their argument number in the Value subclass data or something like that.  </div><div><br class=""></div><div>uint64_t Argument::getDereferenceableOrNullBytes() const {<br class="">  assert(getType()->isPointerTy() &&<br class="">         "Only pointers have dereferenceable bytes");<br class="">  return getParent()->getDereferenceableOrNullBytes(getArgNo()+1);<br class="">}</div><div><br class=""></div><div>This is horrible once you realize that getArgNo() is a linear scan over the arguments then getDereferenceableOrNullBytes() is another linear scan over the slots.  Worst case 2n just to check a simple value!<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">I also want to reduce the number of loads needed to test a single attribute.</div><div class="">One potential optimization is to move the bitset of enum attributes up out of</div><div class="">AttributeSetNode* into a wrapper type, perhaps called AttributeSet. The wrapper</div><div class="">type could be a bitset and pointer pair, or we could try to union those together</div><div class="">in the usual way to save memory. I suspect the memory savings of the union are</div><div class="">unimportant, but I will measure. If we can always store the enumerated</div><div class="">attributes that have no associated size locally, that seems ideal, since it</div><div class="">allows modifying them without copying the immutable data stored in the LLVM</div><div class="">context.</div><div class=""><br class=""></div><div class="">Another idea is to eliminate the use of AttributeList completely from Function,</div><div class="">and instead store the attributes of an Argument directly *in* the Argument. The</div><div class="">function and return attributes would live on the Function. At the end of the</div><div class="">day, `Arg->hasNonNullAttr()` should compile to a load and a bittest. This is a</div><div class="">more invasive change that would require more API migration, but again it would</div><div class="">avoid more FoldingSet usage.</div></div></div></blockquote>You may get lucky in terms of this change as most of the queries are already on the argument or the function and can be changed internally to their hasNonNullAttr (or whatever) methods.</div><div><br class=""></div><div>Any of the approaches here sounds fine, and its looks like you’re going to measure anyway.  I think you’ve already observed that we query these far more than we construct them, so i’d go for the ‘load and check a bit’ approach.  We want that query to be as fast as possible.  Even if it means putting a pointer on every Argument, I expect we have a high enough proportion of Argument’s with attributes at this point that its likely not too bad.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">The last two optimizations taken together are particularly nice, because it</div><div class="">means that changing enum attributes on functions and arguments won't trigger</div><div class="">copies of immutable objects. It will only involve flipping bits in a locally</div><div class="">stored bitset.</div><div class=""><br class=""></div><div class="">Separately, I want to collapse dereferenceable(N), nonnull, and</div><div class="">dereferenceable_or_null(N) for efficiency and orthogonality. I will probably</div><div class="">raise this in a separate RFC, but I think it was a mistake to have to attribute</div><div class="">imply nonnull. I'm mostly looking at this from an efficiency and redundancy</div><div class="">perspective, though, not a semantic one.</div><div class=""><br class=""></div><div class="">----------</div><div class=""><br class=""></div><div class="">Does this all seem reasonable? Speak up now if you think any of this looks like</div><div class="">a step in the wrong direction.</div></div></div></blockquote>Seems good to me.</div><div><br class=""></div><div>Thanks again for doing this!</div><div>Pete<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Notes:</div><div class=""><br class=""></div><div class="">RFC Overhauling Attributes, Bill Wending Sept 2012</div><div class=""><a href="https://groups.google.com/forum/#!topic/llvm-dev/RnZVfYmmkMc" class="">https://groups.google.com/forum/#!topic/llvm-dev/RnZVfYmmkMc</a></div><div class="">User struggle</div><div class=""><a href="http://lists.llvm.org/pipermail/llvm-dev/2013-January/058822.html" class="">http://lists.llvm.org/pipermail/llvm-dev/2013-January/058822.html</a></div><div class="">dereferenceable vs dereferenceable_or_null split</div><div class=""><a href="http://thread.gmane.org/gmane.comp.compilers.llvm.devel/82186" class="">http://thread.gmane.org/gmane.comp.compilers.llvm.devel/82186</a></div><div class=""><br class=""></div></div>

_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br class=""></div></blockquote></div><br class=""></div></body></html>