[llvm-dev] [RFC] Attribute overhaul 2

Fri Apr 7 16:08:52 PDT 2017

Getting back to this after a two week hiatus...

On Sat, Mar 25, 2017 at 3:17 PM, Philip Reames <listmail at philipreames.com>
wrote:

> Reid,
>
> Thank for you taking a close look at this area.  Attributes definitely
> need a good amount of love.  As you point out, the naming of some of the
> code involved is more than a bit confusing and overly abstracted.  However,
> there's also a delicate balance here in terms of having a generic
> understandable API.  In general, I really like your proposal, a few
> comments inline.

Thanks!

The natural second thing to do is to rename AttributeSetNode to
>> AttributeSet, or
>> create a pointer wrapper type called AttributeSet and rename today's
>> AttributeSetNode to AttributeSetImpl. It is inherently an ordered set of
>> attributes. I also propose to make this type public, as described earlier.
>>
> SGTM.  From an API design perspective, I'd absolutely love to see
> AttributeSet die.  The fact that it represents both the set of attributes
> on a particular operand and the set of attributes across *all* operands is
> needlessly confusing.  (Note: only in the API, the implementation is a bit
> more sane.)   I really like your suggestion to rename the current
> AttributeSet to something like AttributeList and publishing the existing
> AttributeSetNode.   Once we've done that, we can update the bulk add/remove
> APIs to be sane.  The current semantics are an utter mess.

Yep. AttributeLists shouldn't be a list of more AttributeLists. I've got a
patch out which tries to untangle that a little bit more:
https://reviews.llvm.org/D31198

With that patch, getParamAttributes(unsigned) returns an AttributeSetNode*
instead of an AttributeList.

>> Optimizations
>> =============
>>
>> Testing for presence of an attribute on a parameter should be fast. Today
>> it is
>> linear in the size of the prototype. I propose that we change
>> AttributeList to
>> make attributes randomly accessible by slot index, so that it stores a
>> trailing
>> object array of AttributeSetNode*, or some equivalent type that makes it
>> efficient to test for attribute presence. I'll do some benchmarking to
>> show that
>> this change doesn't measurably impact LTO memory usage.
>>
> Sounds like a generally good idea.  Maybe we can hide the indexing from
> the client though?  Do something like store the index in the Value itself
> and have the AttributeSet (current name) interface be in terms of the Use?
> Not a strong requirement or anything, just thinking that might keep the
> interface a bit cleaner.

Huh, I hadn't considered building the API in terms of Use& or Use*. That
would greatly reduce the need for all these extra integer induction
variables and let us use a lot more range-based for loops. I think we can
compute operand # from Use* in constant time, right? Thanks for the idea!

>
> I also want to reduce the number of loads needed to test a single
>> attribute.
>> One potential optimization is to move the bitset of enum attributes up
>> out of
>> AttributeSetNode* into a wrapper type, perhaps called AttributeSet. The
>> wrapper
>> type could be a bitset and pointer pair, or we could try to union those
>> together
>> in the usual way to save memory. I suspect the memory savings of the
>> union are
>> unimportant, but I will measure. If we can always store the enumerated
>> attributes that have no associated size locally, that seems ideal, since
>> it
>> allows modifying them without copying the immutable data stored in the
>> LLVM
>> context.
>>
> I got lost here.  Might be good to land some of the renames and then
> restate this.  I think we may be confusing meanings of AttributeSet?

Yeah, I was trying to describe a *new* AttributeSet type. It would probably
look something like:

class AttributeSet {
  uint64_t EnumAttrs; // bitset of that can answer hasAttr with a load and
test
  AttributeSetNode *SlowAttrs; // string and integer attributes
...
};

This would be a trivially copyable value type, and would describe the
attributes of a single function, arg, or ret. AttributeList would become an
array of these. Adding an enum attribute is very cheap now: copy 16 (or 12)
bytes of data and set a bit.

Another idea is to eliminate the use of AttributeList completely from
>> Function,
>> and instead store the attributes of an Argument directly *in* the
>> Argument. The
>> function and return attributes would live on the Function. At the end of
>> the
>> day, `Arg->hasNonNullAttr()` should compile to a load and a bittest. This
>> is a
>> more invasive change that would require more API migration, but again it
>> would
>> avoid more FoldingSet usage.
>>
> I particularly like this.  It's worth pointing out that we can do the same
> for return values of Invokes and Calls, but that handling parameter
> attributes is much harder.  This is also the point where I'd point out that
> metadata and return attributes have grown to heavily overlap in semantic
> with distinct sets of instructions. Maybe it's time to common the
> implementation, even if the interfaces stay separate?  Once we'd done that,
> we'd have three cases: return attributes/metadata on all instructions,
> parameter attributes on Call and Invoke, and Argument attributes on
> Functions.

As nice as this is, I think I've convinced myself to stop short of doing
this. It's nice if Function, CallInst, and InvokeInst all use the same data
structures to store attributes. As long as Arg->hasAttr(Enum) is O(1), I
think we've won. Moving the data out of the parallel array and into
Argument probably isn't necessary.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170407/b309e9ca/attachment.html>