[LLVMdev] RFC: SDNode Flags

Mon Aug 3 10:38:47 PDT 2009

On Aug 3, 2009, at 9:30 AM, David Greene wrote:

> On Saturday 01 August 2009 15:12, Dan Gohman wrote:
>
>> LoadSDNode, which inherits from MemSDNode is the largest
>> SDNode.  With the current SDNode allocation strategy, making it
>> bigger will increase the allocation needed for all nodes.
>
> Ok.
>
>
>>> new (N) LoadSDNode(..., isVolatile|isNonTemporal);
>>>
>>> Thoughts?
>>
>> This sounds reasonable.  I'd suggest using names like MemRefFlags  
>> rather
>> than isVolatileisNonTemporal, but that's just a detail :-).
>
> There's a pipe there.  :)

Oops. I blame my font ;-).

>
>> My biggest concern is how this is encoded in the SDNode.  It'd be
>> good to avoid making MemSDNode bigger, but there are a variety of  
>> ways
>> that exiting bits can be made available.  NodeType doesn't need all  
>> 16
>> of
>> its bits, for example, and OperandsNeedDelete could be merged with
>> OperandList with a PointerIntPair if needed.
>
> *shudder*  Undefined behavior?  No thanks.

What undefined behavior?  Is it PointerIntPair that's making you  
shudder?
That's implementation-defined behavior. Very different :-).  And it's  
checked
by asserts.

>
> Right now, the lower five bits of SubclassData are used to encode  
> various
> things for memory SDNodes.  One of those is the volatile bit.  This  
> leaves
> 10 bits for alignment information, meaning we can represent  
> alignments up to
> 2^1024, which seems like a bit much.  :)
>
> What do you think about carving four more bits out of the alignment  
> field
> so we have five flag/semantic bits and can represent alignments up  
> to 2^64?
> That seems like plenty of alignment headroom.

Yes, that's fine too.  If someone really wanted an alignment greater
than 1<<64, they would probably have other issues that would dwarf
this one.

>
>> How are you representing movnt in LLVM IR, with an intrinsic?
>
> Yes, it's an intrinsic inserted just before the load or store of  
> interest.
> I'm not particularly happy with this solution because I'm not sure all
> transformation passes will preserve the ordering of
> intrinsic-followed-by-operation.  Is there a better way to do this?

Crazy idea: how about an intrinsic just after the load or store of  
interest?
It would be a sort of opposite of prefetch, saying "i'm not going to be
using the memory at this address for a while".   It could use
IntrWriteArgMem like the regular prefetch does, which is usually
sufficient to keep it from wandering off, though it is a little  
stronger than
is strictly necessary.

The other alternative is something like llvm.x86.sse.movnt.ps and
friends.

Dan