[PATCH] D23749: [NVPTX] Add NVPTXHoistAddrSpaceCast pass.

Sun Aug 21 12:25:21 PDT 2016

> If you want, you could treat this as a data flow problem (treat loads/stores/cmpxchgs/atomicrmws of your addrspacecast'd GEP as sources, propagate that information across CFG edges, etc.).

IIUC I'd still want to propagate upwards (in a limited sense), because
if we only propagate downwards from (say) loads, then we can't
transform something like this at all.

  %p = ...
  %gep = getelementptr %p
  %asc1 = asc %p
  %asc2 = asc %gep
  load %asc1
  load %asc2
  ret

I'm starting to understand Jingyue's suggestion that we can only
propagate up if there are no side-effecting operations in the way.
But this is limiting, because even loads can have a side-effect.  This
means we can't hoist an asc above a load, unless we know it's
dereferenceable.  So then given

  %p = ...
  %gep = gep %p
  load %x  ; not known dereferenceable
  %asc =  asc %gep
  load %asc

we can't hoist asc.  (Even though it's perfectly safe if nobody else uses %gep.)

Some options I can think of:

1) Push the addrspacecasts up, but don't RAUW.  That is, we make a
copy of the original gep and leave it alone -- only users of the old
asc get to use the new gep.  But this could lead us to doing
essentially two parallel sets of address computations in the function;
that seems bad.

2) Teach scev that gep and asc commute on some targets.  That sounds
scary and hard, and it would only benefit passes that use scev.  But
it would fix the load/store vectorizer, which is what I care about.

It also would mean that, if the load/store vectorizer (or some other
scev-based pass) doesn't fire, we will end up emitting more address
space casts than we'd like (e.g. in the first example, ideally we'd
emit just one asc).  I'm not sure this is a problem on nvptx, because
I have it in my head that the generic --> global asc is a nop [1].

3) Improve address space inference so that it can eliminate these
casts during its normal course of business.  That helps us iff we can
infer the addrspace of the base pointer.  Currently we're pretty good
at this inference for arguments to kernels, but much less good for
arguments to device functions.  But we could write an IPO that lets us
infer the types of device function args, and this would be beneficial
in itself.  Then, presumably, we'd be able to eliminate most of these
casts as redundant, and scev could go to town during the lsv.

I kind of lean towards (3).  What do you think?

-Justin

[1] As evidence, __ldg is implemented in the CUDA headers as inline
asm that takes a pointer and calls ld.global.nc. It doesn't check that
the pointer is in the global address space or do any conversions
afaict.

On Sun, Aug 21, 2016 at 12:05 AM, David Majnemer
<david.majnemer at gmail.com> wrote:
>
>
> On Sat, Aug 20, 2016 at 9:01 PM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Matt, I thought about making this pass generic, but as written it doesn't
>> make much sense if you don't follow it with the equivalent of the NVPTX
>> address space inference pass.
>>
>> David, I will meditate on this...
>
>
> If you want, you could treat this as a data flow problem (treat
> loads/stores/cmpxchgs/atomicrmws of your addrspacecast'd GEP as sources,
> propagate that information across CFG edges, etc.).  Then you can know if a
> block is dominated by blocks where the transform is known to be safe.
>
>>
>>
>> On Aug 20, 2016 8:40 PM, "Matt Arsenault" <Matthew.Arsenault at amd.com>
>> wrote:
>>>
>>> arsenm added a subscriber: arsenm.
>>>
>>> ================
>>> Comment at: llvm/lib/Target/NVPTX/NVPTXHoistAddrSpaceCast.cpp:156-160
>>> @@ +155,7 @@
>>> +    if (AddrSpaceCastInst *ASC = dyn_cast<AddrSpaceCastInst>(&I))
>>> +      if (ASC->getSrcAddressSpace() ==
>>> AddressSpace::ADDRESS_SPACE_GENERIC &&
>>> +          ASC->getDestAddressSpace() !=
>>> AddressSpace::ADDRESS_SPACE_GENERIC &&
>>> +          ASC->getSrcTy()->getPointerElementType() ==
>>> +              ASC->getDestTy()->getPointerElementType())
>>> +        Worklist.push_back(ASC);
>>> +
>>> ----------------
>>> Can you put this into a TTI hook?
>>>
>>>
>>> https://reviews.llvm.org/D23749
>>>
>>>
>>>
>