[PATCH] D65600: Relax load store vectorizer pointer strip checks

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 1 15:05:04 PDT 2019


rampitec marked an inline comment as done.
rampitec added inline comments.


================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:347-348
 
-  if (PtrA->getType()->getPointerAddressSpace() != PtrAS ||
-      PtrB->getType()->getPointerAddressSpace() != PtrAS)
+  if (DL.getTypeStoreSizeInBits(PtrA->getType()) != PtrBitWidth ||
+      DL.getTypeStoreSizeInBits(PtrB->getType()) != PtrBitWidth)
     return false;
----------------
tra wrote:
> rampitec wrote:
> > tra wrote:
> > > rampitec wrote:
> > > > tra wrote:
> > > > > If we were to use addrspacecast(1) instead of 5 in the example below, we'd proceed with the checks.
> > > > > If we can deal with non-generic address spaces in principle, why can't we deal with address spaces that differ in pointer size? I'd assume that logic that determines consecutiveness should work the same.
> > > > > 
> > > > > I guess one way to handle mismatched address spaces would be to normalize the pointer to the common address space (generic?) and then run the checks for the consecutiveness.
> > > > > 
> > > > > 
> > > > In the meantime I am working on a follow-up patch to handle pointer size differences gracefully. That requires more code and time, though.
> > > The reason SCEV is unhappy is that we're asking it to do something with different types. Bailing out early will avoid the problem, but it's still too conservative, IMO. We may still be able to do useful optimizations with such pointers.
> > > Perhaps something like this might keep things working where possible until we have a better way: https://reviews.llvm.org/differential/diff/212903/
> > > 
> > > 
> > As I said, I am working on a more relaxed patch to handle it. I am just trying to think through the corner cases, for example this one:
> > 
> > ```
> > target datalayout = "e-p:64:64-p1:64:64-p5:32:32"
> > 
> > define amdgpu_kernel void @ext_ptr_overflow_unsigned(i32 addrspace(5)* %p) {
> > entry:
> >   %gep1 = getelementptr inbounds i32, i32 addrspace(5)* %p, i64 4294967297
> >   %gep2 = getelementptr inbounds i32, i32 addrspace(5)* %p, i64 4294967298
> >   %a.ascast = addrspacecast i32 addrspace(5)* %gep1 to i32*
> >   %b.ascast = addrspacecast i32 addrspace(5)* %gep2 to i32*
> >   %tmp1 = load i32, i32* %a.ascast, align 8
> >   %tmp2 = load i32, i32* %b.ascast, align 8
> >   unreachable
> > }
> > 
> > ```
> Fair enough. Let's land your patch as is to unblock things for now. 
Looks like stripAndAccumulateInBoundsConstantOffsets() takes care of it, so the offset always fits the smallest pointer type in the chain.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65600/new/

https://reviews.llvm.org/D65600
More information about the llvm-commits mailing list