[PATCH] [AArch64] Match interleaved memory accesses into ldN/stN instructions.

Mon Jun 15 01:13:09 PDT 2015

Hi,

I've refactored the patch according to comments from Michael and Silviu.

As the ARM backend code is ready, I also added ARM code in this patch, so that we can compare and figure out how to share code between AArch64 backend and ARM backend.

As you can see, most of the static functions can be shared. But the problem is the code in matchInterleavedLoad()/matchInterleavedStore() can not be shared. The call of ldN and vldN intrinsics are different:

  AArch64 backend:
      %vec = tail call { <8 x i8>, <8 x i8> } @llvm.aarch64.neon.ld2.v8i8.p0v8i8(<8 x i8>* %ptr)
      call void @llvm.aarch64.neon.st2.v16i8.p0i8(<16 x i8> %A, <16 x i8> %B, i8* %ptr)
  ARM backend:
      %vec = tail call { <8 x i8>, <8 x i8> } @llvm.arm.neon.vld2.v8i8(i8* %ptr, i32 1)
      call void @llvm.arm.neon.vst2.v8i8(i8* %ptr, <8 x i8> %A, <8 x i8> %B, i32 1)

The ldN and vldN have different parameters. So It's hard to share the code about creating intrinsic calls. We can introduce new intrinsics like llvm.neon.ldN for both AArch64 and ARM, but it means we also need to modify all the code related to the old ldN/vldN intrinsics. Do you have any ideas about problem?

About the place, as Renato and James suggested, I think lib/CodeGen is a reasonable place. I think maybe we can put code in the lib/CodeGen/CodeGenPrepare.cpp, which already has a function called CodeGenPrepare::OptimizeShuffleVectorInst(ShuffleVectorInst *SVI). This function currently optimizes BroadcastShuffle for X86.

Thanks,
-Hao

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:146-148
@@ +145,5 @@
+                               unsigned &Index) {
+  unsigned NumElts = Mask.size();
+  if (NumElts < 2)
+    return false;
+
----------------
mzolotukhin wrote:
> ```
> if (Mask.size() < 2)
>   return false;
> ```
> ?
Fixed.

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:247
@@ +246,3 @@
+
+    unsigned Index;
+    if (!isDeInterleaveMaskOfFactor(Shuffles[i]->getShuffleMask(), Factor,
----------------
mzolotukhin wrote:
> Is there a need to redeclare `Index` here?
Removed the re-declaration.

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:279
@@ +278,3 @@
+    ShuffleVectorInst *SV = Shuffles[i];
+    unsigned Index = Indices[i];
+
----------------
mzolotukhin wrote:
> One more redeclaration of `Index`.
Removed re-declaration.

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:351
@@ +350,3 @@
+
+  SmallVector<Value *, 5> Ops;
+
----------------
mzolotukhin wrote:
> Why `5`?
We have at most 5 operands for calling ldN intrinsics (The most one: The ld4 call needs 4 vectors and 1 pointer as arguments).

================
Comment at: lib/Target/AArch64/AArch64TargetTransformInfo.cpp:415
@@ +414,3 @@
+
+  if (Factor > 1 && Factor < 5 && isTypeLegal(VecTy))
+    return Factor;
----------------
sbaranga wrote:
> sbaranga wrote:
> > If we get a a legal type here that cannot be matched by the pass (for example v4i8) this will produce the wrong cost.
> > I think we also need to check that the type size is either 64 or 128.
> > 
> I've looked into this some more, and it seems that the issue is that the size of VecTy is the size of the entire interleaved group. So I think what needs to change here is the check to isTypeLegal to something like:
> 
> isTypeLegal(VectorType::get(VecTy->getScalarType(), VecTy->getVectorNumElements()/Factor));
Silviu, you are right. Fixed in the new patch.

http://reviews.llvm.org/D10335

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/