[PATCH] [AArch64] Match interleaved memory accesses into ldN/stN instructions.

Hao Liu Hao.Liu at arm.com
Wed Jun 10 23:46:43 PDT 2015


Hi James,

Thanks for your comments. I've updated a new patch.

About your concern, the ARM32 patch is similar to this patch. I agree with you that we'd better share the code. But I couldn't find a place for sharing such backend code.

Thanks,
-Hao


================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:60
@@ +59,3 @@
+/// This enum is just used to hold the minimum and maximum interleave factor.
+enum {
+  MIN_FACTOR = 2, /// Minimum interleave factor
----------------
jmolloy wrote:
> This would probably be easier simply as two const ints. Enum is overkill.
Reasonable.

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:161
@@ +160,3 @@
+    ShuffleVectorInst *SVI, SmallVector<Instruction *, 16> &DeadInsts) {
+  LoadInst *LI = dyn_cast<LoadInst>(SVI->getOperand(0));
+  if (!LI || !LI->isSimple() || !isa<UndefValue>(SVI->getOperand(1)))
----------------
jmolloy wrote:
> I would expect here you to check whether SVI is in DeadInsts, as we may have identified it earlier. If you don't, I'd expect we'd create multiple LDN instructions unnecessarily!
Reasonable.

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:172
@@ +171,3 @@
+
+  // Check if the mask is strided and get the start index.
+  unsigned Factor, Index;
----------------
jmolloy wrote:
> The order of these checks doesn't seem the most efficient to me. You have a load that has at least one ShuffleVector using it. You're checking the ShuffleVector is a valid mask first, then checking all other Load users.
> 
> I'd expect the code would be simpler if:
>   1. Check the illegal vector type size.
>   2. Check the load only had ShuffleVector users.
>   3. Check each ShuffleVector is a strided index.
> 
> I guess what I think is strange is that this function takes a ShuffleVector as its argument, rather than a LoadInst.
I've updated the code.

The order is different, I firstly check 2, and then check 1, and 3 (They are all about one shufflevector).

Using a ShuffleVector as argument is to make this pass more efficient. I just think a program should have much more loads/stores than shufflevectors. 

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:236
@@ +235,3 @@
+    // Avoid analyzing it twice.
+    if (SV != SVI)
+      SV->eraseFromParent();
----------------
jmolloy wrote:
> This is weird - surely you should just put it into DeadInsts?
Done.

================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:250
@@ +249,3 @@
+/// I.e. <0, NumSubElts, ... , NumSubElts*(Factor - 1), 1, NumSubElts + 1, ...>
+static bool IsInterleavedMaskOfGivenFactor(ArrayRef<int> Mask,
+                                           unsigned Factor) {
----------------
jmolloy wrote:
> Please move this static function up to where all the other static functions are.
> 
> I got confused initially how this function was different to isStridedMask above. The difference is that this function checks for a mask that RE-interleaves, whereas isStridedMask checks for a mask that DE-interleaves. Please make that more clear in the function name / comments.
Renamed and updated the comments.

================
Comment at: lib/Target/AArch64/AArch64TargetTransformInfo.cpp:416
@@ +415,3 @@
+  if (Factor > 1 && Factor < 5 && isTypeLegal(VecTy))
+    return Factor;
+
----------------
jmolloy wrote:
> Why does a ld2 have a different cost to a ld3 here?
This is just a rough cost. 

The A57 optimisation guide says ldN instructions with different N and types have different cost, ranging from 6 to 11. But generally larger N means higher cost, so I use the factor as the cost.

I think it needs tuning in the future, but now it can work well. The following cost for N > 4 also needs tuning.

http://reviews.llvm.org/D10335

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list