[PATCH] [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Hao Liu
Hao.Liu at arm.com
Wed Jun 10 23:46:43 PDT 2015
Hi James,
Thanks for your comments. I've updated a new patch.
About your concern, the ARM32 patch is similar to this patch. I agree with you that we'd better share the code. But I couldn't find a place for sharing such backend code.
Thanks,
-Hao
================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:60
@@ +59,3 @@
+/// This enum is just used to hold the minimum and maximum interleave factor.
+enum {
+ MIN_FACTOR = 2, /// Minimum interleave factor
----------------
jmolloy wrote:
> This would probably be easier simply as two const ints. Enum is overkill.
Reasonable.
================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:161
@@ +160,3 @@
+ ShuffleVectorInst *SVI, SmallVector<Instruction *, 16> &DeadInsts) {
+ LoadInst *LI = dyn_cast<LoadInst>(SVI->getOperand(0));
+ if (!LI || !LI->isSimple() || !isa<UndefValue>(SVI->getOperand(1)))
----------------
jmolloy wrote:
> I would expect here you to check whether SVI is in DeadInsts, as we may have identified it earlier. If you don't, I'd expect we'd create multiple LDN instructions unnecessarily!
Reasonable.
================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:172
@@ +171,3 @@
+
+ // Check if the mask is strided and get the start index.
+ unsigned Factor, Index;
----------------
jmolloy wrote:
> The order of these checks doesn't seem the most efficient to me. You have a load that has at least one ShuffleVector using it. You're checking the ShuffleVector is a valid mask first, then checking all other Load users.
>
> I'd expect the code would be simpler if:
> 1. Check the illegal vector type size.
> 2. Check the load only had ShuffleVector users.
> 3. Check each ShuffleVector is a strided index.
>
> I guess what I think is strange is that this function takes a ShuffleVector as its argument, rather than a LoadInst.
I've updated the code.
The order is different, I firstly check 2, and then check 1, and 3 (They are all about one shufflevector).
Using a ShuffleVector as argument is to make this pass more efficient. I just think a program should have much more loads/stores than shufflevectors.
================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:236
@@ +235,3 @@
+ // Avoid analyzing it twice.
+ if (SV != SVI)
+ SV->eraseFromParent();
----------------
jmolloy wrote:
> This is weird - surely you should just put it into DeadInsts?
Done.
================
Comment at: lib/Target/AArch64/AArch64InterleavedAccess.cpp:250
@@ +249,3 @@
+/// I.e. <0, NumSubElts, ... , NumSubElts*(Factor - 1), 1, NumSubElts + 1, ...>
+static bool IsInterleavedMaskOfGivenFactor(ArrayRef<int> Mask,
+ unsigned Factor) {
----------------
jmolloy wrote:
> Please move this static function up to where all the other static functions are.
>
> I got confused initially how this function was different to isStridedMask above. The difference is that this function checks for a mask that RE-interleaves, whereas isStridedMask checks for a mask that DE-interleaves. Please make that more clear in the function name / comments.
Renamed and updated the comments.
================
Comment at: lib/Target/AArch64/AArch64TargetTransformInfo.cpp:416
@@ +415,3 @@
+ if (Factor > 1 && Factor < 5 && isTypeLegal(VecTy))
+ return Factor;
+
----------------
jmolloy wrote:
> Why does a ld2 have a different cost to a ld3 here?
This is just a rough cost.
The A57 optimisation guide says ldN instructions with different N and types have different cost, ranging from 6 to 11. But generally larger N means higher cost, so I use the factor as the cost.
I think it needs tuning in the future, but now it can work well. The following cost for N > 4 also needs tuning.
http://reviews.llvm.org/D10335
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list