[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved data accesses
Ahmed Bougacha
ahmed.bougacha at gmail.com
Fri Apr 3 11:03:23 PDT 2015
On Fri, Apr 3, 2015 at 3:42 AM, Hao Liu <Hao.Liu at arm.com> wrote:
> Hi aschwaighofer, hfinkel, delena, rengolin, t.p.northover,
>
> Hi,
>
> Two weeks ago, I posted a rough patch for RFC review titled "[RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about interleaved data accesses". I received many comments. Thanks a lot for all your help!
>
> Following the comments, I've refactored my patch, mainly in how several interleaved accesses are transformed into the vectorized version. The solution is to use two new intrinsics: index.load and index.store.
>
> The attached patch mainly achieves:
> (1) Teach LoopVectorizer to identify interleaved accesses in the Legality phase.
> (2) Teach LoopVectorizer to transform interleaved accesses to index.load/index.store with specific interleaved indices.
Very nice!
I'm curious, how does this relate (if at all) to loops like:
    for (size_t i = 0; i < n; i+=2L) {
      sum += (a[i  ] + b[i  ]);
      sum += (a[i+1] + b[i+1]);
    }
or even:
    for (size_t i = 0; i < n; i+=2L) {
      sum += (a[i  ] + b[i  ]) * (a[i+1] + b[i+1]);
    }
which we currently do a terrible job at (the loop reroller can't help
for the second one).
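
To make the question concrete, here is a rough sketch of the second loop viewed as two stride-2 accesses per array, written as plain C++ rather than IR. The helper load_strided and both function names are made up for illustration; they are not part of the patch, and this only models the access pattern (even and odd lanes gathered separately), not what the vectorizer would actually emit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: the second loop from above, consuming two elements per step.
long sum_products_scalar(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size() && a.size() % 2 == 0);
    long sum = 0;
    for (std::size_t i = 0; i < a.size(); i += 2) {
        sum += static_cast<long>(a[i] + b[i]) * (a[i + 1] + b[i + 1]);
    }
    return sum;
}

// Hypothetical helper: gathers src[offset], src[offset + stride], ...
// This stands in for one "lane" of an interleaved load.
std::vector<int> load_strided(const std::vector<int>& src,
                              std::size_t offset, std::size_t stride) {
    std::vector<int> out;
    for (std::size_t i = offset; i < src.size(); i += stride) {
        out.push_back(src[i]);
    }
    return out;
}

// Same computation after de-interleaving: the even and odd elements of each
// array become contiguous sequences, so the remaining loop is unit-stride.
long sum_products_deinterleaved(const std::vector<int>& a,
                                const std::vector<int>& b) {
    assert(a.size() == b.size() && a.size() % 2 == 0);
    const std::vector<int> a_even = load_strided(a, 0, 2);
    const std::vector<int> a_odd  = load_strided(a, 1, 2);
    const std::vector<int> b_even = load_strided(b, 0, 2);
    const std::vector<int> b_odd  = load_strided(b, 1, 2);

    long sum = 0;
    for (std::size_t k = 0; k < a_even.size(); ++k) {
        sum += static_cast<long>(a_even[k] + b_even[k]) * (a_odd[k] + b_odd[k]);
    }
    return sum;
}
```

The point is only that once the even and odd elements are separated, the loop body is a straightforward elementwise multiply-accumulate.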
> (3) Add a new simple pass in the AArch64 backend. The pass matches the specific index.load/index.store intrinsics to the ldN/stN intrinsics, so that the AArch64 backend can generate ldN/stN instructions.
Why not do this in the SelectionDAG?
> (4) Add two new intrinsics: index.load, index.store.
This is a pretty big deal, no? The title is unassuming, should this
go into a dedicated thread? I recall people wondering if there's a
way to combine these new load/store intrinsics, and I can't shake that
feeling. I don't really have a proposal though ;)
Also, once there's consensus (which there seems to be), I guess these
would need legalization?
-Ahmed
> (5) Teach the LoopAccessAnalysis to check the memory dependence between strided accesses.
>
> For correctness, I've tested the patch with LNT, SPEC2000, SPEC2006, EEMBC, and GEEKBench on the AArch64 target.
>
> For performance, some specific benchmarks like EEMBC.rgbcmy and EEMBC.rgbyiq are expected to have huge improvements (about 6x and 3x). However, two other issues currently block these loop vectorization opportunities:
>   - Too many runtime memory checks.
>   - A type promotion issue: i8 is promoted to i32, which introduces additional ZEXT and TRUNC instructions (i8 is illegal on AArch64, but <8 x i8> and <16 x i8> are legal).
> These issues should be solved in the future.
>
> Please review.
>
> Thanks,
> -Hao
>
> http://reviews.llvm.org/D8820
>
> Files:
> include/llvm/Analysis/TargetTransformInfo.h
> include/llvm/Analysis/TargetTransformInfoImpl.h
> include/llvm/IR/IRBuilder.h
> include/llvm/IR/Intrinsics.td
> lib/Analysis/LoopAccessAnalysis.cpp
> lib/Analysis/TargetTransformInfo.cpp
> lib/IR/IRBuilder.cpp
> lib/Target/AArch64/AArch64.h
> lib/Target/AArch64/AArch64InterleaveAccess.cpp
> lib/Target/AArch64/AArch64TargetMachine.cpp
> lib/Target/AArch64/AArch64TargetTransformInfo.cpp
> lib/Target/AArch64/AArch64TargetTransformInfo.h
> lib/Target/AArch64/CMakeLists.txt
> lib/Transforms/Vectorize/LoopVectorize.cpp
> test/CodeGen/AArch64/interleaved-access-to-ldN-stN.ll
> test/Transforms/LoopVectorize/AArch64/arbitrary-induction-step.ll
> test/Transforms/LoopVectorize/AArch64/interleaved-access.ll
>