[PATCH] [LoopVectorize] Teach Loop Vectorizer about interleaved data accesses

Fri Apr 3 11:03:23 PDT 2015

On Fri, Apr 3, 2015 at 3:42 AM, Hao Liu <Hao.Liu at arm.com> wrote:
> Hi aschwaighofer, hfinkel, delena, rengolin, t.p.northover,
>
> Hi,
>
> Two weeks ago, I posted a rough patch for RFC review titled "[RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about interleaved data accesses". I received many comments. Thanks a lot for all your help!
>
> Following those comments, I've refactored my patch, mainly in how several interleaved accesses are transformed into the vectorized version. The solution is to use two new intrinsics: index.load and index.store.
>
> The attached patch mainly achieves:
>     (1) Teach LoopVectorizer to identify interleaved accesses in the Legality phase.
>     (2) Teach LoopVectorizer to transform interleaved accesses to index.load/index.store with specific interleaved indices.

Very nice!

I'm curious, how does this relate (if at all) to loops like:

  for (size_t i = 0; i < n; i+=2L) {
     sum += (a[i  ] + b[i  ]);
     sum += (a[i+1] + b[i+1]);
  }

or even:

  for (size_t i = 0; i < n; i+=2L) {
     sum += (a[i  ] + b[i  ]) * (a[i+1] + b[i+1]);
  }

which we currently do a terrible job at (the loop reroller can't help
for the second one).
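
To make the question concrete, here is a rough sketch of how the first
loop could be viewed as a pair of stride-2 interleaved accesses. It is
not taken from the patch: VF, deinterleave2 and sum_interleaved are
hypothetical names, the element type (int) is assumed, and the loop
bound is tightened to i + 1 < n so an odd n does not read past the end.
Whether the patch actually covers this reduction case is exactly what
I'm asking.

```c
#include <stddef.h>

#define VF 4 /* hypothetical vectorization factor */

/* Scalar reference: the first loop above, assuming int elements. */
static long sum_scalar(const int *a, const int *b, size_t n) {
    long sum = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        sum += a[i] + b[i];
        sum += a[i + 1] + b[i + 1];
    }
    return sum;
}

/* Scalar model of a stride-2 de-interleaving load (roughly what an
   AArch64 ld2 provides): reads 2*VF consecutive elements and splits
   them into even-index and odd-index lanes. */
static void deinterleave2(const int *p, int even[VF], int odd[VF]) {
    for (size_t l = 0; l < VF; ++l) {
        even[l] = p[2 * l];
        odd[l] = p[2 * l + 1];
    }
}

/* The same reduction, written as if each array were loaded with one
   de-interleaving load per vector iteration. */
static long sum_interleaved(const int *a, const int *b, size_t n) {
    long acc[VF] = {0};
    size_t i = 0;

    /* "Vector" body: consumes 2*VF elements of each array per step. */
    for (; i + 2 * VF <= n; i += 2 * VF) {
        int ae[VF], ao[VF], be[VF], bo[VF];
        deinterleave2(a + i, ae, ao);
        deinterleave2(b + i, be, bo);
        for (size_t l = 0; l < VF; ++l)
            acc[l] += (long)(ae[l] + be[l]) + (long)(ao[l] + bo[l]);
    }

    /* Horizontal reduction of the per-lane partial sums. */
    long sum = 0;
    for (size_t l = 0; l < VF; ++l)
        sum += acc[l];

    /* Scalar remainder for the leftover element pairs. */
    for (; i + 1 < n; i += 2) {
        sum += a[i] + b[i];
        sum += a[i + 1] + b[i + 1];
    }
    return sum;
}
```

The second loop would use the same loads; only the lane-wise combine
changes, from an add to a multiply of the even and odd sums.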

>     (3) Add a new simple pass in the AArch64 backend. The pass can match the specific index.load/index.store intrinsics to the ldN/stN intrinsics, so that AArch64 backend can generate ldN/stN instructions.

Why not do this in the SelectionDAG?

>     (4) Add two new intrinsics: index.load, index.store.

This is a pretty big deal, no?  The title is unassuming; should this
go into a dedicated thread?  I recall people wondering if there's a
way to combine these new load/store intrinsics, and I can't shake that
feeling.  I don't really have a proposal though ;)

Also, once there's consensus (which there seems to be), I guess these
would need legalization?

-Ahmed

>     (5) Teach the LoopAccessAnalysis to check the memory dependence between strided accesses.
>
> For correctness, I've tested the patch with LNT, SPEC2000, SPEC2006, EEMBC, and GEEKBench on the AArch64 target.
>
> For performance, some specific benchmarks like EEMBC.rgbcmy and EEMBC.rgbyiq are expected to have huge improvements (about 6x and 3x). But two other issues currently block these loop vectorization opportunities:
>     Too many runtime memory checks.
>     Type promotion. i8 is promoted to i32, which introduces additional ZEXT and TRUNC (i8 is illegal in AArch64 but <8xi8> and <16xi8> are legal).
> Anyway, these issues should be solved in the future.
>
> Please review.
>
> Thanks,
> -Hao
>
> http://reviews.llvm.org/D8820
>
> Files:
>   include/llvm/Analysis/TargetTransformInfo.h
>   include/llvm/Analysis/TargetTransformInfoImpl.h
>   include/llvm/IR/IRBuilder.h
>   include/llvm/IR/Intrinsics.td
>   lib/Analysis/LoopAccessAnalysis.cpp
>   lib/Analysis/TargetTransformInfo.cpp
>   lib/IR/IRBuilder.cpp
>   lib/Target/AArch64/AArch64.h
>   lib/Target/AArch64/AArch64InterleaveAccess.cpp
>   lib/Target/AArch64/AArch64TargetMachine.cpp
>   lib/Target/AArch64/AArch64TargetTransformInfo.cpp
>   lib/Target/AArch64/AArch64TargetTransformInfo.h
>   lib/Target/AArch64/CMakeLists.txt
>   lib/Transforms/Vectorize/LoopVectorize.cpp
>   test/CodeGen/AArch64/interleaved-access-to-ldN-stN.ll
>   test/Transforms/LoopVectorize/AArch64/arbitrary-induction-step.ll
>   test/Transforms/LoopVectorize/AArch64/interleaved-access.ll