[LoopVectorize]Teach Loop Vectorizer about interleaved data accesses

Fri Apr 3 14:41:51 PDT 2015

    (1) Teach LoopVectorizer to identify interleaved accesses in the Legality phase.
The LoopVectorizer should ask the target (Legality -> target) whether the specified indices are profitable, passing the data type and the list of constant indices.
Different index sequences may be profitable on different targets.

    (2) Teach LoopVectorizer to transform interleaved accesses to index.load/index.store with specific interleaved indices.
index.load/index.store should simply receive the list of indices; they should not be restricted to interleaved patterns.
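A minimal sketch of the kind of query suggested in these two comments: the vectorizer hands the target an element width and a list of constant indices, and the target answers whether that sequence is worth generating. The function name, its signature, and the acceptance rule are assumptions for illustration only; this is not an existing TargetTransformInfo hook. The toy target below accepts only uniformly strided sequences with stride 2, 3, or 4, mirroring the ldN/stN factors on AArch64; another target could accept a different set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical target query (NOT an existing LLVM API): is an indexed access
// with this element width and these constant indices profitable?
// Assumed rule for this toy target: 8/16/32/64-bit elements and a strictly
// increasing, uniformly strided index list with stride 2, 3 or 4.
bool isIndexedAccessProfitable(unsigned elementBits,
                               const std::vector<unsigned> &indices) {
  assert(elementBits > 0 && "element width must be nonzero");
  if (elementBits != 8 && elementBits != 16 && elementBits != 32 &&
      elementBits != 64)
    return false;
  if (indices.size() < 2 || indices[1] <= indices[0])
    return false;

  // The stride is fixed by the first two indices; every later pair must match.
  const unsigned stride = indices[1] - indices[0];
  for (std::size_t i = 2; i < indices.size(); ++i)
    if (indices[i] <= indices[i - 1] || indices[i] - indices[i - 1] != stride)
      return false;

  return stride >= 2 && stride <= 4;
}
```

Because the query receives the raw index list rather than an "interleave factor", the same interface could describe reversed or otherwise irregular sequences for targets that support them.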

-  Elena

-----Original Message-----
From: Hao Liu [mailto:Hao.Liu at arm.com] 
Sent: Friday, April 03, 2015 13:43
To: Hao.Liu at arm.com; aschwaighofer at apple.com; hfinkel at anl.gov; Demikhovsky, Elena; renato.golin at linaro.org; t.p.northover at gmail.com
Cc: amara.emerson at arm.com; llvm-commits at cs.uiuc.edu; mcrosier at codeaurora.org
Subject: [PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved data accesses

Hi aschwaighofer, hfinkel, delena, rengolin, t.p.northover,

Two weeks ago, I posted a rough patch for RFC review titled "[RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about interleaved data accesses". I received many comments. Thanks a lot for all your help!

Based on the comments, I've refactored my patch, mainly in how several interleaved accesses are transformed into the vectorized version. The solution is to use two new intrinsics: index.load and index.store.

The attached patch mainly does the following:
    (1) Teach LoopVectorizer to identify interleaved accesses in the Legality phase.
    (2) Teach LoopVectorizer to transform interleaved accesses to index.load/index.store with specific interleaved indices.
    (3) Add a new simple pass in the AArch64 backend. The pass matches the specific index.load/index.store intrinsics to the ldN/stN intrinsics, so that the AArch64 backend can generate ldN/stN instructions.
    (4) Add two new intrinsics: index.load, index.store.
    (5) Teach the LoopAccessAnalysis to check the memory dependence between strided accesses.
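To make items (1) and (2) concrete, here is a scalar sketch of what "specific interleaved indices" could look like. It assumes an interleave group with `factor` members vectorized at width `vf`, and models an indexed load as a plain gather over constant indices. The helper names and the gather semantics are assumptions for illustration; the patch's actual index.load/index.store definitions are in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Constant indices that select one member of an interleave group:
// member, member + factor, member + 2 * factor, ...
std::vector<std::size_t> interleavedIndices(std::size_t factor,
                                            std::size_t member,
                                            std::size_t vf) {
  assert(factor > 0 && member < factor);
  std::vector<std::size_t> indices;
  indices.reserve(vf);
  for (std::size_t lane = 0; lane < vf; ++lane)
    indices.push_back(lane * factor + member);
  return indices;
}

// Scalar model of an indexed load (assumed semantics):
// result[i] = base[indices[i]].
std::vector<int> indexedLoad(const std::vector<int> &base,
                             const std::vector<std::size_t> &indices) {
  std::vector<int> result;
  result.reserve(indices.size());
  for (std::size_t index : indices)
    result.push_back(base.at(index)); // at() throws if an index is out of range
  return result;
}
```

For a stride-2 loop that reads A[2*i] and A[2*i+1] with vf = 4, the two members use indices {0, 2, 4, 6} and {1, 3, 5, 7}. With factor 3 the same helpers split packed R,G,B data into three separate vectors, which is the shape an ld3 instruction produces on AArch64.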

For correctness, I've tested the patch with LNT, SPEC2000, SPEC2006, EEMBC, and GEEKBench on the AArch64 target.

For performance, specific benchmarks such as EEMBC.rgbcmy and EEMBC.rgbyiq are expected to improve substantially (about 6x and 3x, respectively). However, two other issues currently prevent these loops from being vectorized:
    Too many runtime memory checks.
    A type promotion issue: i8 is promoted to i32, which introduces additional ZEXT and TRUNC instructions (i8 is illegal on AArch64, but <8 x i8> and <16 x i8> are legal).
These issues should be solved in the future.
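One place such a widen-then-narrow pattern can be seen is at the source level. The sketch below assumes a byte-oriented kernel of the kind found in pixel-conversion loops; the function is hypothetical and is not taken from the benchmarks. The message does not say where the promotion is introduced, so treat this only as an illustration of the i8 -> i32 -> i8 shape: C++ integer promotion performs arithmetic on 8-bit operands in int, and the result is then narrowed back to 8 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Arithmetic on two 8-bit operands is carried out in int (integer promotion).
static_assert(
    std::is_same<decltype(std::uint8_t{1} + std::uint8_t{1}), int>::value,
    "uint8_t operands are promoted to int before addition");

// Hypothetical per-channel kernel: average two 8-bit values.
std::uint8_t averageChannels(std::uint8_t a, std::uint8_t b) {
  const int sum = a + b;                      // widened: cannot wrap at 255
  assert(sum >= 0 && sum <= 510);
  return static_cast<std::uint8_t>(sum / 2);  // narrowed back to 8 bits
}
```

If the widened form is what gets vectorized, each 8-bit lane occupies 32 bits, so a 128-bit vector holds 4 lanes instead of 16, plus the extension and truncation work on either side of the arithmetic.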

Please review.

Thanks,
-Hao

http://reviews.llvm.org/D8820

Files:
  include/llvm/Analysis/TargetTransformInfo.h
  include/llvm/Analysis/TargetTransformInfoImpl.h
  include/llvm/IR/IRBuilder.h
  include/llvm/IR/Intrinsics.td
  lib/Analysis/LoopAccessAnalysis.cpp
  lib/Analysis/TargetTransformInfo.cpp
  lib/IR/IRBuilder.cpp
  lib/Target/AArch64/AArch64.h
  lib/Target/AArch64/AArch64InterleaveAccess.cpp
  lib/Target/AArch64/AArch64TargetMachine.cpp
  lib/Target/AArch64/AArch64TargetTransformInfo.cpp
  lib/Target/AArch64/AArch64TargetTransformInfo.h
  lib/Target/AArch64/CMakeLists.txt
  lib/Transforms/Vectorize/LoopVectorize.cpp
  test/CodeGen/AArch64/interleaved-access-to-ldN-stN.ll
  test/Transforms/LoopVectorize/AArch64/arbitrary-induction-step.ll
  test/Transforms/LoopVectorize/AArch64/interleaved-access.ll
