[PATCH] D121788: [AArch64] Increase MaxInterleaveFactor to 4

Wed Mar 16 03:58:29 PDT 2022

jaykang10 created this revision.
jaykang10 added reviewers: dmgreen, sdesmalen, paulwalker-arm, fhahn, efriedma.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: All.
jaykang10 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

I have seen cases in which `MaxInterleaveFactor 4` gives better performance than `MaxInterleaveFactor 2`.
Let's look at a simple example.

  void test(char *dstPtr, const char *srcPtr, char *dstEnd) {
    do { 
      memcpy(dstPtr, srcPtr, 8);  
      dstPtr += 8;
      srcPtr += 8;
    } while (dstPtr < dstEnd);
  }

The InstCombine pass converts the memcpy into a load and a store because the length is 8.
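
As a rough source-level picture of that conversion, the loop behaves as if each 8-byte memcpy were one 64-bit load followed by one 64-bit store. The sketch below is only an illustration: `test_expanded` is a hypothetical name that does not appear in the patch, and the real transformation happens on LLVM IR, not on C source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name; same do-while shape as test() above.
   Each fixed-size 8-byte memcpy stands for a single 64-bit access. */
void test_expanded(char *dstPtr, const char *srcPtr, char *dstEnd) {
  do {
    uint64_t tmp;
    memcpy(&tmp, srcPtr, sizeof tmp);  /* acts as one 64-bit load */
    memcpy(dstPtr, &tmp, sizeof tmp);  /* acts as one 64-bit store */
    dstPtr += 8;
    srcPtr += 8;
  } while (dstPtr < dstEnd);
}
```

Written this way, the loop body is a plain i64 load/store pair per iteration, which is the shape the loop vectorizer can widen and interleave.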
The vectorized assembly output with MaxInterleaveFactor 2 and 4 is shown below.

  MaxInterleaveFactor 2
  .LBB0_7:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
          ldp     q0, q1, [x13, #-16]
          add     x13, x13, #32  
          subs    x14, x14, #4
          stp     q0, q1, [x12, #-16]
          add     x12, x12, #32  
          b.ne    .LBB0_7

  MaxInterleaveFactor 4
  .LBB0_7:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
          ldp     q0, q1, [x12, #-32]
          subs    x14, x14, #8
          ldp     q2, q3, [x12], #64  
          stp     q0, q1, [x13, #-32]
          stp     q2, q3, [x13], #64  
          b.ne    .LBB0_7

Both loops execute the same number of instructions per iteration, so the `MaxInterleaveFactor 4` output can ideally process twice as much data per iteration as the `MaxInterleaveFactor 2` output.
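
The arithmetic behind that estimate can be sketched as follows. The helper name `bytes_per_iteration` is hypothetical; the only facts it relies on are that a q register is 16 bytes wide and that each `ldp`/`stp` pair in the listings above moves two q registers.

```c
/* Width of one AArch64 q register in bytes. */
enum { Q_REG_BYTES = 16 };

/* Hypothetical helper: bytes copied by one loop iteration, given how many
   ldp/stp pairs the loop body contains. Each pair moves two q registers. */
unsigned bytes_per_iteration(unsigned ldp_stp_pairs) {
  return ldp_stp_pairs * 2u * Q_REG_BYTES;
}
```

With one `ldp`/`stp` pair (MaxInterleaveFactor 2) this gives 32 bytes per iteration, matching `subs x14, x14, #4` (four 8-byte elements). With two pairs (MaxInterleaveFactor 4) it gives 64 bytes, matching `subs x14, x14, #8`. Both loop bodies contain six instructions.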

https://reviews.llvm.org/D121788

Files:
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/Transforms/LoopVectorize/AArch64/intrinsiccost.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-illegal-type.ll
  llvm/test/Transforms/PhaseOrdering/AArch64/hoisting-sinking-required-for-vectorization.ll
  llvm/test/Transforms/PhaseOrdering/AArch64/peel-multiple-unreachable-exits-for-vectorization.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D121788.415772.patch
Type: text/x-patch
Size: 76267 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220316/5368ee59/attachment-0001.bin>