[PATCH] D124869: [WIP][RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops

Tue May 3 12:32:40 PDT 2022

reames created this revision.
reames added reviewers: craig.topper, khchen, Chenbing.Zheng, jacquesguan.
Herald added subscribers: sunshaoce, VincentWu, luke957, StephenFan, vkmr, frasercrmck, evandro, luismarques, apazos, sameer.abuasal, s.egerton, Jim, benna, psnobl, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, kito-cheng, niosHD, sabuasal, bollu, simoncook, johnrusso, rbar, asb, hiraditya, arichardson, mcrosier.
Herald added a project: All.
reames requested review of this revision.
Herald added subscribers: pcwang-thead, eopXD, MaskRay.
Herald added a project: LLVM.

This is WIP with a bunch of cleanup/splitting and tests needed; posting early as I'm very new to the RISC-V target and want to make sure this is a reasonable idea.

This patch teaches the VSETVLI insertion pass to perform a very limited form of partial redundancy elimination.  The motivating example comes from the fixed length vectorization of a simple loop such as:

  for (unsigned i = 0; i < a_len; i++)
      a[i] += b;

Without this change, the core vector loop and its preheader are as follows:

  .LBB0_3:                                # %vector.ph
  	andi	a1, a6, -8
  	addi	a4, a0, 16
  	mv	a5, a1
  .LBB0_4:                                # %vector.body
                                          # =>This Inner Loop Header: Depth=1
  	addi	a3, a4, -16
  	vsetivli	zero, 4, e32, m1, ta, mu
  	vle32.v	v8, (a3)
  	vle32.v	v9, (a4)
  	vadd.vx	v8, v8, a2
  	vadd.vx	v9, v9, a2
  	vse32.v	v8, (a3)
  	vse32.v	v9, (a4)
  	addi	a5, a5, -8
  	addi	a4, a4, 32
  	bnez	a5, .LBB0_4

The key thing to note here is that, I believe, the vsetivli only needs to execute once.  Since there's no tail folding happening here, the values of the vector configuration registers (vl and vtype) are invariant throughout the loop.
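
To make the invariance argument concrete, here is a rough scalar model of what the listing above computes.  It is only a sketch: the function name, the constants, the 32-bit unsigned element type, and the scalar epilogue are assumptions for illustration (the epilogue is not part of the listing).  The point is that every trip through the vector body handles exactly two groups of four lanes, so nothing in the body would ever ask for a different configuration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the vectorized loop; all names are illustrative.
constexpr std::size_t kVL = 4;      // AVL in "vsetivli zero, 4, e32, m1, ta, mu"
constexpr std::size_t kUnroll = 2;  // two vector groups per iteration (v8 and v9)

void addScalarModel(std::uint32_t *a, std::size_t a_len, std::uint32_t b) {
  // vector.ph: round the trip count down to a multiple of 8 (andi a1, a6, -8).
  const std::size_t vecLen =
      a_len & ~static_cast<std::size_t>(kVL * kUnroll - 1);

  // vector.body: each iteration processes exactly kVL * kUnroll elements.
  // There is no tail folding, so the lane count never varies per iteration.
  for (std::size_t i = 0; i < vecLen; i += kVL * kUnroll)
    for (std::size_t lane = 0; lane < kVL * kUnroll; ++lane)
      a[i + lane] += b;

  // Assumed scalar epilogue for the leftover elements (not shown in the listing).
  for (std::size_t i = vecLen; i < a_len; ++i)
    a[i] += b;
}
```

With tail folding, the last iteration would instead request a shorter vector length, and the configuration would no longer be loop invariant.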

After this patch, we hoist the configuration into the preheader and perform it once:

  .LBB0_3:                                # %vector.ph
  	andi	a1, a6, -8
  	vsetivli	zero, 4, e32, m1, ta, mu
  	addi	a4, a0, 16
  	mv	a5, a1
  .LBB0_4:                                # %vector.body
                                          # =>This Inner Loop Header: Depth=1
  	addi	a3, a4, -16
  	vle32.v	v8, (a3)
  	vle32.v	v9, (a4)
  	vadd.vx	v8, v8, a2
  	vadd.vx	v9, v9, a2
  	vse32.v	v8, (a3)
  	vse32.v	v9, (a4)
  	addi	a5, a5, -8
  	addi	a4, a4, 32
  	bnez	a5, .LBB0_4
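
One way to picture the limited partial redundancy elimination is the check sketched below.  This is a simplified model and not the code in the patch: VecConfig, LoopBlock, and canHoistToPreheader are hypothetical names, and the real pass tracks considerably more state.  The idea is that if the state leaving the loop body along the backedge already matches what the top of the body requires, then the configuration is only missing on the preheader edge, and a single vsetvli placed there covers every iteration.

```cpp
// Hypothetical model of a vector configuration; not LLVM's own data structure.
struct VecConfig {
  unsigned avl;       // requested vector length (immediate form only)
  unsigned sew;       // element width in bits
  unsigned lmul;      // register group multiplier (integer LMUL only)
  bool tailAgnostic;  // "ta" when true
  bool maskAgnostic;  // "ma" when true, "mu" when false

  bool operator==(const VecConfig &o) const {
    return avl == o.avl && sew == o.sew && lmul == o.lmul &&
           tailAgnostic == o.tailAgnostic && maskAgnostic == o.maskAgnostic;
  }
};

// Hypothetical summary of a single-block loop with one preheader.
struct LoopBlock {
  VecConfig required;   // what the first vector instruction in the body needs
  bool exitKnown;       // false if something in the body clobbers the state
  VecConfig exitState;  // state at the end of the body, if exitKnown
};

// The vsetvli at the top of the body is redundant along the backedge when the
// body exits in the same state it requires on entry.  In that case only the
// preheader edge lacks the configuration, so it can be established there once.
bool canHoistToPreheader(const LoopBlock &loop) {
  return loop.exitKnown && loop.exitState == loop.required;
}
```

For the loop above, the body both requires and exits with AVL 4, e32, m1, ta, mu, so this check would succeed; a body that switched to a different element width partway through would fail it.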

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124869

Files:
  llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
  llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store.ll
  llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll
  llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
  llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
