[PATCH] D143723: [RISCV] Increase default vectorizer LMUL to 2

Fri Feb 10 03:32:36 PST 2023

luke created this revision.
luke added reviewers: reames, craig.topper, asb.
Herald added subscribers: pmatos, VincentWu, vkmr, frasercrmck, evandro, luismarques, apazos, sameer.abuasal, s.egerton, Jim, benna, psnobl, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, simoncook, johnrusso, rbar, hiraditya, arichardson.
Herald added a project: All.
luke requested review of this revision.
Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.
Herald added a project: LLVM.

After some discussion and experimentation, we have found that setting
the default vector register bit width used by the vectorizer to LMUL=2
strikes a sweet spot.
Whilst we could be clever here and make the vectorizer dynamically
select an LMUL that
a) doesn't affect register pressure
b) is suitable for the microarchitecture
we would need to teach its heuristics about RISC-V register grouping
specifics.
Instead, this just does the easy, pragmatic thing by changing the default
to a safe value that doesn't affect register pressure significantly[1],
but should increase throughput and unlock more interleaving.
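
As a rough illustration of how the LMUL default shows up in the updated
lmul.ll checks, here is a minimal sketch. It assumes an LMUL=1 scalable
register is modelled as vscale x 64 bits. The names kBitsPerBlock,
minElementsPerVector and stepMultiplier are hypothetical and not LLVM
APIs, and the interleave counts are read off the test output rather than
derived from the cost model.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one RVV block is 64 bits, so LMUL=1 corresponds to a
// scalable register of vscale x 64 bits.
constexpr uint64_t kBitsPerBlock = 64;

// Known-minimum element count of the scalable vector type, i.e. the N in
// <vscale x N x iM>, for a given LMUL and element width in bits.
constexpr uint64_t minElementsPerVector(uint64_t lmul, uint64_t elementBits) {
  return lmul * kBitsPerBlock / elementBits;
}

// Factor that vscale is multiplied by to get the per-iteration index
// step: vectorization factor times interleave count.
constexpr uint64_t stepMultiplier(uint64_t lmul, uint64_t elementBits,
                                  uint64_t interleave) {
  return minElementsPerVector(lmul, elementBits) * interleave;
}

// LMUL=1 with i64 elements gives <vscale x 1 x i64>.
static_assert(minElementsPerVector(1, 64) == 1, "LMUL=1, i64");
// LMUL=2 with i64 elements gives <vscale x 2 x i64>.
static_assert(minElementsPerVector(2, 64) == 2, "LMUL=2, i64");
```

With i64 elements, LMUL=2 and an interleave count of 2, stepMultiplier
returns 4, which matches the `mul i64 ..., 4` feeding INDEX_NEXT in the
new DEFAULT checks.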

[1] Register spilling when compiling sqlite at various levels of
-riscv-v-register-bit-width-lmul:

LMUL=1    2573 spills
LMUL=2    2583 spills
LMUL=4    2819 spills
LMUL=8    3256 spills
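
To put these counts in relative terms, the sketch below computes the
increase over the LMUL=1 baseline. The helper names extraSpills and
percentIncrease are hypothetical; the only inputs are the four spill
counts listed above.

```cpp
#include <cassert>
#include <cstdint>

// Additional spills relative to the baseline count.
// Assumes value >= baseline, which holds for the figures above.
constexpr uint64_t extraSpills(uint64_t baseline, uint64_t value) {
  return value - baseline;
}

// Relative increase over the baseline, in percent.
constexpr double percentIncrease(uint64_t baseline, uint64_t value) {
  return 100.0 * static_cast<double>(extraSpills(baseline, value)) /
         static_cast<double>(baseline);
}

// Spill counts from the sqlite measurement, indexed as LMUL=1, 2, 4, 8.
constexpr uint64_t kSqliteSpills[] = {2573, 2583, 2819, 3256};
```

Against the LMUL=1 baseline of 2573 spills, LMUL=2 adds 10 spills
(roughly 0.4%), LMUL=4 adds 246 (roughly 9.6%), and LMUL=8 adds 683
(roughly 26.5%).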


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D143723

Files:
  llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
  llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll


Index: llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll
===================================================================

--- llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll
+++ llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll
@@ -9,25 +9,43 @@
 ; DEFAULT-LABEL: @load_store(
 ; DEFAULT-NEXT:  entry:
 ; DEFAULT-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP0]]
+; DEFAULT-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 4
+; DEFAULT-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
 ; DEFAULT-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; DEFAULT:       vector.ph:
-; DEFAULT-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:    [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
+; DEFAULT-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; DEFAULT-NEXT:    [[TMP3:%.*]] = mul i64 [[TMP2]], 4
+; DEFAULT-NEXT:    [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
 ; DEFAULT-NEXT:    [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
 ; DEFAULT-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; DEFAULT:       vector.body:
 ; DEFAULT-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; DEFAULT-NEXT:    [[TMP2:%.*]] = add i64 [[INDEX]], 0
-; DEFAULT-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[TMP2]]
-; DEFAULT-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
-; DEFAULT-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 1 x i64>, ptr [[TMP4]], align 4
-; DEFAULT-NEXT:    [[TMP5:%.*]] = add <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1, i64 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
-; DEFAULT-NEXT:    store <vscale x 1 x i64> [[TMP5]], ptr [[TMP4]], align 4
-; DEFAULT-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP6]]
-; DEFAULT-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; DEFAULT-NEXT:    br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; DEFAULT-NEXT:    [[TMP4:%.*]] = add i64 [[INDEX]], 0
+; DEFAULT-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
+; DEFAULT-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 2
+; DEFAULT-NEXT:    [[TMP7:%.*]] = add i64 [[TMP6]], 0
+; DEFAULT-NEXT:    [[TMP8:%.*]] = mul i64 [[TMP7]], 1
+; DEFAULT-NEXT:    [[TMP9:%.*]] = add i64 [[INDEX]], [[TMP8]]
+; DEFAULT-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[TMP4]]
+; DEFAULT-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP9]]
+; DEFAULT-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[TMP10]], i32 0
+; DEFAULT-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 2 x i64>, ptr [[TMP12]], align 4
+; DEFAULT-NEXT:    [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
+; DEFAULT-NEXT:    [[TMP14:%.*]] = mul i64 [[TMP13]], 2
+; DEFAULT-NEXT:    [[TMP15:%.*]] = getelementptr inbounds i64, ptr [[TMP10]], i64 [[TMP14]]
+; DEFAULT-NEXT:    [[WIDE_LOAD1:%.*]] = load <vscale x 2 x i64>, ptr [[TMP15]], align 4
+; DEFAULT-NEXT:    [[TMP16:%.*]] = add <vscale x 2 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+; DEFAULT-NEXT:    [[TMP17:%.*]] = add <vscale x 2 x i64> [[WIDE_LOAD1]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+; DEFAULT-NEXT:    store <vscale x 2 x i64> [[TMP16]], ptr [[TMP12]], align 4
+; DEFAULT-NEXT:    [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
+; DEFAULT-NEXT:    [[TMP19:%.*]] = mul i64 [[TMP18]], 2
+; DEFAULT-NEXT:    [[TMP20:%.*]] = getelementptr inbounds i64, ptr [[TMP10]], i64 [[TMP19]]
+; DEFAULT-NEXT:    store <vscale x 2 x i64> [[TMP17]], ptr [[TMP20]], align 4
+; DEFAULT-NEXT:    [[TMP21:%.*]] = call i64 @llvm.vscale.i64()
+; DEFAULT-NEXT:    [[TMP22:%.*]] = mul i64 [[TMP21]], 4
+; DEFAULT-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP22]]
+; DEFAULT-NEXT:    [[TMP23:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; DEFAULT-NEXT:    br i1 [[TMP23]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; DEFAULT:       middle.block:
 ; DEFAULT-NEXT:    [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
 ; DEFAULT-NEXT:    br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
Index: llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -23,7 +23,7 @@
     cl::desc(
         "The LMUL to use for getRegisterBitWidth queries. Affects LMUL used "
         "by autovectorized code. Fractional LMULs are not supported."),
-    cl::init(1), cl::Hidden);
+    cl::init(2), cl::Hidden);
 
 static cl::opt<unsigned> SLPMaxVF(
     "riscv-v-slp-max-vf",

