[llvm] [AArch64][LoopVectorize] Use either fixed-width or scalable VF when tail-folding (PR #67543)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 28 08:50:18 PDT 2023


================
@@ -0,0 +1,31 @@
+; RUN: opt -S < %s -passes=loop-vectorize -mtriple aarch64-linux-gnu -mattr=+sve 2>&1 | FileCheck %s
+
+define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val){
+; CHECK-LABEL: define void @clamped_tc_8
+; CHECK: call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> %19, ptr %20, i32 1, <vscale x 8 x i1> %active.lane.mask)
+entry:
+  %rem = and i32 %n, 63
----------------
david-arm wrote:

I think you can delete the checks and branch here for testing purposes. You can then combine this with for.body.preheader to create a single block that jumps to for.body, since you've now hard-coded the trip count to be 8.
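
As a rough sketch of the shape I have in mind, assuming the loop simply stores one byte of %val per iteration (the loop body and the value names below are illustrative guesses, not taken from the patch):

```llvm
; Hypothetical simplified test body: the early-exit checks are gone and the
; entry block branches straight to the loop, whose trip count is fixed at 8.
; %n is left in the signature but is unused in this sketch.
define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) {
entry:
  br label %for.body

for.body:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
  ; Assumed body: extract byte number %iv of %val and store it to %dst[%iv].
  %shift = shl nuw nsw i64 %iv, 3
  %shr = lshr i64 %val, %shift
  %byte = trunc i64 %shr to i8
  %addr = getelementptr inbounds i8, ptr %dst, i64 %iv
  store i8 %byte, ptr %addr, align 1
  %iv.next = add nuw nsw i64 %iv, 1
  ; Constant trip count of 8, so no runtime clamp on %n is needed.
  %done = icmp eq i64 %iv.next, 8
  br i1 %done, label %for.cond.cleanup, label %for.body

for.cond.cleanup:
  ret void
}
```

With a constant exit condition like this, the entry block needs neither the %rem computation nor the guarding branch, and the separate preheader block goes away.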

https://github.com/llvm/llvm-project/pull/67543

