<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/82781>82781</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Add runtime check when tail folding with a fixed-width VF
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
SamTebbs33
</td>
</tr>
</table>
<pre>
At the moment, the LoopVectorizer seems to emit broken code when using a fixed-width VF with tail folding enabled.
Reproducer:
```
void foo(char * __restrict__ dst, char * src, unsigned long N) {
#pragma clang loop vectorize_width(16)
  for (unsigned long i = 0; i < N; ++i)
    dst[i] = src[i] + 42;
}
```
Compile with `clang -O3 -march=armv9-a -mllvm -sve-tail-folding=all`
This results in
```
define dso_local void @_Z3fooPcS_m(ptr noalias nocapture noundef writeonly %0, ptr nocapture noundef readonly %1, i64 noundef %2) local_unnamed_addr #0 {
%4 = icmp eq i64 %2, 0
br i1 %4, label %17, label %5
5: ; preds = %3
%6 = tail call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 0, i64 %2)
br label %7
7: ; preds = %7, %5
%8 = phi i64 [ 0, %5 ], [ %14, %7 ]
%9 = phi <16 x i1> [ %6, %5 ], [ %15, %7 ]
%10 = getelementptr inbounds i8, ptr %1, i64 %8
%11 = tail call <16 x i8> @llvm.masked.load.v16i8.p0(ptr %10, i32 1, <16 x i1> %9, <16 x i8> poison), !tbaa !7
%12 = add <16 x i8> %11, <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>
%13 = getelementptr inbounds i8, ptr %0, i64 %8
tail call void @llvm.masked.store.v16i8.p0(<16 x i8> %12, ptr %13, i32 1, <16 x i1> %9), !tbaa !7
%14 = add i64 %8, 16
%15 = tail call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %14, i64 %2)
%16 = extractelement <16 x i1> %15, i64 0
br i1 %16, label %7, label %17, !llvm.loop !10
17: ; preds = %7, %3
ret void
}
declare <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64, i64) #1
declare <16 x i8> @llvm.masked.load.v16i8.p0(ptr nocapture, i32 immarg, <16 x i1>, <16 x i8>) #2
declare void @llvm.masked.store.v16i8.p0(<16 x i8>, ptr nocapture, i32 immarg, <16 x i1>) #3
attributes #0 = { mustprogress nofree norecurse nosync nounwind memory(argmem: readwrite) uwtable vscale_range(1,16) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+crc,+dotprod,+fp-armv8,+fullfp16,+lse,+neon,+outline-atomics,+ras,+rcpc,+rdm,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+v9a,-fmv" }
attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(argmem: read) }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(argmem: write) }
```
Here, `%14 = add i64 %8, 16` may overflow, which leads to the wrong result for the call to `@llvm.get.active.lane.mask` that follows.
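To make the failure mode concrete, here is a small C model of the loop's exit test. It is only a sketch: the function names are made up for illustration, and it models just lane 0 of `@llvm.get.active.lane.mask` (active when `base < n`, unsigned), which is the lane the loop branches on.
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of lane 0 of llvm.get.active.lane.mask(base, n):
   the lane is active iff base < n (unsigned compare). */
bool lane0_active(uint64_t base, uint64_t n) {
    return base < n;
}

/* Hypothetical model of the back-edge test for a fixed VF of 16.
   `iv` plays the role of %8 and `n` the role of %2 (the trip count).
   Returns true if the vector loop would branch back to itself. */
bool continues_after_step(uint64_t iv, uint64_t n) {
    uint64_t next = iv + 16;      /* %14 = add i64 %8, 16; wraps modulo 2^64 */
    return lane0_active(next, n); /* %16 = extractelement %15, 0 */
}
```
With `n = UINT64_MAX - 7` and `iv = UINT64_MAX - 15`, the step wraps to 0, lane 0 is reported active, and the loop would run again even though every element has already been processed. For small trip counts such as `n = 16` the loop exits as expected.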
The equivalent overflow checks for scalable vectors are emitted in `InnerLoopVectorizer::emitIterationCountCheck`. In this example, we'll need to check whether `%14` is greater than or equal to the number of iterations and, if so, exit the loop early, before the `@llvm.get.active.lane.mask` call.
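One possible shape for such a guard, written as C rather than IR, is sketched below. This is an assumption about what a fix could look like, not what the vectorizer emits today: the helper name is hypothetical, and it is a check done once before entering the vector loop (in the style of a trip-count check) rather than the in-loop early exit described above.
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard: returns true if an induction variable that starts at 0
   and advances by `step` (VF * UF) could wrap past 2^64 before the lane mask
   for trip count `n` becomes all-false. When it returns true, the vector loop
   should be skipped in favour of the scalar loop. */
bool tail_folded_iv_may_overflow(uint64_t n, uint64_t step) {
    /* Equivalent to n + step > UINT64_MAX, computed without overflowing. */
    return UINT64_MAX - n < step;
}
```
For the reproducer, `step` would be 16. The check is slightly conservative: it also rejects `n = UINT64_MAX - 15`, where the final increment lands exactly on `n` without wrapping.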
</pre>