[clang] [clang] add array out-of-bounds access constraints using llvm.assume (PR #159046)
Sjoerd Meijer via cfe-commits
cfe-commits at lists.llvm.org
Wed Oct 1 07:53:15 PDT 2025
sjoerdmeijer wrote:
I had another play with this patch, the updated one. The short summary is:
- I am a little concerned about how intrusive this is, i.e. about the impact on compile time and performance. For my little example, the number of IR instructions in the vector body roughly doubles, but the final codegen for the vector body is the same, which is a good thing and an improvement. There are some codegen changes in the scalar loop, though. So my prediction is, first, that this is not going to be compile-time friendly, and second, that we might see all sorts of performance corner cases, but only numbers will tell, I guess...
- Maybe this is getting ahead of things (i.e. the numbers), but perhaps we can have a little think about whether we can be more selective in emitting these intrinsics; see the sketch right after this list.
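To make that concrete, here is a hypothetical illustration of what "more selective" could mean (not something the patch does today): skip the assume when the bound is statically known to hold.
```
int arr[10];

/* Constant trip count <= 10: every index is provably in bounds,
   so an assume(i < 10) adds nothing and emission could be skipped. */
void trip_count_known(int * __restrict A) {
  for (int i = 0; i < 10; ++i)
    arr[i] += A[i];
}

/* Unknown trip count: this is the interesting case, where
   assume(i < 10) tells the optimizer the loop cannot run past
   the end of arr. */
void trip_count_unknown(int n, int * __restrict A) {
  for (int i = 0; i < n; ++i)
    arr[i] += A[i];
}
```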
Here's the longer story with the code examples I played with.
A small extension of the example in the description:
```
int arr[10];
int test_simple_array(int i, int n, int * __restrict A, int * __restrict B) {
  for (int i = 0; i < n; ++i)
    arr[i] += A[i] * B[i];
  return arr[i];
}
```
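For reference, my understanding is that the assumption the patch emits amounts, at the source level, to roughly the following hand-written `__builtin_assume` (an illustrative sketch only; the patch itself emits the llvm.assume during codegen):
```
int arr[10];

int test_simple_array_manual(int i, int n, int * __restrict A, int * __restrict B) {
  for (int i = 0; i < n; ++i) {
    /* Roughly the constraint the patch encodes with llvm.assume:
       the index of the constant-size array access is in bounds. */
    __builtin_assume((unsigned)i < 10u);
    arr[i] += A[i] * B[i];
  }
  return arr[i];
}
```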
The vector body before this patch is:
```
11: ; preds = %11, %9
  %12 = phi i64 [ 0, %9 ], [ %29, %11 ]
  %13 = getelementptr inbounds nuw i32, ptr %2, i64 %12
  %14 = getelementptr inbounds nuw i8, ptr %13, i64 16
  %15 = load <4 x i32>, ptr %13, align 4, !tbaa !6
  %16 = load <4 x i32>, ptr %14, align 4, !tbaa !6
  %17 = getelementptr inbounds nuw i32, ptr %3, i64 %12
  %18 = getelementptr inbounds nuw i8, ptr %17, i64 16
  %19 = load <4 x i32>, ptr %17, align 4, !tbaa !6
  %20 = load <4 x i32>, ptr %18, align 4, !tbaa !6
  %21 = mul nsw <4 x i32> %19, %15
  %22 = mul nsw <4 x i32> %20, %16
  %23 = getelementptr inbounds nuw i32, ptr @arr, i64 %12
  %24 = getelementptr inbounds nuw i8, ptr %23, i64 16
  %25 = load <4 x i32>, ptr %23, align 4, !tbaa !6
  %26 = load <4 x i32>, ptr %24, align 4, !tbaa !6
  %27 = add nsw <4 x i32> %25, %21
  %28 = add nsw <4 x i32> %26, %22
  store <4 x i32> %27, ptr %23, align 4, !tbaa !6
  store <4 x i32> %28, ptr %24, align 4, !tbaa !6
  %29 = add nuw i64 %12, 8
  %30 = icmp eq i64 %29, %10
  br i1 %30, label %31, label %11, !llvm.loop !10
```
And after with this patch:
```
11: ; preds = %11, %9
  %12 = phi i64 [ 0, %9 ], [ %41, %11 ]
  %13 = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %9 ], [ %42, %11 ]
  %14 = add <4 x i64> %13, splat (i64 4)
  %15 = getelementptr inbounds nuw i32, ptr %2, i64 %12
  %16 = getelementptr inbounds nuw i8, ptr %15, i64 16
  %17 = load <4 x i32>, ptr %15, align 4, !tbaa !6
  %18 = load <4 x i32>, ptr %16, align 4, !tbaa !6
  %19 = getelementptr inbounds nuw i32, ptr %3, i64 %12
  %20 = getelementptr inbounds nuw i8, ptr %19, i64 16
  %21 = load <4 x i32>, ptr %19, align 4, !tbaa !6
  %22 = load <4 x i32>, ptr %20, align 4, !tbaa !6
  %23 = mul nsw <4 x i32> %21, %17
  %24 = mul nsw <4 x i32> %22, %18
  %25 = icmp ult <4 x i64> %13, splat (i64 10)
  %26 = icmp ult <4 x i64> %14, splat (i64 10)
  %27 = extractelement <4 x i1> %25, i64 0
  tail call void @llvm.assume(i1 %27)
  %28 = extractelement <4 x i1> %25, i64 1
  tail call void @llvm.assume(i1 %28)
  %29 = extractelement <4 x i1> %25, i64 2
  tail call void @llvm.assume(i1 %29)
  %30 = extractelement <4 x i1> %25, i64 3
  tail call void @llvm.assume(i1 %30)
  %31 = extractelement <4 x i1> %26, i64 0
  tail call void @llvm.assume(i1 %31)
  %32 = extractelement <4 x i1> %26, i64 1
  tail call void @llvm.assume(i1 %32)
  %33 = extractelement <4 x i1> %26, i64 2
  tail call void @llvm.assume(i1 %33)
  %34 = extractelement <4 x i1> %26, i64 3
  tail call void @llvm.assume(i1 %34)
  %35 = getelementptr inbounds nuw i32, ptr @arr, i64 %12
  %36 = getelementptr inbounds nuw i8, ptr %35, i64 16
  %37 = load <4 x i32>, ptr %35, align 4, !tbaa !6
  %38 = load <4 x i32>, ptr %36, align 4, !tbaa !6
  %39 = add nsw <4 x i32> %37, %23
  %40 = add nsw <4 x i32> %38, %24
  store <4 x i32> %39, ptr %35, align 4, !tbaa !6
  store <4 x i32> %40, ptr %36, align 4, !tbaa !6
  %41 = add nuw i64 %12, 8
  %42 = add <4 x i64> %13, splat (i64 8)
  %43 = icmp eq i64 %41, %10
  br i1 %43, label %44, label %11, !llvm.loop !10
```
As I mentioned, the good thing is that this all gets optimised away and the final codegen is the same, but it is quite an expansion: each of the eight vector lanes gets its own extractelement/assume pair.
The scalar loop before is:
```
.LBB0_7: // =>This Inner Loop Header: Depth=1
        ldr w10, [x12], #4
        ldr w15, [x13]
        ldr w14, [x11], #4
        subs x9, x9, #1
        madd w10, w14, w10, w15
        str w10, [x13], #4
        b.ne .LBB0_7
```
And with this patch it is:
```
.LBB0_6: // =>This Inner Loop Header: Depth=1
        ldr w11, [x2, x10, lsl #2]
        ldr w12, [x3, x10, lsl #2]
        ldr w13, [x8, x10, lsl #2]
        madd w11, w12, w11, w13
        str w11, [x8, x10, lsl #2]
        add x10, x10, #1
        cmp x9, x10
        b.ne .LBB0_6
```
It might perform the same; the only thing I'm saying is that it is different, and the new version is one instruction longer because the loop is no longer counting down but counting up.
https://github.com/llvm/llvm-project/pull/159046