[llvm] Redesign Straight-Line Strength Reduction (SLSR) (PR #162930)
Adrian Kuegel via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 25 00:23:15 PST 2025
akuegel wrote:
@fiigii I have a reproducer where this change seems to introduce wrong behavior:
```
target datalayout = "e-p6:32:32-i64:64-i128:128-i256:256-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
define ptx_kernel void @loop_subtract_fusion(ptr noalias readonly align 16 captures(none) dereferenceable(12) %0, ptr noalias writeonly align 256 captures(none) dereferenceable(15) %1, i32 %row, i32 %column) {
%3 = addrspacecast ptr %0 to ptr addrspace(1)
%4 = addrspacecast ptr %1 to ptr addrspace(1)
%8 = icmp samesign ult i32 %column, 4
%9 = shl nuw nsw i32 %row, 2
%10 = or disjoint i32 %9, %column
%11 = zext nneg i32 %10 to i64
%12 = getelementptr inbounds i8, ptr addrspace(1) %3, i64 %11
br i1 %8, label %13, label %15
13: ; preds = %2
%14 = load i8, ptr addrspace(1) %12, align 1
br label %15
15: ; preds = %13, %2
%16 = phi i8 [ %14, %13 ], [ 1, %2 ]
%.not = icmp eq i32 %column, 0
%17 = shl nuw nsw i32 %row, 2
%18 = add nuw nsw i32 %17, %column
%19 = zext nneg i32 %18 to i64
%20 = getelementptr i8, ptr addrspace(1) %3, i64 %19
%21 = getelementptr i8, ptr addrspace(1) %20, i64 -1
br i1 %.not, label %24, label %22
22: ; preds = %15
%23 = load i8, ptr addrspace(1) %21, align 1
br label %24
24: ; preds = %22, %15
ret void
}
```
When you run `opt` on this IR with `-passes=slsr -S`, you can see that it reuses the GEP `%12` for `%21`. It does this under the assumption that `%10` and `%18` are equivalent. However, that ignores the control flow. The unoptimized LLVM IR has `add` instead of `or disjoint`; the InstCombine pass turns it into `or disjoint` because the use of the result is guarded (by `%8`, i.e. `%column < 4`) in a way that makes this transformation correct. Reusing the GEP that is based on the `or disjoint` calculation in a different block, where that guard does not hold, is incorrect. Control flow has to be taken into account when making use of `or disjoint`.
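To make the arithmetic concrete, here is a rough Python sketch of why `(row << 2) | column` and `(row << 2) + column` only agree when the operands share no set bits. The function names are made up for illustration and are not part of LLVM; Python integers are unbounded, so the sketch ignores `i32` wrapping and the `nuw`/`nsw` flags.

```python
def operands_disjoint(a: int, b: int) -> bool:
    """True when a and b share no set bits, i.e. `or disjoint` would not be poison."""
    return (a & b) == 0


def index_via_add(row: int, column: int) -> int:
    # Mirrors %17/%18 in the reproducer: (row << 2) + column.
    return (row << 2) + column


def index_via_or(row: int, column: int) -> int:
    # Mirrors %9/%10 in the reproducer: (row << 2) | column.
    # In LLVM IR the `disjoint` flag makes this poison when bits overlap;
    # here we just compute the plain bitwise OR.
    return (row << 2) | column


if __name__ == "__main__":
    for row, column in [(1, 3), (1, 4), (3, 5)]:
        base = row << 2
        print(
            f"row={row} column={column} "
            f"disjoint={operands_disjoint(base, column)} "
            f"add={index_via_add(row, column)} or={index_via_or(row, column)}"
        )
```

With `column < 4` the low two bits of `row << 2` are zero, so the operands are always disjoint and both forms give the same index. For `row = 1, column = 4` the operands overlap: the `add` form gives 8 while a plain OR gives 4, and in IR the `or disjoint` result would be poison. That second case is reachable in block `15`, where only `%column != 0` is known.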
https://github.com/llvm/llvm-project/pull/162930