[llvm] [ValueTracking] Extend LHS/RHS with matching operand to work without constants. (PR #85557)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 19 12:03:37 PDT 2024
goldsteinn wrote:
> > This patch seems to block SROA: [dtcxzyw/llvm-opt-benchmark#419 (comment)](https://github.com/dtcxzyw/llvm-opt-benchmark/pull/419#discussion_r1527863982).
>
> Seems to boil down to simplifications happening earlier.
>
> A reduced form:
>
> ```
> define void @fun0() {
> entry:
> %first111 = alloca [0 x [0 x [0 x ptr]]], i32 0, align 8
> store i64 0, ptr %first111, align 8
> %last = getelementptr i8, ptr %first111, i64 8
> call void @fun3(ptr %first111, ptr %last, ptr %first111) ; third argument assumed so the call matches @fun3's signature
> ret void
> }
>
> define void @fun3(ptr %first, ptr %last, ptr %p_in) {
> entry:
> %sub.ptr.lhs.cast = ptrtoint ptr %last to i64
> %sub.ptr.rhs.cast = ptrtoint ptr %first to i64
> %sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
> %call = ashr exact i64 %sub.ptr.sub, 3
> %call2 = load volatile i64, ptr %p_in, align 8
> %cmp = icmp ugt i64 %call, %call2
> br i1 %cmp, label %common.ret, label %if.else
>
> common.ret: ; preds = %if.else29, %if.else, %entry
> ret void
>
> if.else: ; preds = %entry
> %c_load.cast0.i = ptrtoint ptr %last to i64
> %c_load.cast.div0.i = ashr exact i64 %c_load.cast0.i, 3
> %cmp24.not = icmp ult i64 %c_load.cast.div0.i, %call
> br i1 %cmp24.not, label %if.else29, label %common.ret
>
> if.else29: ; preds = %if.else
> %n_is_c = call i1 @llvm.is.constant.i64(i64 %c_load.cast.div0.i)
> %cmp2 = icmp eq i64 %c_load.cast.div0.i, -1
> %or.cond1 = and i1 %n_is_c, %cmp2
> %add.ptr = getelementptr i64, ptr %first, i64 %c_load.cast.div0.i
> %.pre = ptrtoint ptr %add.ptr to i64
> %ptr.lhs.pre-phi = select i1 %or.cond1, i64 0, i64 %.pre
> %ptr.sub = sub i64 %ptr.lhs.pre-phi, %sub.ptr.rhs.cast
> call void @llvm.memmove.p0.p0.i64(ptr null, ptr %first, i64 %ptr.sub, i1 false)
> br label %common.ret
> }
>
> ; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
> declare void @llvm.memmove.p0.p0.i64(ptr nocapture writeonly, ptr nocapture readonly, i64, i1 immarg) #0
>
> ; Function Attrs: convergent nocallback nofree nosync nounwind willreturn memory(none)
> declare i1 @llvm.is.constant.i64(i64) #1
>
> attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
> attributes #1 = { convergent nocallback nofree nosync nounwind willreturn memory(none) }
> ```
>
> Where we go awry is when we fold:
>
> ```
> %c_load.cast.div0.i = ashr exact i64 %c_load.cast0.i, 3
> %cmp24.not = icmp ult i64 %c_load.cast.div0.i, %call
> ```
>
> ->
>
> ```
> %cmp24.not = icmp ugt i64 %sub.ptr.sub, %c_load.cast0.i
> ```
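>
> As an Alive2-style sketch (hypothetical names): `%call` is itself `ashr exact i64 %sub.ptr.sub, 3`, so both operands are exact shifts by the same amount, and the shifts can be stripped without changing the unsigned order:
>
> ```
> define i1 @src(i64 %a, i64 %b) {
>   ; exact means no set bits are shifted out, so the shift
>   ; preserves the unsigned order of %a and %b.
>   %sa = ashr exact i64 %a, 3
>   %sb = ashr exact i64 %b, 3
>   %c = icmp ult i64 %sa, %sb
>   ret i1 %c
> }
>
> define i1 @tgt(i64 %a, i64 %b) {
>   %c = icmp ugt i64 %b, %a
>   ret i1 %c
> }
> ```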
>
> This eventually results in the following diff after inlining:
>
> ```
> %c_load.cast.div0.i.i = ashr exact i64 %c_load.cast0.i.i, 3
> %cmp24.not.i = icmp ult i64 %c_load.cast.div0.i.i, 1
> ```
>
> vs
>
> ```
> %cmp24.not.i = icmp ugt i64 8, %c_load.cast0.i.i
> ```
>
> Then finally:
>
> ```
> %cmp24.not.i = icmp eq ptr %c_load0.i.i, null
> ```
>
> vs
>
> ```
> %cmp24.not.i = icmp ult ptr %c_load0.i.i, inttoptr (i64 8 to ptr)
> ```
>
> Essentially we throw away the information that the low 3 bits of the pointer are zero before we have enough information to fully reduce the compare to something easy to analyze.
>
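> To see it concretely (a hypothetical reduction of the diff above): with the `exact` shift still visible, `%p` is known to be a multiple of 8, so comparing against 1 collapses to an equality test; once the shift has been folded away first, that fact is lost:
>
> ```
> ; exact guarantees the low 3 bits of %p are zero, so
> ; (%p >> 3) u< 1 implies %p == 0.
> %d = ashr exact i64 %p, 3
> %c = icmp ult i64 %d, 1   ; -> icmp eq i64 %p, 0
>
> ; If the shift is folded away first, only %p u< 8 remains,
> ; and without the low-bits fact it stays a range check.
> %c2 = icmp ult i64 %p, 8
> ```
>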
> Looking into a fix...
>
> Edit: Similar to last time, there is no point where it seems we make a "bad decision"; it's just that the order in which we make good decisions varies.
The only real fix I can think of is to preserve the information with assumes
when we fold a single-use `shr exact`. But that requires the value to be
`noundef`, because violating an assume is immediate UB:
https://alive2.llvm.org/ce/z/AkRPAs
I don't think that's a great path to go down, although some
way of ensuring we don't throw away information when
folding would be nice.
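
For illustration, the assume approach would look roughly like this
(hypothetical, and only sound when `%x` is known `noundef`):

```
; The exact flag records that the low 3 bits of %x are zero.
%d = ashr exact i64 %x, 3

; When the shift is folded away, the fact could be kept alive with
; an assume, but if %x can be undef/poison the assume is immediate UB.
%low = and i64 %x, 7
%z = icmp eq i64 %low, 0
call void @llvm.assume(i1 %z)
```
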
https://github.com/llvm/llvm-project/pull/85557