[PATCH] D144116: [DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if benefit is unclear

Juneyoung Lee via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 20 11:21:31 PST 2023


aqjune added a comment.

Hi,

In D144116#4138209 <https://reviews.llvm.org/D144116#4138209>, @dmgreen wrote:

> Would it be possible to optimize the ADDCARRY to the same result as without this fold? Similar to combineADDCARRYDiamond. I looked at the DAG that was being produced, but it's not obvious to me how it could be sensibly combined to the same result as before.
>
> I added the fold really to handle cases like this, which can often come up after lowering geps:
>
>   or x1, x1, #1
>   add x1, x1, x2
>   ldr x0, [x1]
>
> Which can be transformed into
>
>   add x1, x1, x2
>   ldr x0, [x1, #1]
>
> If the add+add is reassociated, it makes sense for the add+add-like-or to be reassociated. I have no objections to limiting the fold if we need to though.

I think the example is still optimized with my patch, because the addition is a legal op, if I understand correctly.
I wrote a sample LLVM IR function that seems analogous to the example above:

  define void @f(i64 %ofs, ptr %p) {
    %ofs_2 = shl i64 %ofs, 1   ; Make %ofs_2 have its LSB set to zero; this makes the `or i64 .., 1` below equivalent to `add i64 .., 1`.
    %ofs_3 = or i64 %ofs_2, 1
    %p2 = getelementptr i8, ptr %p, i64 %ofs_3
    store i64 10, ptr %p2
    ret void
  }
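
As a side note (not part of the patch, just a quick illustration), the `shl`/`or` trick above relies on the identity that `x | 1 == x + 1` whenever the low bit of `x` is already zero; that is what lets the backend treat the `or` as an `add` and fold the constant into the addressing mode. A minimal check of that identity, modeling i64 wraparound:

```python
# Check the identity the IR above relies on:
# after `shl i64 %ofs, 1` the low bit of %ofs_2 is zero,
# so `or i64 %ofs_2, 1` computes the same value as `add i64 %ofs_2, 1`.
MASK = (1 << 64) - 1  # model i64 two's-complement wraparound

for ofs in range(-16, 17):
    ofs_2 = (ofs << 1) & MASK  # shl i64 %ofs, 1 -> LSB is 0
    assert (ofs_2 | 1) == (ofs_2 + 1) & MASK

print("or/add equivalence holds")
```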

Running `llc --mtriple=arm64-unknown-unknown -mcpu=neoverse-n1 -O3 b.ll -o -` results in:

  add     x8, x1, x0, lsl #1
  mov     w9, #10
  stur    x9, [x8, #1]

This correctly folds `#1` into the store instruction's address operand.
Would it be sufficient to allow this fold only when the addition is a legal op, which is what this patch currently does?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144116/new/

https://reviews.llvm.org/D144116



More information about the llvm-commits mailing list