[llvm] [SelectionDAG] Fix and improve TargetLowering::SimplifySetCC (PR #87646)

Thu Apr 4 13:55:24 PDT 2024

================
@@ -189,7 +191,7 @@ define i1 @test_48_16_8(ptr %y) {
 ; CHECK-LE-LABEL: test_48_16_8:
 ; CHECK-LE:       @ %bb.0:
 ; CHECK-LE-NEXT:    ldrh r0, [r0, #1]
-; CHECK-LE-NEXT:    cmp r0, #0
+; CHECK-LE-NEXT:    lsls r0, r0, #8
----------------
bjope wrote:

I don't know whether `lsls` or `cmp` is the better choice here for ARM.

What happens is that after load narrowing we get:
```
          t17: i32,ch = load<(load (s32) from %ir.y, align 8)> t0, t2, undef:i32
        t19: i32 = and t17, Constant:i32<16776960>
      t21: i1 = setcc t19, Constant:i32<0>, setne:ch
```
and then the DAG combiner triggers on the AND and changes it into
```
            t23: i32 = add nuw t2, Constant:i32<1>
          t24: i32,ch = load<(load (s16) from %ir.y + 1, align 1, basealign 8), zext from i16> t0, t23, undef:i32
        t26: i32 = shl t24, Constant:i32<8>
      t21: i1 = setcc t26, Constant:i32<0>, setne:ch
```
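
As a sanity check that the two DAGs above compute the same predicate on a little-endian target, here is a minimal Python model. It is only a sketch: `wide_form` and `narrow_form` are hypothetical names for illustration, and the byte buffer stands in for the memory at `%ir.y`. It models the arithmetic only, not alignment or instruction cost.

```python
import struct

MASK = 16776960  # 0x00FFFF00, the AND constant from the first DAG

def wide_form(buf: bytes) -> bool:
    """setcc ne (and (load i32 from y), 0x00FFFF00), 0 on little-endian."""
    (word,) = struct.unpack_from("<I", buf, 0)
    return (word & MASK) != 0

def narrow_form(buf: bytes) -> bool:
    """setcc ne (shl (zext (load i16 from y + 1)), 8), 0 on little-endian."""
    (half,) = struct.unpack_from("<H", buf, 1)
    shifted = (half << 8) & 0xFFFFFFFF  # model i32 wrap-around
    return shifted != 0

# Bytes 0 and 3 are masked out, so only bytes 1 and 2 decide the result.
samples = [
    bytes([0x00, 0x00, 0x00, 0x00]),
    bytes([0xFF, 0x00, 0x00, 0xFF]),
    bytes([0x00, 0x01, 0x00, 0x00]),
    bytes([0x00, 0x00, 0x80, 0x00]),
]
results = [(wide_form(b), narrow_form(b)) for b in samples]
```

In this model both forms agree on every sample, since the shifted 16-bit value equals the masked 32-bit value bit for bit.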

I think the optimization in this patch first avoids introducing a misaligned 16-bit load from `%ir.y + 1` and instead uses a 32-bit load. But then some other DAG combine is narrowing the load a second time, resulting in the unaligned load, but also introducing an SHL that isn't needed for the comparison with 0.

That makes me wonder whether the 16-bit load with align 1 is a bad thing here.

It also seems like we lack some optimization that removes the redundant SHL :-(
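
To illustrate why the SHL looks redundant for the compare with zero: the shifted value is a zero-extended i16, so its upper 16 bits are known to be zero and a left shift by 8 cannot push any set bit out of the 32-bit result. The sketch below checks this exhaustively in Python. The helper names are hypothetical, and this is a model of the arithmetic, not of any existing DAG combine.

```python
def shl_i32(value: int, amount: int) -> int:
    """Left shift with i32 wrap-around semantics."""
    return (value << amount) & 0xFFFFFFFF

def shl_is_redundant_for_zero_test(max_value: int, amount: int) -> bool:
    """True if (x << amount) != 0 matches x != 0 for every x in [0, max_value]."""
    return all(
        (shl_i32(x, amount) != 0) == (x != 0) for x in range(max_value + 1)
    )

# zext from i16 followed by shl 8: no set bit can be shifted out of 32 bits.
zext16_shl8_ok = shl_is_redundant_for_zero_test(0xFFFF, 8)

# Counterexample for a larger shift: 0x8000 << 17 wraps to 0 in i32,
# so the shift could not be dropped there.
zext16_shl17_ok = shl_is_redundant_for_zero_test(0xFFFF, 17)
```

Under these assumptions the first check holds and the second does not, which suggests the fold is only valid when the shift amount is no larger than the number of known-zero high bits.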

https://github.com/llvm/llvm-project/pull/87646