[llvm] [MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses (PR #129889)
Martin Storsjö via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 25 06:25:47 PDT 2025
mstorsjo wrote:
Hi Alex!
Sorry to say, but this commit seems to have introduced a miscompilation for aarch64 targets. When compiling Clang for aarch64-w64-mingw32 (possibly any aarch64 Windows target, possibly any aarch64 target) from a Clang source tree at 378ac572accc6d14ed36136f261951f8607a9169 or later, with a Clang version that includes this commit, the resulting Clang binary crashes when compiling - but not always.
(I didn't notice this right away, as I don't have full continuous testing of Clang running on Windows on ARM; I only test some cross-compiled applications like ffmpeg in my local nightly testing. But as free GitHub Actions runners with Windows on ARM are available now, I should be able to test this continuously; that's how I noticed the issue, while trying out such setups.)
In any case, once pinpointed, the issue looks quite clear. It can be reproduced with https://martin.st/temp/DAGCombiner-preproc.cpp.
This input file, compiled like this:
```
$ clang -target aarch64-w64-mingw32 -O3 -std=c++17 -fno-exceptions -funwind-tables -fno-rtti DAGCombiner-preproc.cpp -S -o dagcombiner.s
```
After this commit, we can observe the following diff in the generated output:
```diff
--- dagcombiner-good.s 2025-04-25 16:18:25.317278754 +0300
+++ dagcombiner-bad.s 2025-04-25 16:16:10.153507344 +0300
@@ -87849,7 +87849,6 @@
b.ne .LBB167_141
.LBB167_121:
ldr x8, [sp, #80] // 8-byte Folded Reload
- mov w29, w29
ldr x8, [x8, #48]
ldr q0, [x8, x29, lsl #4]
umov w8, v0.h[0]
@@ -97136,7 +97135,6 @@
lsl x10, x9, #3
add w9, w9, #1
str w9, [x20, #12]
- mov w9, w9
str x1, [x8, x10]
mov w1, #1 // =0x1
ldr x8, [x20]
@@ -129458,7 +129456,6 @@
// %bb.61:
mov w8, w27
mov w0, w20
- mov w29, w29
str x8, [sp, #40] // 8-byte Folded Spill
bl _ZN4llvm3ISD23getSetCCSwappedOperandsENS0_8CondCodeE
mov w27, w0
@@ -153275,7 +153272,6 @@
lsl x10, x9, #3
add w9, w9, #1
str w9, [x20, #12]
- mov w9, w9
str x1, [x8, x10]
mov w1, #1 // =0x1
ldr x8, [x20]
@@ -202049,7 +202045,6 @@
add x8, x10, w8, uxth #1
ldurh w0, [x8, #-2]
and x8, x0, #0xffff
- mov x8, x8
ands x9, x8, #0xffff
stp x8, xzr, [sp, #224]
b.ne .LBB792_16
@@ -202959,7 +202954,6 @@
add x8, x10, w8, uxtw #1
ldurh w0, [x8, #-2]
and x8, x0, #0xffff
- mov x8, x8
ands x9, x8, #0xffff
stp x8, xzr, [sp, #48]
b.ne .LBB794_22
```
Removing instructions like `mov x8, x8` should be fine. However, removing an instruction like `mov w29, w29` may not be safe: on aarch64, w29 is the lower 32-bit half of the 64-bit register x29, and writing to w29 implicitly clears the upper 32 bits of x29. In the first hunk of the diff, the following instruction, `ldr q0, [x8, x29, lsl #4]`, uses the whole 64-bit register `x29` as the offset/index, relying on the `mov w29, w29` having cleared the upper bits. (I think this should be expressible as `ldr q0, [x8, w29, uxtw #4]` as well - but that's not what we have here anyway.)
In any case, that first removed `mov w29, w29` is the culprit: the subsequent load now uses potentially uninitialized/undefined upper bits of x29 as part of the offset. This explains why the built Clang now sometimes runs successfully and sometimes crashes.
Can we revert this until the issue is fixed properly? (Offhand, I have no clue how to fix it.) As this has been in tree for over a month already, chances are that other things depend on it and it can't be trivially reverted at this point.
CC @davemgreen @DavidSpickett for the aarch64 backend.
https://github.com/llvm/llvm-project/pull/129889