[llvm] [MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses (PR #129889)
Martin Storsjö via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 25 06:25:47 PDT 2025
mstorsjo wrote:
Hi Alex!
Sorry to say, but this commit seems to have introduced a miscompilation for aarch64 targets. When compiling Clang for aarch64-w64-mingw32 (possibly any aarch64 Windows target, possibly any aarch64 target) from a Clang source tree at 378ac572accc6d14ed36136f261951f8607a9169 or later, with a Clang version that includes this commit, the resulting Clang binary crashes when compiling - but not always.
(I didn't notice this right away, as I don't have full continuous testing of Clang running on Windows on ARM; I only test some cross-compiled applications like ffmpeg in my local nightly testing. But as free GitHub Actions runners with Windows on ARM are available now, I should be able to test this continuously; that's how I noticed the issue, while trying out such setups.)
In any case, once pinpointed, the issue looks quite clear. It can be reproduced with https://martin.st/temp/DAGCombiner-preproc.cpp.
This input file, compiled like this:
```
$ clang -target aarch64-w64-mingw32 -O3 -std=c++17 -fno-exceptions -funwind-tables -fno-rtti DAGCombiner-preproc.cpp -S -o dagcombiner.s
```
After this commit, we can observe the following diff in the generated output:
```diff
--- dagcombiner-good.s 2025-04-25 16:18:25.317278754 +0300
+++ dagcombiner-bad.s 2025-04-25 16:16:10.153507344 +0300
@@ -87849,7 +87849,6 @@
b.ne .LBB167_141
.LBB167_121:
ldr x8, [sp, #80] // 8-byte Folded Reload
- mov w29, w29
ldr x8, [x8, #48]
ldr q0, [x8, x29, lsl #4]
umov w8, v0.h[0]
@@ -97136,7 +97135,6 @@
lsl x10, x9, #3
add w9, w9, #1
str w9, [x20, #12]
- mov w9, w9
str x1, [x8, x10]
mov w1, #1 // =0x1
ldr x8, [x20]
@@ -129458,7 +129456,6 @@
// %bb.61:
mov w8, w27
mov w0, w20
- mov w29, w29
str x8, [sp, #40] // 8-byte Folded Spill
bl _ZN4llvm3ISD23getSetCCSwappedOperandsENS0_8CondCodeE
mov w27, w0
@@ -153275,7 +153272,6 @@
lsl x10, x9, #3
add w9, w9, #1
str w9, [x20, #12]
- mov w9, w9
str x1, [x8, x10]
mov w1, #1 // =0x1
ldr x8, [x20]
@@ -202049,7 +202045,6 @@
add x8, x10, w8, uxth #1
ldurh w0, [x8, #-2]
and x8, x0, #0xffff
- mov x8, x8
ands x9, x8, #0xffff
stp x8, xzr, [sp, #224]
b.ne .LBB792_16
@@ -202959,7 +202954,6 @@
add x8, x10, w8, uxtw #1
ldurh w0, [x8, #-2]
and x8, x0, #0xffff
- mov x8, x8
ands x9, x8, #0xffff
stp x8, xzr, [sp, #48]
b.ne .LBB794_22
```
Removing instructions like `mov x8, x8` should be fine. However, removing an instruction like `mov w29, w29` may not be safe: on aarch64, w29 is the lower 32-bit half of the 64-bit register x29, and writing to w29 implicitly clears the upper 32 bits of x29. In the first hunk of the diff, the following instruction, `ldr q0, [x8, x29, lsl #4]`, uses the whole 64-bit register `x29` as the offset/index, relying on the `mov w29, w29` having cleared the upper bits. (I think this should be expressible as `ldr q0, [x8, w29, uxtw #4]` as well - but that's not what we have here anyway.)
In any case, that first removed `mov w29, w29` is the culprit: the subsequent load now uses potentially uninitialized/undefined upper bits of x29 as part of the offset. This explains why the built Clang now sometimes runs successfully and sometimes crashes.
Can we revert this until the issue is fixed properly? (Offhand, I have no clue how to fix it.) As this has been in tree for over a month already, chances are that other things depend on it and it can't be trivially reverted at this point.
CC @davemgreen @DavidSpickett for the aarch64 backend.
https://github.com/llvm/llvm-project/pull/129889