<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/140707>140707</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Miscompile due to 32-bit insertelement lowered to 8-bit move
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          aleks-tmb
      </td>
    </tr>
</table>

<pre>
    After the changes introduced in [#136091](https://github.com/llvm/llvm-project/pull/136091), we started experiencing a miscompile on AArch64 in our local testing. Here is a reduced LLVM IR example:

```llvm
; ModuleID = 'Test.ll'
target triple = "aarch64-none-linux-gnu"

define i32 @main(ptr addrspace(1) %p) {
  %1 = load <2 x i32>, ptr addrspace(1) %p, align 4
  %2 = extractelement <2 x i32> %1, i64 0
  %3 = call i32 @llvm.ctpop.i32(i32 %2)
  %4 = insertelement <2 x i32> <i32 -1, i32 poison>, i32 %3, i64 1
  %5 = sub <2 x i32> %1, %4
  store <2 x i32> %5, ptr addrspace(1) %p, align 4
  ret i32 0
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.ctpop.i32(i32)
```
Here is the `llc` (trunk) output:
```s
main:
        ldr     d0, [x0]
        movi    v2.2d, #0xffffffffffffffff
        mov     x8, x0
        mov     w0, wzr
        fmov    w9, s0
        fmov    s1, w9
        cnt     v1.8b, v1.8b
        addv    b1, v1.8b
        mov     v2.b[4], v1.b[0]
        sub     v0.2s, v0.2s, v2.2s
        str     d0, [x8]
        ret
```
https://godbolt.org/z/c5YP7rYbE

The current lowering appears to be incorrect because:

- The instruction `mov v2.b[4], v1.b[0]` only updates a single byte (byte 4) of the v2 vector register.
- However, the LLVM IR expects a full 32-bit insertion into the second element (insertelement at index 1).
- Because the other three bytes of the 32-bit lane v2.s[1] keep the 0xFF fill from `movi v2.2d, #0xffffffffffffffff`, the lane does not hold the zero-extended `ctpop` result, and the subtraction computes an incorrect value.
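
To make the effect concrete, here is a small Python sketch that models only lane 1 of the computation. It is an illustration, not LLVM code: the function names and the sample inputs are made up for this example, and it assumes the lane numbering used by the instructions above (byte 4 of the register is the low byte of lane `s[1]`).

```python
MASK32 = 0xFFFFFFFF


def popcount32(x):
    """Number of set bits in the low 32 bits of x (models llvm.ctpop.i32)."""
    return bin(x & MASK32).count("1")


def lane1_expected(elem0, elem1):
    """IR semantics: insertelement writes the full 32-bit ctpop result."""
    inserted = popcount32(elem0)
    return (elem1 - inserted) & MASK32


def lane1_miscompiled(elem0, elem1):
    """Trunk codegen: the lane starts as 0xFFFFFFFF (from movi) and only
    its low byte is replaced by the popcount (mov v2.b[4], v1.b[0])."""
    inserted = 0xFFFFFF00 | (popcount32(elem0) & 0xFF)
    return (elem1 - inserted) & MASK32


# Hypothetical inputs: element 0 = 0x0F (popcount 4), element 1 = 10.
print(lane1_expected(0x0F, 10))     # 6
print(lane1_miscompiled(0x0F, 10))  # 262
```

With these inputs the byte-wide insert leaves 0xFFFFFF04 in the lane instead of 0x00000004, so the stored value for element 1 is 262 rather than 6.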

`llc 20.1.0` output (before applying #136091):
```s
main:                                   // @main
        ldr     d1, [x0]
        movi    v0.2d, #0xffffffffffffffff
        mov     x8, x0
        mov     w0, wzr
        fmov    w9, s1
        fmov    s2, w9
        cnt     v2.8b, v2.8b
        addv    b2, v2.8b
        fmov    w9, s2
        mov     v0.s[1], w9
        sub     v0.2s, v1.2s, v0.2s
        str     d0, [x8]
        ret
```
https://godbolt.org/z/qqvM1b3hc

Why is this correct?
- `mov v0.s[1], w9` fully overwrites the entire 32-bit lane in v0, matching the semantics of LLVM IR's insertelement.
- This avoids leftover bytes from the earlier `movi` initialization, ensuring the result is correct.
</pre>