[llvm] [ARM][Codegen] Fix vector data miscompilation in arm32be (PR #105519)

Sat Aug 31 09:44:01 PDT 2024

davemgreen wrote:

Hi - I think all the changes you have so far are good, but some of the tests are still wrong and it would be good to fix PerformBITCASTCombine too.

If you consider the xor_int64_ff0000ff0000ffff case:
```
define arm_aapcs_vfpcc <2 x i64> @xor_int64_ff0000ff0000ffff(<2 x i64> %a) {
entry:
  %b = xor <2 x i64> %a, <i64 -72056498821201921, i64 -72056498821201921>
  ret <2 x i64> %b
}
```
A series of transforms, which I believe are all OK, leads to:
```
    t3: v2i64 = bitcast t2
        t16: v2i64 = ARMISD::VMOVIMM TargetConstant:i32<7737>
      t17: v4i32 = ARMISD::VECTOR_REG_CAST t16
    t14: v2i64 = bitcast t17
  t6: v2i64 = xor t3, t14
t7: v2f64 = bitcast t6
```
The `bitcast ( VECTOR_REG_CAST ( VMOVIMM ))` is converted to `VECTOR_REG_CAST ( VMOVIMM )`, on the assumption that the bitcast does not matter because each lane of the input will be identical. The combine only looks at the bitcast dst type and the VMOVIMM type, ignoring that the intervening VECTOR_REG_CAST makes the bitcast important.
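To make the distinction concrete, here is a small Python model of the two casts on a big-endian target. It is only a sketch, assuming that `bitcast` behaves like a store followed by a reload in memory order while `VECTOR_REG_CAST` leaves the register bits untouched; the function names are made up for illustration and are not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF

def reg_cast_v2i64_to_v4i32(lanes64):
    """Model VECTOR_REG_CAST: register bits unchanged, lower lane in the low bits."""
    lanes32 = []
    for x in lanes64:
        lanes32 += [x & MASK32, (x >> 32) & MASK32]
    return lanes32

def be_bitcast_v4i32_to_v2i64(lanes32):
    """Model a big-endian bitcast: as if stored to memory and reloaded.

    In big-endian memory order the lower-numbered i32 lane becomes the
    most significant half of the resulting i64 lane.
    """
    return [(lanes32[i] << 32) | lanes32[i + 1]
            for i in range(0, len(lanes32), 2)]

def round_trip(lanes64):
    """bitcast(VECTOR_REG_CAST(x)) in this model."""
    return be_bitcast_v4i32_to_v2i64(reg_cast_v2i64_to_v4i32(lanes64))

# Distinct halves: the round trip swaps the 32-bit halves of every i64 lane.
mixed = round_trip([0x1111111122222222, 0x3333333344444444])
print([hex(x) for x in mixed])

# Equal halves: the round trip is a no-op, so dropping the bitcast is harmless.
uniform = round_trip([0x5555555555555555] * 2)
print([hex(x) for x in uniform])
```

In this model the bitcast can only be dropped when the two 32-bit halves of each i64 lane are equal, which a splat of identical i64 lanes does not guarantee.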

It ends up with this, which has the top and bottom halves of each i64 in the wrong order:
```
   t16: v2i64 = ARMISD::VMOVIMM TargetConstant:i32<7737>
 t6: v2i64 = xor t3, t16

  vmov.i64        q8, #0xffffff0000ff
  veor    q0, q0, q8
```
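The mismatch can be checked by hand with a short Python sketch. It assumes the VMOVIMM operand packs op/cmode in the bits above an 8-bit immediate, and that for the i64 form (op/cmode `0x1e`) each immediate bit expands to one full byte; `decode_vmov_i64` and `swap_halves` are hypothetical helpers written for this illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def decode_vmov_i64(modimm):
    """Expand a packed VMOVIMM operand into its 64-bit per-lane value.

    Assumes the i64 byte-mask form: bit n of imm8 set means byte n is 0xff.
    """
    op_cmode, imm8 = modimm >> 8, modimm & 0xFF
    if op_cmode != 0x1E:
        raise ValueError("not the i64 byte-mask encoding")
    value = 0
    for bit in range(8):
        if imm8 & (1 << bit):
            value |= 0xFF << (8 * bit)
    return value

def swap_halves(x):
    """Swap the top and bottom 32-bit halves of a 64-bit value."""
    return ((x << 32) | (x >> 32)) & MASK64

emitted = decode_vmov_i64(7737)            # value materialised by vmov.i64
wanted = -72056498821201921 & MASK64       # constant from the IR, as unsigned

print(hex(emitted))               # 0xffffff0000ff
print(hex(wanted))                # 0xff0000ff0000ffff
print(hex(swap_halves(emitted)))  # 0xff0000ff0000ffff
```

The emitted value only equals the IR constant after its 32-bit halves are swapped, which is the reordering the dropped bitcast would have provided.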

https://github.com/llvm/llvm-project/pull/105519