[llvm] [ARM][Codegen] Fix vector data miscompilation in arm32be (PR #105519)

Mon Aug 26 13:48:21 PDT 2024

================
@@ -414,9 +414,8 @@ define arm_aapcs_vfpcc <16 x i8> @test(<16 x i8> %i) {
 ; CHECKBE-LABEL: test:
 ; CHECKBE:       @ %bb.0: @ %entry
 ; CHECKBE-NEXT:    vmov.i64 q1, #0xff00ff000000ff00
-; CHECKBE-NEXT:    vrev64.8 q2, q1
-; CHECKBE-NEXT:    vrev64.8 q1, q0
-; CHECKBE-NEXT:    vorr q1, q1, q2
+; CHECKBE-NEXT:    vrev64.8 q2, q0
+; CHECKBE-NEXT:    vorr q1, q2, q1
----------------
davemgreen wrote:

Thanks. The intent is that a vmovimm is always generated in "natural vector" lane order, where the lower values are in the lower lanes (the same as little endian). Hence we don't pass IsBigEndian to isConstantSplat, and we use VECTOR_REG_CAST as opposed to BITCAST.

That means this linked bit of code should not be needed and can be removed, as the values from i64 vectors do not need to be rearranged: https://github.com/llvm/llvm-project/blob/178fc4779ece31392a2cd01472b0279e50b3a199/llvm/lib/Target/ARM/ARMISelLowering.cpp#L7124

I think that should clear up a lot of the cases that are not `bitcast(reg_cast(vmovimm))` (as happens when returning immediates from functions). The `bitcast(reg_cast(vmovimm))` cases, I think, are going wrong because of the code in PerformBITCASTCombine, which is mis-compiling that particular pattern.
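
As a rough illustration of why the two CHECKBE sequences in the test diff above are not equivalent, here is a small host-side C++ model. It is only a sketch under stated assumptions: a q register is modelled as 16 bytes with byte 0 of each 64-bit lane being the least significant, and the helper names (`vrev64_8`, `vorr`, `vmov_i64`, `oldSequence`, `newSequence`) are hypothetical stand-ins for the instructions, not LLVM or ARM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a q register is modelled as 16 bytes, two 64-bit lanes,
// with byte 0 of each lane holding the least significant byte.
using Vec16 = std::array<std::uint8_t, 16>;

// Model of vrev64.8: reverse the 8-bit elements within each 64-bit lane.
Vec16 vrev64_8(const Vec16 &v) {
  Vec16 r{};
  for (std::size_t i = 0; i < 16; ++i)
    r[i] = v[(i / 8) * 8 + (7 - i % 8)];
  return r;
}

// Model of vorr: bytewise OR of two registers.
Vec16 vorr(const Vec16 &a, const Vec16 &b) {
  Vec16 r{};
  for (std::size_t i = 0; i < 16; ++i)
    r[i] = static_cast<std::uint8_t>(a[i] | b[i]);
  return r;
}

// Model of vmov.i64 q, #imm: the same 64-bit immediate in both lanes.
Vec16 vmov_i64(std::uint64_t imm) {
  Vec16 r{};
  for (std::size_t i = 0; i < 16; ++i)
    r[i] = static_cast<std::uint8_t>((imm >> (8 * (i % 8))) & 0xff);
  return r;
}

// Old CHECKBE lines: the immediate is byte-reversed as well as the argument.
Vec16 oldSequence(const Vec16 &q0) {
  Vec16 q1 = vmov_i64(0xff00ff000000ff00ULL);
  Vec16 q2 = vrev64_8(q1);
  q1 = vrev64_8(q0);
  return vorr(q1, q2);
}

// New CHECKBE lines: only the argument is reversed, the immediate is used as is.
Vec16 newSequence(const Vec16 &q0) {
  Vec16 q1 = vmov_i64(0xff00ff000000ff00ULL);
  Vec16 q2 = vrev64_8(q0);
  return vorr(q2, q1);
}
```

With an all-zero argument, `newSequence` returns the immediate unchanged, while `oldSequence` returns it with the bytes of each 64-bit lane reversed, so in this model the two sequences OR different constants into the result.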

https://github.com/llvm/llvm-project/pull/105519