[llvm] [AArch64][CodeGen] Fix wrong operand order when creating vcmla intrinsic (PR #65278)

Tue Sep 12 14:47:44 PDT 2023

davemgreen wrote:

Is milc compiled with -Ofast, or without fast math? I don't think we've seen the same thing here.

From what I can tell, this should just change from one complex multiply to another. It will be different, but I'm not sure it's better or worse. For example, with a normal complex mul:
```
vcmla A, B, C, #0
vcmla A, B, C, #90
```
would be:
```
Ar = ((Ar + Br.Cr) - Bi.Ci)
Ai = ((Ai + Br.Ci) + Bi.Cr)
```
vs, if the operands were the other way:
```
Ar = ((Ar + Cr.Br) - Ci.Bi)
Ai = ((Ai + Cr.Bi) + Ci.Br)
```
The brackets show where the fused multiply-accumulates happen. So there will be slightly different rounding between the two versions, but neither matches the original exactly.
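
To make the rounding difference concrete, here is a rough scalar model of the two sequences in Python. It is only a sketch: `fma` and `cmla_pair` are hypothetical helper names, the fused multiply-add is emulated with exact rationals rounded once to double, and the inputs are hand-picked to expose the effect. It does not model the vector lanes or anything the compiler actually emits.

```python
from fractions import Fraction

def fma(a, b, c):
    # Emulated fused multiply-add: a*b + c computed exactly, rounded once to double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def cmla_pair(acc, b, c):
    # Scalar model (assumption) of: vcmla A, B, C, #0 ; vcmla A, B, C, #90
    ar, ai = acc
    br, bi = b
    cr, ci = c
    # rotation #0:  Ar += Br*Cr ; Ai += Br*Ci
    ar = fma(br, cr, ar)
    ai = fma(br, ci, ai)
    # rotation #90: Ar -= Bi*Ci ; Ai += Bi*Cr
    ar = fma(-bi, ci, ar)
    ai = fma(bi, cr, ai)
    return ar, ai

# Hand-picked inputs: Br*Ci is not exactly representable, Bi*Cr is.
B = (1.0 + 2.0**-30, -(1.0 + 2.0**-29))
C = (1.0, 1.0 + 2.0**-30)
acc = (0.0, 0.0)

one_order = cmla_pair(acc, B, C)    # Ai = ((Ai + Br.Ci) + Bi.Cr)
other_order = cmla_pair(acc, C, B)  # Ai = ((Ai + Cr.Bi) + Ci.Br)

print(one_order)
print(other_order)
```

In this model the real parts come out identical, because the same products are accumulated in the same order either way. The imaginary parts differ: one order gives exactly 0.0 and the other gives 2**-60, because the inexact product gets rounded at a different step.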

The new order sounds like a more natural order, so I'm not against it. It might just be that milc is a little susceptible to fast math rounding differences.

https://github.com/llvm/llvm-project/pull/65278