[llvm] [AArch64][CodeGen] Fix wrong operand order when creating vcmla intrinsic (PR #65278)

Fri Sep 8 07:27:01 PDT 2023

daisy202309 wrote:

> The test case you have is testing different lanes from the vcmla intrinsics, not just commutativity. The different lanes would be expected to be different. The lane indices would need to be the same to test commutativity:
> 
> ```
>   vst1q_lane_f64(r0, test_rot0(acc, lhs, rhs) , 0);
>   vst1q_lane_f64(&r0[1], test_rot0(acc, rhs, lhs), 0);
> 
>   vst1q_lane_f64(r90, test_rot90(acc, lhs, rhs) , 0);
>   vst1q_lane_f64(&r90[1], test_rot90(acc, rhs, lhs), 0);
> 
>   vst1q_lane_f64(r180, test_rot180(acc, lhs, rhs) , 0);
>   vst1q_lane_f64(&r180[1], test_rot180(acc, rhs, lhs), 0);
> 
>   vst1q_lane_f64(r270, test_rot270(acc, lhs, rhs) , 0);
>   vst1q_lane_f64(&r270[1], test_rot270(acc, rhs, lhs), 0);
> ```
> 
> When you say "Fix wrong operand order", is there a bug here? And if so, can you explain where? The same change may need to be applied for ARM too, but I was under the impression that enough testing had been done to catch problems like this (barring maybe some edge cases with NaNs and whatnot).

I hit a run-time error in SPECCPU2006 433 caused by a wrong calculation result, and I get the right result with this patch applied.
Everywhere else in this function the operands are passed in the order (InputA, InputB); only this place uses (InputB, InputA), so I guess it is a clerical error. I tried to build a small example that reproduces the error but have not succeeded so far; the actual scenario is more complicated.
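
To illustrate why operand order can matter for these intrinsics, here is a minimal scalar sketch in plain C. It models one complex element of the four FCMLA rotations as I understand the architectural semantics. It is not the code from this patch, and the names `cplx`, `fcmla_model`, `mul_acc` and `conj_mul_acc` are hypothetical helpers invented for the illustration.

```c
/* One complex number: re is the even lane, im is the odd lane. */
typedef struct { double re, im; } cplx;

/* Scalar model (an assumption, not LLVM code) of one FCMLA step:
 * acc is the accumulator, n and m are the two multiplicands,
 * rot is one of 0, 90, 180, 270. */
static cplx fcmla_model(cplx acc, cplx n, cplx m, int rot) {
  switch (rot) {
  case 0:   acc.re += n.re * m.re; acc.im += n.re * m.im; break;
  case 90:  acc.re -= n.im * m.im; acc.im += n.im * m.re; break;
  case 180: acc.re -= n.re * m.re; acc.im -= n.re * m.im; break;
  case 270: acc.re += n.im * m.im; acc.im -= n.im * m.re; break;
  }
  return acc;
}

/* rot0 followed by rot90 accumulates the plain product n * m. */
static cplx mul_acc(cplx acc, cplx n, cplx m) {
  return fcmla_model(fcmla_model(acc, n, m, 0), n, m, 90);
}

/* rot0 followed by rot270 accumulates conj(n) * m. */
static cplx conj_mul_acc(cplx acc, cplx n, cplx m) {
  return fcmla_model(fcmla_model(acc, n, m, 0), n, m, 270);
}
```

In this model a single rotation is not symmetric in its two multiplicands: with a = 1+2i and b = 3+4i, rot0 gives an imaginary part of 4 for (a, b) and 6 for (b, a). The rot0 and rot90 pair produces the same value in either order, because a plain complex product is commutative. The rot0 and rot270 pair does not, because conj(a)*b and conj(b)*a differ in the sign of the imaginary part.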

https://github.com/llvm/llvm-project/pull/65278