[PATCH] D159267: [AArch64] Remove copy instruction between uaddlv and dup

Sun Sep 10 03:39:11 PDT 2023

dmgreen added a comment.

> To be sure, I would like to check one thing. As far as I understand, the endianness affects to the order in memory so we need `rev` instruction after load and before store. After `rev` instruction, we do not need to care the endianness. Is it correct or wrong? There are other rules for big endian on AArch64?

Bitcasts are defined as a store followed by a load, so on big endian they can change the lane order. NVCast acts upon the representation in the vector register, so it keeps the lanes in the same order. Vector function arguments are also passed in a particular order that sometimes needs to be considered (they often need a rev).
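To make the difference concrete, here is a rough Python model rather than anything taken from the LLVM sources. It assumes a v2i32 value reinterpreted as v4i16, models memory as a byte string with element 0 at the lowest address, and models the register as one integer with lane 0 in the low bits. The function names `bitcast_store_load` and `nvcast_register` are made up for this sketch.

```python
def bitcast_store_load(lanes32, byteorder):
    """Model a bitcast v2i32 -> v4i16 as 'store as the source type, load as the
    destination type', with element 0 at the lowest address."""
    mem = b"".join(v.to_bytes(4, byteorder) for v in lanes32)
    return [int.from_bytes(mem[i:i + 2], byteorder) for i in range(0, len(mem), 2)]


def nvcast_register(lanes32):
    """Model an NVCast as reinterpreting the register bits in place.
    Assumption of this sketch: lane 0 sits in the low bits of the register."""
    reg = 0
    for i, v in enumerate(lanes32):
        reg |= v << (32 * i)
    return [(reg >> (16 * i)) & 0xFFFF for i in range(2 * len(lanes32))]


lanes = [0x11112222, 0x33334444]
le_bitcast = bitcast_store_load(lanes, "little")
be_bitcast = bitcast_store_load(lanes, "big")
nvcast = nvcast_register(lanes)

print([hex(v) for v in le_bitcast])
print([hex(v) for v in be_bitcast])
print([hex(v) for v in nvcast])
```

In this model the little-endian bitcast and the register-level cast agree, while the big-endian bitcast swaps the 16-bit halves within each 32-bit lane. That swap is the kind of difference a `rev` has to account for.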

> For big endian output of the ctpop_i32, I can see `rev` instruction because `AArch64TargetLowering::LowerCTPOP_PARITY` generates `bitcast` from i64 to v8i8. Does it also need to be changed to `NVCAST`? It seems we could need to be careful to use `bitcast` which causes `rev` instruction for big endian...

I think for this specific case it does not actually matter. The rev feeds a cnt and an addlv on the individual i8 elements, and the addlv performs a (commutative) reduction, so it doesn't matter if the lanes get reversed: we still sum up the same values. So it could be either a BITCAST or an NVCAST and both should work (although I'm not sure an NVCAST between i64 and vectors is defined).
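As a quick sanity check of that argument, the following Python sketch models the i64 as eight i8 lanes, optionally reverses them (standing in for the `rev`), counts the bits per lane (standing in for `cnt`) and sums the counts (standing in for `addlv`). The helper name `ctpop_via_lanes` is made up for illustration and this is not the actual lowering.

```python
def ctpop_via_lanes(x, reverse_lanes):
    """Population count of a 64-bit value via per-byte counts and a sum."""
    lanes = list(x.to_bytes(8, "little"))        # i64 viewed as v8i8
    if reverse_lanes:
        lanes.reverse()                          # models the lane reversal from rev
    counts = [bin(b).count("1") for b in lanes]  # models cnt on each i8 lane
    return sum(counts)                           # models addlv (commutative sum)


value = 0x0123456789ABCDEF
same_order = ctpop_via_lanes(value, reverse_lanes=False)
reversed_order = ctpop_via_lanes(value, reverse_lanes=True)
print(same_order, reversed_order)
```

Both calls give the same total because the sum does not depend on the order of the per-lane counts.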

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159267/new/

https://reviews.llvm.org/D159267