[PATCH] D159267: [AArch64] Remove copy instruction between uaddlv and dup
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Sep 10 03:39:11 PDT 2023
dmgreen added a comment.
> To be sure, I would like to check one thing. As far as I understand, endianness affects the order in memory, so we need a `rev` instruction after a load and before a store. After the `rev` instruction, we do not need to care about endianness. Is that correct? Are there other rules for big endian on AArch64?
Bitcasts are defined as a store followed by a load, so they can change the lane order. NVCast acts upon the representation in the vector, so it keeps the lanes in the same order. Vector function arguments are also passed in a particular order that sometimes needs to be considered (they often need a rev).
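As a rough illustration of the difference, here is a simplified Python model (not LLVM code). The helper names `bitcast_via_memory` and `nvcast_in_register` are hypothetical, and the model assumes a v2i32 to v4i16 cast with lane 0 held in the low bits of the register:

```python
import struct

def bitcast_via_memory(lanes_i32, big_endian):
    """Model a v2i32 -> v4i16 bitcast as a store followed by a load."""
    order = ">" if big_endian else "<"
    # Store: write the two 32-bit elements to memory in target byte order.
    memory = struct.pack(order + "2I", *lanes_i32)
    # Load: read the same bytes back as four 16-bit elements.
    return list(struct.unpack(order + "4H", memory))

def nvcast_in_register(lanes_i32):
    """Model an NVCast: reinterpret the register bits without moving them.

    Assumption: lane 0 occupies the least significant bits of the register.
    """
    register = 0
    for i, lane in enumerate(lanes_i32):
        register |= (lane & 0xFFFFFFFF) << (32 * i)
    return [(register >> (16 * i)) & 0xFFFF for i in range(4)]

v = [0x11223344, 0x55667788]
print([hex(x) for x in bitcast_via_memory(v, big_endian=False)])
print([hex(x) for x in bitcast_via_memory(v, big_endian=True)])
print([hex(x) for x in nvcast_in_register(v)])
```

In this model the little-endian bitcast and the NVCast agree, while the big-endian bitcast returns the two halves of each 32-bit lane in the opposite order. That is roughly the fix-up a rev performs.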
> For the big-endian output of ctpop_i32, I can see a `rev` instruction because `AArch64TargetLowering::LowerCTPOP_PARITY` generates a `bitcast` from i64 to v8i8. Does it also need to be changed to `NVCAST`? It seems we need to be careful when using `bitcast`, which causes a `rev` instruction for big endian...
I think for this specific case it does not actually matter. Because the rev feeds into a cnt and an addlv on the individual i8 elements, and the addlv is performing a (commutative) reduction, it doesn't matter if the lanes get reversed. We still sum up the same values. So it could be either a BITCAST or an NVCAST and both should work (although I'm not sure an NVCAST between i64 and vectors is defined).
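A small Python sketch of why the lane order is irrelevant here. It is only a model of the cnt + addlv sequence, not the actual lowering, and `ctpop_via_byte_lanes` is a hypothetical name:

```python
def ctpop_via_byte_lanes(value, reverse_lanes=False):
    """Model ctpop as a per-byte popcount (cnt) followed by a sum (addlv)."""
    # Treat the 64-bit value as eight i8 lanes.
    lanes = list((value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little"))
    if reverse_lanes:
        # Model the lane reversal that a rev would introduce.
        lanes.reverse()
    # cnt: count the set bits in each lane independently.
    per_lane = [bin(b).count("1") for b in lanes]
    # addlv: add all lanes together; addition does not depend on lane order.
    return sum(per_lane)

for x in (0, 1, 0xF0F0, 0xDEADBEEF, 0xFFFFFFFF):
    assert ctpop_via_byte_lanes(x) == ctpop_via_byte_lanes(x, reverse_lanes=True)
```

The reversal only permutes the per-lane counts, so the sum is the same either way.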
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D159267/new/
https://reviews.llvm.org/D159267