[PATCH] D159267: [AArch64] Remove copy instruction between uaddlv and dup

Fri Sep 8 02:45:08 PDT 2023

jaykang10 added a comment.

In D159267#4641539 <https://reviews.llvm.org/D159267#4641539>, @dmgreen wrote:

> In D159267#4641051 <https://reviews.llvm.org/D159267#4641051>, @efriedma wrote:
>
>> re: the big-endian stuff I mentioned on the other ticket... it looks like it isn't a regression, but my concern is the code generated for ctpop_i32 for a big-endian target.  uaddlv v16i8 produces a result in h0 (element 0 of an 8 x i16), but we then access it as s0 (element 0 of a 4 x i32) without a bitcast.  So I think the bits end up in the wrong place?
>
> I think it's the other way around (hopefully I have it the right way around, BE can be confusing). A bitcast would swap the lane indices (it acts as a load and a store). Otherwise lane 0 is the lowest lane in both llvmir and the neon registers.

To be sure, I would like to check one thing. As far as I understand, endianness affects the order in memory, so we need a `rev` instruction after a load and before a store. After the `rev` instruction, we do not need to care about endianness. Is that correct? Are there other rules for big endian on AArch64?
For the big-endian output of ctpop_i32, I can see a `rev` instruction because `AArch64TargetLowering::LowerCTPOP_PARITY` generates a `bitcast` from i64 to v8i8. Does that also need to be changed to `NVCAST`? It seems we need to be careful when using `bitcast`, which causes a `rev` instruction on big endian...
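
To make the h0/s0 question concrete, here is a small standalone model of the two casts. It is only a sketch and not LLVM code. It assumes that a big-endian `bitcast` behaves like a store followed by a load (as described in the quoted comment), and that `NVCAST` simply reinterprets the register bits with lane 0 in the lowest bits. The function names and the sample value 0x12 are hypothetical.

```cpp
#include <array>
#include <cstdint>

using Bytes = std::array<uint8_t, 16>;
using V8i16 = std::array<uint16_t, 8>;
using V4i32 = std::array<uint32_t, 4>;

// Store <8 x i16> in big-endian memory layout: element 0 at the lowest
// address, and the most significant byte of each element first.
Bytes store_v8i16_be(const V8i16 &v) {
    Bytes mem{};
    for (int i = 0; i < 8; ++i) {
        mem[2 * i] = static_cast<uint8_t>(v[i] >> 8);
        mem[2 * i + 1] = static_cast<uint8_t>(v[i] & 0xff);
    }
    return mem;
}

// Load <4 x i32> from the same big-endian memory layout.
V4i32 load_v4i32_be(const Bytes &mem) {
    V4i32 v{};
    for (int i = 0; i < 4; ++i) {
        v[i] = (uint32_t(mem[4 * i]) << 24) | (uint32_t(mem[4 * i + 1]) << 16) |
               (uint32_t(mem[4 * i + 2]) << 8) | uint32_t(mem[4 * i + 3]);
    }
    return v;
}

// Assumed model of a big-endian bitcast: store as one type, load as the other.
V4i32 bitcast_be(const V8i16 &v) { return load_v4i32_be(store_v8i16_be(v)); }

// Assumed model of NVCAST: reinterpret the register bits, with lane 0 in
// the lowest bits of the register regardless of endianness.
V4i32 reinterpret_reg(const V8i16 &v) {
    V4i32 r{};
    for (int i = 0; i < 4; ++i)
        r[i] = uint32_t(v[2 * i]) | (uint32_t(v[2 * i + 1]) << 16);
    return r;
}
```

With a `uaddlv`-style result of 0x12 in lane 0 of the 8 x i16 vector and zeros elsewhere, `reinterpret_reg` gives 0x00000012 in lane 0 of the 4 x i32 vector, while `bitcast_be` gives 0x00120000. Under these assumptions, reading h0 as s0 without a bitcast keeps the bits in the expected place, and it is the bitcast that would move them.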

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D159267