[llvm] [AArch64] Add custom lowering for load <3 x i8>. (PR #78632)
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 19 11:09:45 PST 2024
fhahn wrote:
> Actually, I guess the following is the shortest, at 2 instructions:
>
> ```
> uint8x8_t load_3byte_insert_byte(char* a) {
>   return vld1_lane_s8(a+2, vld1_dup_u16(a), 2);
> }
> ```
Thanks, this is indeed more compact. I tried to massage the SelectionDAG nodes to generate it (https://github.com/llvm/llvm-project/commit/7cc78c52f481161d7195ac4c7f9ec05b1cd1f442), but there appear to be some cases where this results in slightly more code. I can check where those differences come from.
In terms of overall cycles, both sequences should be mostly equivalent on the CPUs I checked.
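For reference, here is a rough portable C model of the lane layout the quoted two-instruction sequence should produce. It is only a sketch: `vec8x8`, `model_ld1r_h`, `model_ld1_lane_b` and `load_3byte_model` are made-up names for illustration, not NEON intrinsics or LLVM code, and the byte-to-lane mapping assumes a little-endian target. The quoted snippet is also shorthand with respect to types; real intrinsics would need a pointer cast and a `vreinterpret` between the 16-bit and 8-bit vector types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit NEON register viewed as 8 byte lanes. */
typedef struct { uint8_t lane[8]; } vec8x8;

/* Models a 16-bit load-and-duplicate (what vld1_dup_u16 does):
 * reads exactly 2 bytes and replicates them into all four 16-bit lanes. */
static vec8x8 model_ld1r_h(const uint8_t *p) {
    vec8x8 v;
    for (int i = 0; i < 4; ++i)
        memcpy(&v.lane[2 * i], p, 2);
    return v;
}

/* Models a single-byte lane load (what vld1_lane_u8 does):
 * reads exactly 1 byte and overwrites one lane, leaving the others intact. */
static vec8x8 model_ld1_lane_b(const uint8_t *p, vec8x8 v, int lane) {
    assert(lane >= 0 && lane < 8);
    v.lane[lane] = *p;
    return v;
}

/* The two-step sequence: duplicate bytes 0..1, then insert byte 2 at lane 2.
 * Only the 3 bytes a[0..2] are read from memory. */
static vec8x8 load_3byte_model(const uint8_t *a) {
    return model_ld1_lane_b(a + 2, model_ld1r_h(a), 2);
}
```

With input bytes `a0 a1 a2`, the model leaves `a0 a1 a2` in lanes 0 to 2; lanes 3 to 7 hold leftover copies of `a0`/`a1` from the duplicate step, which a `<3 x i8>` consumer would ignore.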
https://github.com/llvm/llvm-project/pull/78632