[llvm] [AArch64] Add custom lowering for load <3 x i8>. (PR #78632)

Florian Hahn via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 19 11:09:45 PST 2024


fhahn wrote:

> Actually, I guess the following is the shortest, at 2 instructions:
> 
> ```
> uint8x8_t load_3byte_insert_byte(char* a) {
>   return vld1_lane_s8(a+2, vld1_dup_u16(a), 2);
> }
> ```

Thanks, this is indeed more compact. I tried to massage the SelectionDAG nodes to generate it (https://github.com/llvm/llvm-project/commit/7cc78c52f481161d7195ac4c7f9ec05b1cd1f442), but in some cases this results in slightly more code. I can check where those differences come from.

In terms of overall cycles, both sequences should be mostly equivalent on the CPUs I checked.
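
For reference, here is a rough scalar sketch of what the quoted two-instruction sequence computes: a 16-bit load replicated into every halfword lane, followed by a single-byte load into byte lane 2. It is plain C rather than NEON intrinsics so it can be compiled anywhere. The names `vec8_t`, `model_ld1r_h`, `model_ld1_lane_b` and `load_3byte_model` are made up for illustration, and the model assumes the little-endian lane layout (byte lane i holds the i-th byte in memory order).

```
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 64-bit NEON D register (8 byte lanes). */
typedef struct { uint8_t b[8]; } vec8_t;

/* Models the effect of vld1_dup_u16: read 2 bytes from p and
 * replicate them into all four 16-bit lanes. */
static vec8_t model_ld1r_h(const uint8_t *p) {
    vec8_t v;
    for (int lane = 0; lane < 4; ++lane)
        memcpy(&v.b[2 * lane], p, 2);
    return v;
}

/* Models the effect of a byte lane load (vld1_lane_s8 / vld1_lane_u8):
 * read 1 byte from p into the given byte lane, leaving other lanes alone. */
static vec8_t model_ld1_lane_b(const uint8_t *p, vec8_t v, int lane) {
    assert(lane >= 0 && lane < 8);
    v.b[lane] = *p;
    return v;
}

/* The quoted sequence: halfword dup-load of a[0..1], then insert a[2]
 * into byte lane 2. Only 3 bytes of memory are ever read. */
static vec8_t load_3byte_model(const uint8_t *a) {
    return model_ld1_lane_b(a + 2, model_ld1r_h(a), 2);
}
```

With input bytes `{0x11, 0x22, 0x33}`, the model yields lanes `11 22 33 22 11 22 11 22`: lanes 0 to 2 hold the three loaded bytes, and the upper lanes hold leftover copies of the first halfword, which a `<3 x i8>` consumer would ignore.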

https://github.com/llvm/llvm-project/pull/78632

