[all-commits] [llvm/llvm-project] 088db8: [RISCV] Merge shuffle sources if lanes are disjoin...

Wed Dec 11 21:41:49 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 088db868f3370ffe01c9750f75732679efecd1fe
      https://github.com/llvm/llvm-project/commit/088db868f3370ffe01c9750f75732679efecd1fe
  Author: Luke Lau <luke at igalia.com>
  Date:   2024-12-12 (Thu, 12 Dec 2024)

  Changed paths:
    M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll

  Log Message:
  -----------
  [RISCV] Merge shuffle sources if lanes are disjoint (#119401)

In x264, there's a few kernels with shuffles like this:

    %41 = add nsw <16 x i32> %39, %40
    %42 = sub nsw <16 x i32> %39, %40
%43 = shufflevector <16 x i32> %41, <16 x i32> %42, <16 x i32> <i32 11,
i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32
5, i32 1, i32 24, i32 28, i32 20, i32 16>

Because this is a complex two-source shuffle, this will get lowered as
two vrgather.vvs that are blended together.

    vadd.vv v20, v16, v12
    vsub.vv v12, v16, v12
    vrgatherei16.vv v24, v20, v10
    vrgatherei16.vv v24, v12, v16, v0.t

However the indices coming from each source are disjoint, so we can
blend the two together and perform a single source shuffle instead:

    %41 = add nsw <16 x i32> %39, %40
    %42 = sub nsw <16 x i32> %39, %40
    %43 = select <0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1> %41, %42
%44 = shufflevector <16 x i32> %43, <16 x i32> poison, <16 x i32> <i32
11, i32 15, i32 7, i32 3, i32 10, i32 14, i32 6, i32 2, i32 9, i32 13,
i32 5, i32 1, i32 8, i32 12, i32 4, i32 0>

The select will likely get merged into the preceding instruction, and
then we only have to do one vrgather.vv:

    vadd.vv v20, v16, v12
    vsub.vv v20, v16, v12, v0.t
    vrgatherei16.vv v24, v20, v10

This patch bails if either of the sources are a broadcast/splat/identity
shuffle, since that will usually already have some sort of cheaper
lowering.

This improves performance on 525.x264_r by 4.12% with -O3 -flto
-march=rva22u64_v on the spacemit-x60.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications