[all-commits] [llvm/llvm-project] 8e7618: [X86] Fold BLEND(PERMUTE(X), PERMUTE(Y)) -> PERMUTE...

Mon May 6 03:25:44 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 8e7618aa21652132f930b6576b92291c5f1d46b6
      https://github.com/llvm/llvm-project/commit/8e7618aa21652132f930b6576b92291c5f1d46b6
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2024-05-06 (Mon, 06 May 2024)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/horizontal-sum.ll
    M llvm/test/CodeGen/X86/oddshuffles.ll
    M llvm/test/CodeGen/X86/pr34592.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll

  Log Message:
  -----------
  [X86] Fold BLEND(PERMUTE(X),PERMUTE(Y)) -> PERMUTE(BLEND(X,Y)) (#90219)

If we don't demand the same element from both single source shuffles (permutes), then attempt to blend the sources together first and then perform a merged permute.

For vXi16 blends we have to be careful as these are much more likely to involve byte/word vector shuffles that will result in the creation of additional shuffle instructions.

This fold might be worth it for VSELECT with constant masks on AVX512 targets, but I haven't investigated this yet, but I've tried to write combineBlendOfPermutes so to be prepared for this.

The PR34592 -O0 regression is an unfortunate failure to cleanup with a later pass that calls SimplifyDemandedElts like the -O3 does - I'm not sure how worried we should be tbh.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications