[all-commits] [llvm/llvm-project] 8e7618: [X86] Fold BLEND(PERMUTE(X), PERMUTE(Y)) -> PERMUTE...
Simon Pilgrim via All-commits
all-commits at lists.llvm.org
Mon May 6 03:25:44 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 8e7618aa21652132f930b6576b92291c5f1d46b6
https://github.com/llvm/llvm-project/commit/8e7618aa21652132f930b6576b92291c5f1d46b6
Author: Simon Pilgrim <llvm-dev at redking.me.uk>
Date: 2024-05-06 (Mon, 06 May 2024)
Changed paths:
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/test/CodeGen/X86/horizontal-sum.ll
M llvm/test/CodeGen/X86/oddshuffles.ll
M llvm/test/CodeGen/X86/pr34592.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-8.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-8.ll
M llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
M llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll
Log Message:
-----------
[X86] Fold BLEND(PERMUTE(X),PERMUTE(Y)) -> PERMUTE(BLEND(X,Y)) (#90219)
If we don't demand the same element from both single source shuffles (permutes), then attempt to blend the sources together first and then perform a merged permute.
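As an illustration of the condition described above, here is a minimal scalar sketch of the fold on 4-element vectors. It is not the code from X86ISelLowering.cpp: the names (Vec, Mask, Select, Folded, foldBlendOfPermutes) are hypothetical, and it ignores undef/zero mask elements and any target-specific cost checks. It only shows that when no source index is demanded from both permutes, blending the sources first and then applying one merged permute gives the same result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical scalar model; these types are not LLVM's.
constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Mask = std::array<std::size_t, N>; // Mask[i] = source index for lane i
using Select = std::array<bool, N>;      // true = take the lane from Y

// Single-source shuffle: Result[i] = V[M[i]].
Vec permute(const Vec &V, const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = V[M[I]];
  return R;
}

// Lane-wise blend: Result[i] = TakeY[i] ? Y[i] : X[i].
Vec blend(const Vec &X, const Vec &Y, const Select &TakeY) {
  Vec R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = TakeY[I] ? Y[I] : X[I];
  return R;
}

struct Folded {
  Select NewBlend;  // blend applied to the unpermuted sources
  Mask NewPermute;  // merged permute applied to the blended vector
};

// Try to rewrite blend(permute(X, MX), permute(Y, MY), TakeY) as
// permute(blend(X, Y, NewBlend), NewPermute).
std::optional<Folded> foldBlendOfPermutes(const Select &TakeY, const Mask &MX,
                                          const Mask &MY) {
  std::array<bool, N> DemandedX{}, DemandedY{};
  // Record which source elements the blend actually reads from each permute.
  for (std::size_t I = 0; I < N; ++I) {
    if (TakeY[I])
      DemandedY[MY[I]] = true;
    else
      DemandedX[MX[I]] = true;
  }
  Folded F{};
  for (std::size_t J = 0; J < N; ++J) {
    // The same element index is needed from both sources: a single blended
    // vector cannot hold both values, so give up.
    if (DemandedX[J] && DemandedY[J])
      return std::nullopt;
    F.NewBlend[J] = DemandedY[J];
  }
  // Each output lane keeps the index used by whichever permute fed it.
  for (std::size_t I = 0; I < N; ++I)
    F.NewPermute[I] = TakeY[I] ? MY[I] : MX[I];
  return F;
}
```

With TakeY = {false, true, false, true}, MX = {1, 0, 1, 0} and MY = {3, 2, 3, 2}, only X[1] and Y[2] are demanded, so the sketch folds to a blend that takes lane 2 from Y followed by the permute {1, 2, 1, 2}. If both permutes demanded the same index (for example both reading element 0), it returns std::nullopt and the original form is kept.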
For vXi16 blends we have to be careful as these are much more likely to involve byte/word vector shuffles that will result in the creation of additional shuffle instructions.
This fold might also be worthwhile for VSELECT with constant masks on AVX512 targets. I haven't investigated this yet, but I've tried to write combineBlendOfPermutes so that it is prepared for that case.
The PR34592 -O0 regression is an unfortunate failure to clean up with a later pass that calls SimplifyDemandedElts, as the -O3 run does. I'm not sure how worried we should be, tbh.