[PATCH] D142536: [X86] lowerShuffleAsLanePermuteAndRepeatedMask - retain the per-lane undef elements and don't just copy the repeated mask
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 25 06:35:06 PST 2023
RKSimon created this revision.
RKSimon added reviewers: craig.topper, pengfei, lebedev.ri.
Herald added subscribers: StephenFan, hiraditya.
Herald added a project: All.
RKSimon requested review of this revision.
Herald added a project: LLVM.
lowerShuffleAsLanePermuteAndRepeatedMask expands a shuffle from `shuffle(x,y,mask)` to `shuffle(shuffle(x,y,lanemask1),shuffle(x,y,lanemask2),repeatedinlanemask)`
However, we weren't making use of the fact that elements of the original mask might be undef - instead of fully applying the entire repeatedinlanemask to every lane, we can simplify the mask if we never demanded that element in the original mask.
Yet another improvement addressing regressions from D127115 <https://reviews.llvm.org/D127115>
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D142536
Files:
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll
llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll
llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll
llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast_from_memory.ll
More information about the llvm-commits
mailing list