[PATCH] D142536: [X86] lowerShuffleAsLanePermuteAndRepeatedMask - retain the per-lane undef elements and don't just copy the repeated mask

Wed Jan 25 06:35:06 PST 2023

RKSimon created this revision.
RKSimon added reviewers: craig.topper, pengfei, lebedev.ri.
Herald added subscribers: StephenFan, hiraditya.
Herald added a project: All.
RKSimon requested review of this revision.
Herald added a project: LLVM.

lowerShuffleAsLanePermuteAndRepeatedMask expands a shuffle from `shuffle(x,y,mask)` to `shuffle(shuffle(x,y,lanemask1),shuffle(x,y,lanemask2),repeatedinlanemask)`.

However, we weren't making use of the fact that elements of the original mask might be undef. Instead of applying the entire repeatedinlanemask to every lane, we can leave an element of the final mask undef whenever the original mask never demanded that element.
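
As a rough illustration of the idea (a standalone sketch, not the code in X86ISelLowering.cpp), the final mask could be built as below. The helper name `buildFinalMask`, the `KeepUndef` flag, and the encoding of the repeated mask (values `0..LaneSize-1` select from the first lane-permuted operand, `LaneSize..2*LaneSize-1` from the second, `-1` is undef) are assumptions made for this example only.

```cpp
#include <vector>

// Hypothetical helper for illustration; not the LLVM implementation.
// OrigMask:     the original shuffle mask; only whether an element is
//               undef (< 0) is consulted here.
// RepeatedMask: the in-lane mask, one entry per element of a lane.
// KeepUndef:    false = copy the repeated mask into every lane,
//               true  = keep elements undef that OrigMask never demanded.
std::vector<int> buildFinalMask(const std::vector<int> &OrigMask,
                                const std::vector<int> &RepeatedMask,
                                bool KeepUndef) {
  const int NumElts = static_cast<int>(OrigMask.size());
  const int LaneSize = static_cast<int>(RepeatedMask.size());
  std::vector<int> Final(NumElts, -1); // start fully undef

  for (int i = 0; i != NumElts; ++i) {
    // Element not demanded by the original shuffle: leave it undef.
    if (KeepUndef && OrigMask[i] < 0)
      continue;

    const int M = RepeatedMask[i % LaneSize];
    if (M < 0)
      continue; // the repeated mask itself is undef here

    const int LaneBase = (i / LaneSize) * LaneSize;
    // Entries >= LaneSize refer to the second operand, whose elements
    // are numbered from NumElts in a two-input shuffle mask.
    Final[i] = M < LaneSize ? LaneBase + M
                            : NumElts + LaneBase + (M - LaneSize);
  }
  return Final;
}
```

For an 8-element shuffle with two 4-element lanes, an original mask with undef elements at positions 1, 3, 4 and 6, and a repeated mask of `{0, 5, 2, 7}`, copying the repeated mask gives `{0, 9, 2, 11, 4, 13, 6, 15}`, while retaining the per-lane undefs gives `{0, -1, 2, -1, -1, 13, -1, 15}`. The extra undef elements give later shuffle lowering more freedom to pick a cheaper instruction.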

Yet another improvement addressing regressions from D127115 <https://reviews.llvm.org/D127115>.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D142536

Files:
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll
  llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
  llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll
  llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll
  llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll
  llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast_from_memory.ll