[PATCH] D103274: [X86] AMD Zen 3 has fast per-lane variable shuffles

Fri May 28 09:08:16 PDT 2021

lebedev.ri marked 2 inline comments as done.
lebedev.ri added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36097
   // which is much simpler than any shuffle.
-  if (UnaryShuffle && MaskContainsZeros && AllowVariableMask &&
       isSequentialOrUndefOrZeroInRange(Mask, 0, NumMaskElts, 0) &&
----------------
lebedev.ri wrote:
> RKSimon wrote:
> > lebedev.ri wrote:
> > > lebedev.ri wrote:
> > > > craig.topper wrote:
> > > > > lebedev.ri wrote:
> > > > > > This one is weird. I'm not sure why fast-ness of variable shuffles matters here.
> > > > > You mean because it's just AND/FAND not a shuffle? Fastness was added to AllowVariableMask later. Initially it was just the depth check. Probably guarding the constant pool?
> > > > > You mean because it's just AND/FAND not a shuffle?
> > > > 
> > > > Yes.
> > > > 
> > > > > Fastness was added to AllowVariableMask later. Initially it was just the depth check. Probably guarding the constant pool?
> > > > 
> > > > Yes, i think so.
> > > Though, if i change the guard to ` `
> > Yeah it's a legacy thing, and we don't have any good way to gauge the impact of vector constant masks in isel, so we're just a bit cautious :(
> Yep. I've poked at this, and i'm not sure if/how we could lift it,
> everything i tried failed to make things better. But then maybe
> things are already bad in the wild, and we just don't know that because of the tests.
> 
> So i'm personally mostly fine with the `AllowVariablePerLaneMask` guard here as it is now.
> Though, if i change the guard to  

disregard, forgot to delete
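
For anyone following the thread, here is a minimal standalone sketch of why this case is "just AND/FAND not a shuffle". It assumes a unary mask whose elements either stay in their own lane or are zeroed (undef elements are not modeled). The names `kZero`, `isInPlaceOrZero`, `applyShuffle` and `applyAsAnd` are hypothetical and are not the LLVM helpers; the only point is that the AND form needs a constant vector, which is presumably what the guard is protecting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "this element must be zero".
// LLVM has its own sentinel values; this one exists only for the sketch.
constexpr int kZero = -2;

// True if every mask element is either kZero or selects its own lane
// (Mask[i] == i), i.e. the "shuffle" never actually moves an element.
bool isInPlaceOrZero(const std::vector<int> &Mask) {
  for (int i = 0, e = static_cast<int>(Mask.size()); i != e; ++i)
    if (Mask[i] != kZero && Mask[i] != i)
      return false;
  return true;
}

// Reference semantics: apply the unary shuffle element by element.
std::vector<uint32_t> applyShuffle(const std::vector<uint32_t> &Src,
                                   const std::vector<int> &Mask) {
  assert(Mask.size() == Src.size() && "unary shuffle, same width");
  std::vector<uint32_t> Out(Src.size());
  for (std::size_t i = 0; i != Mask.size(); ++i)
    Out[i] = Mask[i] == kZero ? 0u : Src[Mask[i]];
  return Out;
}

// Same result as a bitwise AND with a constant vector. The constant
// (all-ones or all-zeros per element) is what would have to be
// materialized, e.g. loaded from the constant pool.
std::vector<uint32_t> applyAsAnd(const std::vector<uint32_t> &Src,
                                 const std::vector<int> &Mask) {
  assert(Mask.size() == Src.size() && "unary shuffle, same width");
  assert(isInPlaceOrZero(Mask) && "only valid for in-place-or-zero masks");
  std::vector<uint32_t> Out(Src.size());
  for (std::size_t i = 0; i != Mask.size(); ++i) {
    uint32_t Bits = Mask[i] == kZero ? 0u : ~0u; // one constant element
    Out[i] = Src[i] & Bits;
  }
  return Out;
}
```

With a mask like `{0, kZero, 2, kZero}` both functions produce the same vector, so no per-element data movement is needed, only the constant operand for the AND.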

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103274/new/

https://reviews.llvm.org/D103274