[PATCH] D103274: [X86] AMD Zen 3 has fast per-lane variable shuffles

Fri May 28 09:07:44 PDT 2021

lebedev.ri marked 3 inline comments as done.
lebedev.ri added a comment.

@craig.topper @RKSimon thank you for taking a look!

================
Comment at: llvm/lib/Target/X86/X86.td:1116
+    [FeatureMacroFusion,
+     FeatureFastVariablePerLaneShuffle];
   list<SubtargetFeature> ZN3Tuning =
----------------
RKSimon wrote:
> If possible I'd make the znver3 change as a followup so its separate from this refactor.
Yep, i'll be committing this separately.
It just made sense to me to submit the review in bulk, to show motivation
(because i'm not adding test coverage for the `+fast-variable-crosslane-shuffle` standalone,
because nothing currently would make use of that.)

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36097
   // which is much simpler than any shuffle.
-  if (UnaryShuffle && MaskContainsZeros && AllowVariableMask &&
       isSequentialOrUndefOrZeroInRange(Mask, 0, NumMaskElts, 0) &&
----------------
lebedev.ri wrote:
> craig.topper wrote:
> > lebedev.ri wrote:
> > > This one is weird. I'm not sure why fast-ness of variable shuffles matters here.
> > You mean because it's just AND/FAND not a shuffle? Fastness was added to AllowVariableMask later. Initially it was just the depth check. Probably guarding the constant pool?
> > You mean because it's just AND/FAND not a shuffle?
> 
> Yes.
> 
> > Fastness was added to AllowVariableMask later. Initially it was just the depth check. Probably guarding the constant pool?
> 
> Yes, i think so.
Though, if i change the guard to ` `

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36097
   // which is much simpler than any shuffle.
-  if (UnaryShuffle && MaskContainsZeros && AllowVariableMask &&
       isSequentialOrUndefOrZeroInRange(Mask, 0, NumMaskElts, 0) &&
----------------
RKSimon wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > craig.topper wrote:
> > > > lebedev.ri wrote:
> > > > > This one is weird. I'm not sure why fast-ness of variable shuffles matters here.
> > > > You mean because it's just AND/FAND not a shuffle? Fastness was added to AllowVariableMask later. Initially it was just the depth check. Probably guarding the constant pool?
> > > > You mean because it's just AND/FAND not a shuffle?
> > > 
> > > Yes.
> > > 
> > > > Fastness was added to AllowVariableMask later. Initially it was just the depth check. Probably guarding the constant pool?
> > > 
> > > Yes, i think so.
> > Though, if i change the guard to ` `
> Yeah its a legacy thing, and we don't have any good way to gauge the impact of vector constant masks in isel, so we're just a bit cautious :(
Yep. I've poked at this, and i'm not sure if/how we could lift it,
all i tried seemed to make things not better. But then maybe 
things are already bad in-the-wild, and we just don't know that because of tests..

So i'm personally mostly fine with the `AllowVariablePerLaneMask` guard here as it is now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103274/new/

https://reviews.llvm.org/D103274