[PATCH] D28118: AVX-512 cost calculation for interleave load/store patterns

Thu Dec 29 09:57:25 PST 2016

mkuper added inline comments.

================
Comment at: ../include/llvm/CodeGen/BasicTTIImpl.h:65
+  /// insert operations.
+  unsigned getAllPermutationsShuffleOverhead(Type *Ty) {
     assert(Ty->isVectorTy() && "Can only shuffle vectors");
----------------
delena wrote:
> mkuper wrote:
> > Why "All Permutations"?
> > getPermuteShuffleOverhead(), maybe? Also, I'm not sure what this has to do with permutations, especially given the example below. (TBH, It didn't have anything to do with SK_Alternate, it was just used that way. That wasn't really good either).
> In the worst case, the shuffle is being replaced with "extracts" and "inserts". In my opinion, SK_Reverse should also have this overhead. And the SK_Broadcast is not always 1 instruction. But I don't want to fix  everything in this patch.
I didn't mean to imply you should, just trying to figure out how we can at least move this in the right direction - even if just in terms of naming/documentation. :-)

================
Comment at: ../include/llvm/CodeGen/BasicTTIImpl.h:359
     }
     return 1;
   }
----------------
delena wrote:
> mkuper wrote:
> > This code makes very little sense to me. Not your change, but the original code.
> > Why does this special-case "alt shuffle", of all things? And why should this special-case only these specific shuffles after the change?
> > 
> > I think this is backwards - there may be specific shuffle types that are cheap by default - e.g. broadcast makes sense. Reverse? Not so much.
> Reverse is not cheap for all types, VPERMW, for example, appears on AVX-512-BW. And broadcast for i16/i8 was added to AVX2 ISA.
Sure.
And this isn't even X86-specific, this is just trying to give sane defaults for all targets, and I don't think the current code does that.
Could you please add a FIXME here?

================
Comment at: ../lib/Analysis/CostModel.cpp:112
+  }
+  return !(Vec0 && Vec1);
+}
----------------
Not 100% related, but I would expect us to canonicalize shuffles to avoid the "all elements come from the second input" case.
(I'm not saying you should remove the check, since the cost is fairly minor... but I'm curious whether this is the case)

================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:886
+        return LT.first * Entry->Cost;
   }

----------------
delena wrote:
> mkuper wrote:
> > Are you planning on adding SSE4, AVX and AVX2 costs as well?
> > This isn't a blocker for this patch, and should be a separate patch in any case, I'm just curious.
> It should be added, I'm not sure that I'll be able to take it immediately after this patch. (and I have one more patch that should compare "interleave" with gather/scatter).
> I'll try to find an example, where high "interleaving" cost on AVX2 prevents vectorization and fill PR.
> But on AVX2, where we do not have any gather (or at least do not consider it as an option), we should compare "interleave" with strided-scalar. Mohamed is working on reducing scalar cost for strided access. We'll see what happens on AVX2 and earlier ISAs after his patch.
SGTM.

================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:2207
+
+  // Store.
+  // There is no strided stores meanwhile. And store can't be folded in
----------------
Could you add an assert here?

================
Comment at: ../test/Analysis/CostModel/X86/shuffle-reverse.ll:126
+  ; AVX512F: cost of 2 {{.*}} %V256 = shufflevector
+  ; AVX512BW: cost of 1 {{.*}} %V256 = shufflevector
   %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
----------------
I thought this patch was not supposed to touch the costs of SK_Reverse shuffles.
Did we end up considering this shuffle SK_PermuteSingleSrc?

================
Comment at: ../test/Analysis/CostModel/X86/shuffle-two-src.ll:4
+;
+; Verify the cost model for 1 src shuffles
+;
----------------
1 -> 2

Repository:
  rL LLVM

https://reviews.llvm.org/D28118