[PATCH] D28118: AVX-512 cost calculation for interleave load/store patterns

Thu Dec 29 01:16:22 PST 2016

delena marked 2 inline comments as done.
delena added inline comments.

================
Comment at: ../include/llvm/Analysis/TargetTransformInfo.h:467
+    SK_ExtractSubvector,///< ExtractSubvector Index indicates start offset.
+    SK_MergeTwoSrc,     ///< Merge two vectors into one.
+    SK_PermuteOneSrc    ///< Shuffle elements of one source vector with any
----------------
mkuper wrote:
> Can you make this clearer? It's not obvious what "merge" means. Does the order matter?
The order does not matter. I meant any permutation of elements from 2 source vectors. I've changed the comment.

================
Comment at: ../include/llvm/CodeGen/BasicTTIImpl.h:65
+  /// insert operations.
+  unsigned getAllPermutationsShuffleOverhead(Type *Ty) {
     assert(Ty->isVectorTy() && "Can only shuffle vectors");
----------------
mkuper wrote:
> Why "All Permutations"?
> getPermuteShuffleOverhead(), maybe? Also, I'm not sure what this has to do with permutations, especially given the example below. (TBH, It didn't have anything to do with SK_Alternate, it was just used that way. That wasn't really good either).
In the worst case, the shuffle is being replaced with "extracts" and "inserts". In my opinion, SK_Reverse should also have this overhead. And the SK_Broadcast is not always 1 instruction. But I don't want to fix  everything in this patch.

================
Comment at: ../include/llvm/CodeGen/BasicTTIImpl.h:359
     }
     return 1;
   }
----------------
mkuper wrote:
> This code makes very little sense to me. Not your change, but the original code.
> Why does this special-case "alt shuffle", of all things? And why should this special-case only these specific shuffles after the change?
> 
> I think this is backwards - there may be specific shuffle types that are cheap by default - e.g. broadcast makes sense. Reverse? Not so much.
Reverse is not cheap for all types, VPERMW, for example, appears on AVX-512-BW. And broadcast for i16/i8 was added to AVX2 ISA.

================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:804
+                                              ISD::VECTOR_SHUFFLE, LT.second))
+        return LT.first * Entry->Cost;
+
----------------
mkuper wrote:
> Are you sure this is correct? I mean, this is fine for SK_Reverse, but I don't think it works for general shuffles.
> 
> I mean, let's say you have a shuffle of two v256i8.
> Legalization will give you two sets of 4 * v64i8, but you don't end up with 4 two-input shuffles, since each of the 4 output vectors may depend on any subset of the 8 input vectors, so you may need a lot more shuffles.
> 
> Am I missing something?
You are not missing anything. The cost, given here is the right cost for the legal types. 
After split the cost should be (NumOfSrc*2  -1)*Entry->Cost. I took this into account in AVX-512 calculations.
I'll fix.

================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:851
+                                              ISD::VECTOR_SHUFFLE, LT.second))
+        return LT.first * Entry->Cost;
+
----------------
mkuper wrote:
> Same as above, I don't think this works for PermuteOneSrc either.
I fixed and added a test. thanks.

================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:886
+        return LT.first * Entry->Cost;
   }

----------------
mkuper wrote:
> Are you planning on adding SSE4, AVX and AVX2 costs as well?
> This isn't a blocker for this patch, and should be a separate patch in any case, I'm just curious.
It should be added, I'm not sure that I'll be able to take it immediately after this patch. (and I have one more patch that should compare "interleave" with gather/scatter).
I'll try to find an example, where high "interleaving" cost on AVX2 prevents vectorization and fill PR.
But on AVX2, where we do not have any gather (or at least do not consider it as an option), we should compare "interleave" with strided-scalar. Mohamed is working on reducing scalar cost for strided access. We'll see what happens on AVX2 and earlier ISAs after his patch.

Repository:
  rL LLVM

https://reviews.llvm.org/D28118