[PATCH] D35700: DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset

Sun Jul 23 08:46:40 PDT 2017

guyblank added inline comments.

================
Comment at: include/llvm/Target/TargetLowering.h:2779
+  //  -->
+  // v4i32 truncate (bitcast (shuffle<1,u,3,u,4,u,5,u,6,u,7,u> V, u) to v4i64)
+  virtual bool isDesirableToCombineStridedeBuildVectorToShuffleTruncate(
----------------
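The shuffle mask should be <1,u,3,u,5,u,7,u>. The mask in the comment has 12 entries, but the shuffle result is bitcast to v4i64, so with 32-bit elements it can only have 8.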
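
To make the expected mask easy to check, here is a minimal standalone sketch of how such a mask could be derived from a stride and an offset. The helper name stridedShuffleMask and its parameters are hypothetical and not part of the patch or of LLVM; it assumes little-endian lane order and uses -1 for an undef ("u") lane.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not an LLVM API: builds the shuffle mask that moves
// every Stride-th source element, starting at Offset, into the low lane of
// each Stride-wide group. All other lanes are left undef (-1), since the
// following truncate discards them (little-endian assumed).
std::vector<int> stridedShuffleMask(unsigned NumSrcElts, unsigned Stride,
                                    unsigned Offset) {
  assert(Stride != 0 && NumSrcElts % Stride == 0 && Offset < Stride);
  std::vector<int> Mask(NumSrcElts, -1);
  for (unsigned Pos = 0; Pos < NumSrcElts; Pos += Stride)
    Mask[Pos] = static_cast<int>(Pos + Offset);
  return Mask;
}
```

For an 8-element source with stride 2 and offset 1, this yields {1, -1, 3, -1, 5, -1, 7, -1}, i.e. <1,u,3,u,5,u,7,u>.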

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14433

   // The first BUILD_VECTOR operand must be an an extract from index zero
   // (assuming no undef and little-endian).
----------------
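Please update this comment: now that a non-zero offset is handled, the first operand no longer has to be an extract from index zero.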

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14442
+  // Check for profitability before proceeding with more expensive checks.
+  if (!TLI.isDesirableToCombineStridedeBuildVectorToShuffleTruncate(
+          Stride, Offset, ExtractedFromVec.getValueType(), VT))
----------------
I think you should add Offset != 0 to the condition.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14473
+    }
+    if (!DAG.getTargetLoweringInfo().isShuffleMaskLegal(Mask, ExtractedVT))
+      return SDValue();
----------------
You can use TLI here instead of DAG.getTargetLoweringInfo().

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:35835
+      Offset == 0 ||
+      // For 32-bit elements VPERMD is better than shuffle+truncate
+      (SrcVT.getScalarSizeInBits() != 32 && Subtarget.hasAVX2());
----------------
What about VPERMW for 16-bit elements?

https://reviews.llvm.org/D35700