[PATCH] D24154: [DAGCombiner] Fix incorrect sinking of a truncate into the operand of a shift.

Thu Sep 1 13:25:23 PDT 2016

andreadb created this revision.
andreadb added reviewers: arsenm, RKSimon, spatel.
andreadb added a subscriber: llvm-commits.
Herald added a subscriber: wdng.

This patch fixes a regression introduced by revision 268094.

Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect. So, the rule doesn't work well in some cases involving vector types.

Example:

;;
define <8 x i16> @trunc_shift(<8 x i32> %a) {
entry:
  %shl = shl <8 x i32> %a, <i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17>
  %conv = trunc <8 x i32> %shl to <8 x i16>
  ret <8 x i16> %conv
}
;;

According to the above mentioned rule, it is valid to convert the trunc+shift-by-constant into a shift-by-constant of a truncated value.

  (v8i16 (trunc (shl (v8i32 %a, <17,17,17,17,17,17,17,17>)))
-->
  (v8i16 (shl  (v8i16 (trunc v8i32 %a)), <17,17,17,17,17,17,17,17>)

The problem is that the new "shl"  is undefined (the shift count is bigger than the vector element size). So, the dag combiner would later on replace the shift node with an 'undef' value.

The combine rule should have been written instead like this:

// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value, then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds a test case to show that we no longer fold the entire computation to undef.

Please let me know if this is okay to commit.

Thanks,
Andrea

https://reviews.llvm.org/D24154

Files:
  lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  test/CodeGen/X86/reduce-trunc-shl.ll

Index: test/CodeGen/X86/reduce-trunc-shl.ll
===================================================================

--- test/CodeGen/X86/reduce-trunc-shl.ll
+++ test/CodeGen/X86/reduce-trunc-shl.ll
@@ -26,3 +26,28 @@
   store <4 x i32> %trunc, <4 x i32> addrspace(1)* %out
   ret void
 }
+
+define <8 x i16> @trunc_shl_v8i16_v8i32(<8 x i32> %a) {
+; SSE2-LABEL: trunc_shl_v8i16_v8i32:
+; SSE2:       # BB#0:
+; SSE2-NEXT:    pslld $17, %xmm0
+; SSE2-NEXT:    pslld $17, %xmm1
+; SSE2-NEXT:    pslld $16, %xmm1
+; SSE2-NEXT:    psrad $16, %xmm1
+; SSE2-NEXT:    pslld $16, %xmm0
+; SSE2-NEXT:    psrad $16, %xmm0
+; SSE2-NEXT:    packssdw %xmm1, %xmm0
+; SSE2-NEXT:    retq
+;
+; AVX2-LABEL: trunc_shl_v8i16_v8i32:
+; AVX2:       # BB#0:
+; AVX2-NEXT:    vpslld $17, %ymm0, %ymm0
+; AVX2-NEXT:    vpshufb {{.*#+}} ymm0 = ymm0[0,1,4,5,8,9,12,13],zero,zero,zero,zero,zero,zero,zero,zero,ymm0[16,17,20,21,24,25,28,29],zero,zero,zero,zero,zero,zero,zero,zero
+; AVX2-NEXT:    vpermq {{.*#+}} ymm0 = ymm0[0,2,2,3]
+; AVX2-NEXT:    # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>
+; AVX2-NEXT:    vzeroupper
+; AVX2-NEXT:    retq
+  %shl = shl <8 x i32> %a, <i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17>
+  %conv = trunc <8 x i32> %shl to <8 x i16>
+  ret <8 x i16> %conv
+}
Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -7184,15 +7184,15 @@
     }
   }
 
-  // trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2
+  // trunc (shl x, K) -> shl (trunc x), K => K < VT.getScalarSizeInBits()
   if (N0.getOpcode() == ISD::SHL && N0.hasOneUse() &&
       (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::SHL, VT)) &&
       TLI.isTypeDesirableForOp(ISD::SHL, VT)) {
     if (const ConstantSDNode *CAmt = isConstOrConstSplat(N0.getOperand(1))) {
       uint64_t Amt = CAmt->getZExtValue();
-      unsigned Size = VT.getSizeInBits();
+      unsigned Size = VT.getScalarSizeInBits();
 
-      if (Amt < Size / 2) {
+      if (Amt < Size) {
         SDLoc SL(N);
         EVT AmtVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D24154.70053.patch
Type: text/x-patch
Size: 2201 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160901/fd9dc32e/attachment.bin>