[PATCH] D44868: [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth

Sat Mar 24 09:53:08 PDT 2018

haicheng created this revision.
haicheng added reviewers: mssimpso, ABataev.
Herald added subscribers: javed.absar, mcrosier.

We use two approaches for determining the minimum bitwidth.

1. Demanded bits
2. Value tracking

If the demanded bits analysis doesn't yield a narrower type, we fall back to value tracking. The fallback is needed if we want to root SLP trees at the indices of getelementptr instructions, since all the bits of those indices are demanded.
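
To make the two-stage scheme concrete, here is a minimal, self-contained toy model of it. It is only a sketch and not the SLPVectorizer code: ToyRoot, activeBits, roundUpToPowerOfTwo and minimumBitWidth are hypothetical names, and the DemandedMask / NumSignBits fields are plain numbers standing in for what the demanded bits analysis and value tracking would report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one scalar root; not an LLVM type.
struct ToyRoot {
  unsigned TypeBits;      // width of the scalar's IR type, e.g. 64 for i64
  uint64_t DemandedMask;  // bits its users read (stand-in for demanded bits)
  unsigned NumSignBits;   // identical high-order sign-bit copies (stand-in
                          // for value tracking); assumed <= TypeBits
  bool SignBitKnownZero;  // true if the value is known to be non-negative
};

// Number of bits needed to hold the highest set bit of Mask.
static unsigned activeBits(uint64_t Mask) {
  unsigned Bits = 0;
  while (Mask != 0) {
    ++Bits;
    Mask >>= 1;
  }
  return Bits;
}

// Round Bits up to the next power of two.
static unsigned roundUpToPowerOfTwo(unsigned Bits) {
  unsigned Result = 1;
  while (Result < Bits)
    Result <<= 1;
  return Result;
}

// Assumes all roots share one type, as the roots of one tree do.
unsigned minimumBitWidth(const std::vector<ToyRoot> &Roots) {
  assert(!Roots.empty() && "need at least one root");
  const unsigned TypeBits = Roots.front().TypeBits;

  // Approach 1: demanded bits.
  unsigned MaxBitWidth = 8;
  for (const ToyRoot &R : Roots)
    MaxBitWidth = std::max(MaxBitWidth, activeBits(R.DemandedMask));

  // Approach 2: value tracking, tried only when approach 1 could not
  // drop any bits.
  if (MaxBitWidth == TypeBits) {
    MaxBitWidth = 8;
    bool AllNonNegative = true;
    for (const ToyRoot &R : Roots) {
      MaxBitWidth = std::max(MaxBitWidth, R.TypeBits - R.NumSignBits);
      AllNonNegative = AllNonNegative && R.SignBitKnownZero;
    }
    // Keep one extra bit for the sign unless every root is non-negative.
    if (!AllNonNegative)
      ++MaxBitWidth;
  }
  return roundUpToPowerOfTwo(MaxBitWidth);
}
```

For a root that is sign-extended from i32 to i64 and has all 64 bits demanded (NumSignBits = 33 in this model), approach 1 finds nothing to drop and approach 2 yields 32.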

However, one piece is missing: we need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in

  %i = sext i32 %e1 to i64 
  %gep = getelementptr inbounds i64, i64* %p, i64 %i

are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in

  %tmp15 = sext i32 %tmp14 to i64 
  %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0

it doesn't make sense to shrink %tmp15, so we can skip value tracking.

This patch still adds zext/sext to the demote list, but does not use value tracking if they are not shrinkable. So the cast <vect>, trunc <vect>, extract <vect>, cast <extract> sequence is only generated when it is useful (shrinkable). For now, this patch only considers a gep index as a shrinkable use.
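
The gating condition described above can be sketched as a small, self-contained toy model. This is not the code from the patch: UserKind, ToyRoot, isShrinkableUse and shouldTryValueTracking are hypothetical names, and a root's users are modeled as a plain list of kinds. Like the patch, the sketch inspects only the first user of each root; unlike the patch, it explicitly treats a root with no users as not shrinkable.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical classification of the instruction that uses a root.
enum class UserKind { GetElementPtr, InsertValue, Other };

// Hypothetical stand-in for one tree root; not an LLVM type.
struct ToyRoot {
  unsigned TypeBits;            // width of the root's IR type
  std::vector<UserKind> Users;  // kinds of the instructions using the root
};

// A use counts as shrinkable only if the first user is a getelementptr.
// Roots without users are treated as not shrinkable (an assumption of
// this sketch).
bool isShrinkableUse(const ToyRoot &R) {
  return !R.Users.empty() && R.Users.front() == UserKind::GetElementPtr;
}

// Value tracking is attempted only if demanded bits found nothing to
// drop AND every root feeds a shrinkable use.
bool shouldTryValueTracking(unsigned MaxBitWidth,
                            const std::vector<ToyRoot> &Roots) {
  assert(!Roots.empty() && "need at least one root");
  return MaxBitWidth == Roots.front().TypeBits &&
         std::all_of(Roots.begin(), Roots.end(), isShrinkableUse);
}
```

In this model, i64 roots that index getelementptr instructions pass the check, while roots that feed insertvalue do not, so value tracking is skipped for them.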

Most of the words above are from @mssimpso.


Repository:
  rL LLVM

https://reviews.llvm.org/D44868

Files:
  lib/Transforms/Vectorize/SLPVectorizer.cpp
  test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll


Index: test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll
===================================================================

--- /dev/null
+++ test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll
@@ -0,0 +1,37 @@
+; RUN: opt -S -slp-vectorizer -instcombine < %s | FileCheck %s
+
+target datalayout = "e-m:e-i32:64-i128:128-n32:64-S128"
+target triple = "aarch64--linux-gnu"
+
+declare void @foo(i64, i64, i64, i64)
+
+define void @test(<4 x i16> %a, <4 x i16> %b, i64* %p) {
+; Make sure types of sub and its sources are not extended.
+; CHECK-LABEL: @test(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>
+; CHECK-NEXT:    [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>
+; CHECK-NEXT:    [[SUB:%.*]] = sub nsw <4 x i32> [[Z0]], [[Z1]]
+entry:
+  %z0 = zext <4 x i16> %a to <4 x i32>
+  %z1 = zext <4 x i16> %b to <4 x i32>
+  %sub0 = sub <4 x i32> %z0, %z1
+  %e0 = extractelement <4 x i32> %sub0, i32 0
+  %s0 = sext i32 %e0 to i64
+  %gep0 = getelementptr inbounds i64, i64* %p, i64 %s0
+  %load0 = load i64, i64* %gep0
+  %e1 = extractelement <4 x i32> %sub0, i32 1
+  %s1 = sext i32 %e1 to i64
+  %gep1 = getelementptr inbounds i64, i64* %p, i64 %s1
+  %load1 = load i64, i64* %gep1
+  %e2 = extractelement <4 x i32> %sub0, i32 2
+  %s2 = sext i32 %e2 to i64
+  %gep2 = getelementptr inbounds i64, i64* %p, i64 %s2
+  %load2 = load i64, i64* %gep2
+  %e3 = extractelement <4 x i32> %sub0, i32 3
+  %s3 = sext i32 %e3 to i64
+  %gep3 = getelementptr inbounds i64, i64* %p, i64 %s3
+  %load3 = load i64, i64* %gep3
+  call void @foo(i64 %load0, i64 %load1, i64 %load2, i64 %load3)
+  ret void
+}
Index: lib/Transforms/Vectorize/SLPVectorizer.cpp
===================================================================
--- lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -4301,24 +4301,9 @@
   // additional roots that require investigating in Roots.
   SmallVector<Value *, 32> ToDemote;
   SmallVector<Value *, 4> Roots;
-  for (auto *Root : TreeRoot) {
-    // Do not include top zext/sext/trunc operations to those to be demoted, it
-    // produces noise cast<vect>, trunc <vect>, exctract <vect>, cast <extract>
-    // sequence.
-    if (isa<Constant>(Root))
-      continue;
-    auto *I = dyn_cast<Instruction>(Root);
-    if (!I || !I->hasOneUse() || !Expr.count(I))
-      return;
-    if (isa<ZExtInst>(I) || isa<SExtInst>(I))
-      continue;
-    if (auto *TI = dyn_cast<TruncInst>(I)) {
-      Roots.push_back(TI->getOperand(0));
-      continue;
-    }
+  for (auto *Root : TreeRoot)
     if (!collectValuesToDemote(Root, Expr, ToDemote, Roots))
       return;
-  }
 
   // The maximum bit width required to represent all the values that can be
   // demoted without loss of precision. It would be safe to truncate the roots
@@ -4347,7 +4332,10 @@
   // We start by looking at each entry that can be demoted. We compute the
   // maximum bit width required to store the scalar by using ValueTracking to
   // compute the number of high-order bits we can truncate.
-  if (MaxBitWidth == DL->getTypeSizeInBits(TreeRoot[0]->getType())) {
+  if (MaxBitWidth == DL->getTypeSizeInBits(TreeRoot[0]->getType()) &&
+      llvm::all_of(TreeRoot, [&](Value *R) {
+        return isa<GetElementPtrInst>(*R->user_begin());
+      })) {
     MaxBitWidth = 8u;
 
     // Determine if the sign bit of all the roots is known to be zero. If not,

