[PATCH] Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs

Wed Jun 17 15:10:57 PDT 2015

Hi Wei,

Please find some questions inline. Also, could you please clarify, if this patch depends on http://reviews.llvm.org/D9865 or not?

Thanks,
Michael


REPOSITORY
  rL LLVM

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1738-1739
@@ -1699,2 +1737,4 @@
   GetElementPtrInst *Gep = dyn_cast<GetElementPtrInst>(Ptr);
-  if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
+  if (Gep && (GepPtr = Gep->getPointerOperand()) &&
+      (GepPtrInst = dyn_cast<Instruction>(GepPtr)) &&
+      !SE->isLoopInvariant(SE->getSCEV(GepPtrInst), OrigLoop)) {
----------------
Could we use `GepPtrInst = dyn_cast_or_null(Gep->getPointerOperand()` instead of `(GepPtr = Gep->getPointerOperand()) && (GepPtrInst = dyn_cast<Instruction>(GepPtr)`?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1740
@@ +1739,3 @@
+      (GepPtrInst = dyn_cast<Instruction>(GepPtr)) &&
+      !SE->isLoopInvariant(SE->getSCEV(GepPtrInst), OrigLoop)) {
+    // The case Gep->getPointerOperand() is an induction variable
----------------
I don't understand why it's correct. Could you please clarify the logic behind it?

Originally the condition was true when the pointer operand was an induction variable. Now it can be true for an arbitrary non-invariant expression that happen to have a specific gep-structure.

================
Comment at: test/Transforms/LoopVectorize/pr23580.ll:47-69
@@ +46,25 @@
+for.body:                                         ; preds = %for.cond13
+  %idxprom15 = sext i32 %k.0 to i64
+  %arrayidx16 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom15
+  %ival = getelementptr inbounds %struct.B, %struct.B* %arrayidx16, i32 0, i32 0
+  %tmp9 = load i16, i16* %ival, align 2
+  %conv17 = sext i16 %tmp9 to i32
+  %add = add nsw i32 %k.0, 1
+  %idxprom18 = sext i32 %add to i64
+  %arrayidx19 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom18
+  %ival20 = getelementptr inbounds %struct.B, %struct.B* %arrayidx19, i32 0, i32 0
+  %tmp10 = load i16, i16* %ival20, align 2
+  %conv21 = sext i16 %tmp10 to i32
+  %add22 = add nsw i32 %conv17, %conv21
+  %mul = mul nsw i32 %tmp1, %add22
+  %add23 = add nsw i32 %tmp4, %mul
+  %shr = ashr i32 %add23, %tmp4
+  %arrayidx25 = getelementptr inbounds %struct.B, %struct.B* %call, i64 %idxprom15
+  %ival26 = getelementptr inbounds %struct.B, %struct.B* %arrayidx25, i32 0, i32 0
+  %tmp11 = load i16, i16* %ival26, align 2
+  %conv27 = sext i16 %tmp11 to i32
+  %sub = sub nsw i32 %conv27, %shr
+  %conv28 = trunc i32 %sub to i16
+  store i16 %conv28, i16* %ival26, align 2
+  br label %for.cond13
+
----------------
Why do we need such complicated loop body, if we're basically only interested in gep+gep+load? Also, the control flow in this test is strange, and I'm not sure if it's necessary for the purpose of the test.

Could we simplify it?

http://reviews.llvm.org/D10281

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/