[PATCH] Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs

Michael Zolotukhin mzolotukhin at apple.com
Thu Jun 18 18:16:49 PDT 2015


Hi Wei,

Thanks for your work! Please find more comments below:

> Originally there are two cases where a load/store is consecutive:

>  case1. The pointer operand of gep is a phi and it is an induction variable.

>  case2. The pointer operand is invariant, only one index operand of the gep is a loop induction variable and all the other index operands on the right hand side of the variant index operand are all 0.

> 

> The one more case (case 3) added in the patch is when the pointer operand of gep (named as gep_a) is another gep (named as gep_b). For such load/store to be consecutive, all the index operands of gep_a are all 0 , and gep_b should be case 1 or 2 or another recurisive gep.


These all are details that should be covered by SCEV. That is, once you use SCEV for such analysis, you no longer need to bother about whether the pointer operand is PHI, GEP, or something else. And, you don't need to specifically handle the cases like `%gep2 = getelementptr %gep1, i64 0, i64 0` since in terms of SCEV they should give you the same SCEV expression.

Thus, I expect that this patch should make a lot of code in isConsecutivePtr redundant - probably the only code we need there is the one you are about to add. For instance, I suspect that we won't need these checks:

  // We can emit wide load/stores only if the last non-zero index is the
  // induction variable.
  ...

...if we use SCEV properly.

> I keep -instcombine because it makes the generated IR a lot simpler and may be helpful for checking the output.


I'm not convinced here. Please take a look at, for instance, `test/Transforms/LoopVectorize/X86/powof2div.ll`. In order to figure out whether the vectorization takes place, we need to just look for a vector type (like `<4 x i32>`) in the output. `-inst-combine` is unnecessary for this. Also, I'm pretty sure that the test can and should be reduced further - we don't need so many basic blocks and instructions to test this feature (again, you could take a look at `powof2div.ll` as an example).


REPOSITORY
  rL LLVM

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1585
@@ +1584,3 @@
+  // equals to the size of pointer element type, we know it is consecutive.
+  if (PtrScev && (PtrAddRec = dyn_cast<SCEVAddRecExpr>(PtrScev))) {
+    const SCEV *Step = PtrAddRec->getStepRecurrence(*SE);
----------------
Please use `dyn_cast_or_null` here. It'll be much more compact.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1586-1588
@@ +1585,5 @@
+  if (PtrScev && (PtrAddRec = dyn_cast<SCEVAddRecExpr>(PtrScev))) {
+    const SCEV *Step = PtrAddRec->getStepRecurrence(*SE);
+    const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
+    if (C) {
+      const APInt &APStepVal = C->getValue()->getValue();
----------------
I'd rewrite this as
```
if (auto C = dyn_cast_or_null<SCEVConstant>(PtrAddRec->getStepRecurrence(*SE))) {
```

================
Comment at: test/Transforms/LoopVectorize/pr23580.ll:27
@@ +26,3 @@
+for.preheader:                                    ; preds = %entry
+  %call = call %struct.B* @_ZN1C5m_fn1Ev()
+  %tmp4 = load i32, i32* @a, align 4
----------------
Why do we need this call? If we just need some external variable, I'd rather replace it with function argument. That'll hopefully allow us to reduce the test even further.

http://reviews.llvm.org/D10281

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list