[PATCH] D42447: [LV][VPlan] Detect outer loops for explicit vectorization.

Thu Mar 8 12:25:58 PST 2018

dcaballe marked 3 inline comments as done.
dcaballe added a comment.

Thanks for the comments, Renato!
Please, have a look at my inline comments and let me know what you think.

Thanks!
Diego

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2319
+// LoopVectorizeHints). This will be fixed in the future when the native IR
+// representation for pragma 'omp simd' is introduced.
+static bool isExplicitVecOuterLoop(Loop *OuterLp,
----------------
rengolin wrote:
> Sorry, this is my fault, as both were done separately. We discussed adding one more metadata info to mean {"forced"/"hint"} but ended up never doing it. It should be simple to fix this, I think, just need to make sure we change all the tests correctly to what they're supposed to mean.
No problem. As I mention, there are going to be changes regarding the representation of #pragma omp simd. I think that's the right time to address the problem.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2363
       V.push_back(&L);
-    return;
+      // TODO: Collect inner loops inside marked outer loops in case
+      // vectorization fails for the outer loop. Do not invoke
----------------
rengolin wrote:
> What's the complexity of this analysis? We'll be adding a lot of those, repeatedly, no?
> 
> If I understand correctly, `containsIrreducibleCFG` is not that simple, in addition to the traversal. Not calling it unnecessarily would be a nice thing to have up front.
> 
> How complex would it be to create the inherited attribute?
> What's the complexity of this analysis? We'll be adding a lot of those, repeatedly, no? 

Sorry, I'm not sure I understand the question. If you mean the complexity of collecting nested loops given an outer loop, it would be linear on the number of nested loops. We wouldn't add the same loop multiple times. Each loop in the loop nest would be added and processed just once.

> If I understand correctly, containsIrreducibleCFG is not that simple, in addition to the traversal. Not calling it unnecessarily would be a nice thing to have up front. How complex would it be to create the inherited attribute?

Adding the attribute wouldn't be complicated. We would have a recursive call that processes loops from outer to inner. Once we know that the CFG of the outer loop is reducible, we can pass a flag through the recursive call to mark that nested loops are reducible and skip `containsIrreducibleCFG` for them. Does this make sense to you?

I had it implemented but there is a subtle detail that would make this patch non-NFC. If you look at line 8901, collected loops are processed in reverse order. This basically means that if we have:

```
#pragma clang loop vectorize
for i
  for j

  for k
}
```

the current code would process loops k and j first. If one of them is vectorized, we couldn't vectorize 'i', the one marked for vectorization. I see two potential solutions:
1) Reverse the order in which loops are processed in line 8901. This is non-NFC and some existing LIT tests would have to be updated accordingly.
2) Collect loops at different loop nest levels in post-order and loops at the same level in pre-order. This would be the collection order for the previous example: j, k, i. This would be NFC for inner loops but I find it particularly weird.

IMO, option 1 seems the right approach but it's non-NFC and I wouldn't include it as part of this patch.
What do you think?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:7642
+  if (!OrigLoop->empty()) {
+    assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
+    assert(UserVF && "Expected UserVF for outer loop vectorization.");
----------------
rengolin wrote:
> I wouldn't make this an assert, just a debug message and return
I think this should be an assert because an outer loop shouldn't reach this point if the  VPlan-native patch is disabled. However, I'm going to add the debug message in the caller code. Does it sound good?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:7650
+    // TODO: VPlan-based Predication.
+    // TODO: VPlan-based Cost Modeling.
+
----------------
rengolin wrote:
> is this really the place to do predication and cost modelling?
This function returns the VF to be used during code generation so we would need to evaluate the cost here to return the selected VF. Cost modeling shouldn't be part of the VPlan building process. The same approach is followed in the code below. Regarding predication, it must happen before the cost evaluation. We can discuss if it should belong here or not when we introduce the actual code. If the comment is confusing, I can remove it. We decided to introduce the TODOs to give a better picture of the subsequent patches but if this is not helpful or annoying I can just get rid of all of them. Please, let me know what you think.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:8227
+    // VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan.get());
+    // HCFGBuilder.buildHierarchicalCFG();
+    DEBUG(dbgs() << "LV: TODO: Build hierarchical CFG for outer loop.\n");
----------------
rengolin wrote:
> Isn't `containsIrreducibleCFG` depending on this to work?
They are different things. `containsIrreducibleCFG` is used at the very beginning of the pass to collect potential loop candidates (without irreducible CFG) to be vectorized. `VPlanHCFGBuilder` will build a CFG out of the input IR, using the VPlan infrastructure (VPBlockBases). This VPlan CFG will be modified during the vectorization process without actually modifying the CFG of the input IR. Changes in the VPlan CFG will be materialized once the best profitable VPlan is chosen. Is it clearer now?

The VPlanHCFGBuilder is going to be introduced in the next patch.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:8642
+
+     // TODO: It will be set to true when we introduce codegen support.
+     return false;
----------------
rengolin wrote:
> I'm really uncomfortable with all these temporary code blocks that don't do anything...
> 
> They're really just hijacking the existing infrastructure without implementing as a VPlan.
> 
> I really thought the whole point of VPlans was that you wouldn't need to hack-it-up like we used to in the old vectoriser...
This is the entrance to the VPlan-native vectorization path. It's not doing anything yet because we are trying to follow an incremental approach by releasing relatively small patches that are easy to digest. This code will be functional (generating vector code) soon.

The code block is temporary as long as both vectorization paths co-exist but the final goal is to converge into a single one. This approach will allow us to incrementally and easily extend all the current inner loop vectorization functionality to support outer loops and, most importantly, doing so without destabilizing inner loop vectorization. We are really concerned about the latter and we think that this approach is a reasonable trade-off between safety and temporary code blocks.

If you want to discuss this further, I would recommend to move the discussion to the RFC thread so that everybody is aware of it: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html

https://reviews.llvm.org/D42447