[PATCH] D44338: [LV][VPlan] Build plain CFG with simple recipes for outer loops.

Fri Mar 16 13:02:50 PDT 2018

hsaito added inline comments.

================
Comment at: lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp:100
+    if (isa<LoadInst>(Inst) || isa<StoreInst>(Inst)) {
+      VPBB->appendRecipe(
+          new VPWidenMemoryInstructionRecipe(Inst, nullptr /*Mask*/));
----------------
a.elovikov wrote:
> For outer loop vectorization in
> 
>     int s = 0;
>     for (int i = 0; i < N; ++i) {
>       for (int j = 0; j < M; ++j) {
>         s += x[i] * y[j];
>       }
>     }
> 
> We need a broadcast y[j] -> {y[j], y[j], y[j], y[j]} but this will generate a WIDEN recipe for the load. Is that OK? If so, can we document it somewhere?
> 
Reference: LoopVectorizationPlanner::tryToWidenMemory().

VPWidenMemoryInstructionRecipe can handle CM_GatherScatter, and a uniform access can be thought of as a special form of gather/scatter. From that perspective, generating a WIDEN recipe for the load is okay.
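
To illustrate "uniform as a special form of gather", here is a minimal standalone sketch. It is not LLVM code: VF, gather() and uniformLoad() are hypothetical names used only for this example, and the lanes are modeled with a plain std::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative vectorization factor (assumption, not taken from the patch).
constexpr std::size_t VF = 4;

// Generic gather: lane L reads Mem[Idx[L]].
std::array<int, VF> gather(const std::vector<int> &Mem,
                           const std::array<std::size_t, VF> &Idx) {
  std::array<int, VF> Result{};
  for (std::size_t L = 0; L < VF; ++L) {
    assert(Idx[L] < Mem.size() && "lane index out of bounds");
    Result[L] = Mem[Idx[L]];
  }
  return Result;
}

// Uniform access: every lane uses the same index J, so the generic gather
// degenerates into a broadcast {Mem[J], Mem[J], Mem[J], Mem[J]}.
std::array<int, VF> uniformLoad(const std::vector<int> &Mem, std::size_t J) {
  std::array<std::size_t, VF> Idx;
  Idx.fill(J);
  return gather(Mem, Idx);
}
```

The gather form is always correct for y[j] in the outer-loop case; recognizing that all lane indices are equal only lets a later stage pick a cheaper scalar load plus broadcast.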

A vector load/store is deemed gather/scatter until analysis refines it to a better access type. From that perspective, using "generic gather/scatter" during the initial VPlan construction phase makes perfect sense.

If we are building a single VPlan CFG for inner and/or outer loop vectorization (and that's something we should be doing if the HCFGs look identical), we can't encode "memory access kind" information within the HCFG. So keeping it as "generic gather/scatter" at the HCFG level is also the right thing to do for the long term.

In other words, we need storage outside the HCFG to hold the "uniform/unit-stride/interleave/..." information for each load/store.
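
One possible shape for such storage, as a rough sketch only: a side table keyed by the memory instruction, kept per vectorization candidate rather than in the shared HCFG. AccessKind, MemInst and AccessKindTable are hypothetical names for this example and do not exist in LLVM; the patch does not prescribe any particular design.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical access kinds; GatherScatter is the generic default.
enum class AccessKind { GatherScatter, Uniform, UnitStride, Interleave };

// Stand-in for a load/store that lives in the shared HCFG.
struct MemInst {
  int Id;
};

// Side table owned by one vectorization candidate (e.g. "vectorize loop i").
// The HCFG itself stays free of access-kind information.
class AccessKindTable {
  std::unordered_map<const MemInst *, AccessKind> Kinds;

public:
  // Record the result of a later analysis for instruction I.
  void refine(const MemInst *I, AccessKind K) {
    assert(I && "expected a valid memory instruction");
    Kinds[I] = K;
  }

  // Anything not yet refined is treated as generic gather/scatter.
  AccessKind lookup(const MemInst *I) const {
    auto It = Kinds.find(I);
    return It == Kinds.end() ? AccessKind::GatherScatter : It->second;
  }
};
```

With one table per candidate, the load of y[j] from the example above can be recorded as Uniform for outer-loop (i) vectorization and as UnitStride for inner-loop (j) vectorization, while both candidates share the same HCFG.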

Repository:
  rL LLVM

https://reviews.llvm.org/D44338