[PATCH] D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address

Thu Aug 23 12:32:35 PDT 2018

anna added inline comments.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1871
+      return false;
+    return TheLoop->hasLoopInvariantOperands(cast<Instruction>(StoreVal));
+  };
----------------
Ayal wrote:
> anna wrote:
> > Ayal wrote:
> > > anna wrote:
> > > > Ayal wrote:
> > > > > Again, something LICM may have missed?
> > > > yes, LICM misses this as well - see added test case in `inv_val_store_to_inv_address_conditional_inv`. 
> > > Ahh, but a phi of invariant values is invariant iff the compares that decide which predecessor will reach the phi, are also invariant. In `inv_val_store_to_inv_address_conditional_inv` this holds because there `%cmp` determines which predecessor it'll be, and `%cmp` is invariant. In general Divergence Analysis is designed to provide this answer, as in D50433's `isUniform()`.
> > yes, that's right. Note that this patch handles phis of invariant values based on either an invariant condition or a variant condition (see `inv_val_store_to_inv_address_conditional_diff_values_ic` where the phi result is based on a varying condition). 
> > 
> > The improved codegen and cost model handling is for predicated stores, where the block containing the invariant store is to be predicated. Today, we just handle this as a "predicated store" cost and generate the code gen accordingly. 
> So the suggested `isLoopInvariantStoreValue` name is incorrect, as the store value may be variant. What's special about these variant values - why not handle any store value?
> 
> Yes, conditional stores to an invariant address will end up scalarized and predicated, i.e., with a branch-and-store per lane, which is quite inefficient. A masked scatter may work better there, until optimized by a single branch-and-store if any lane is masked-on (invariant stored value) or single branch-and-store of last masked-on lane (in/variant stored value).
I am currently adding the support for *any variant* stores to invariant address. The unsafe cross iteration dependencies are identified through `LAA: unsafe dependent memory operations in loop`.  This was identified without any changes required from my side to the LAA memory conflict detection. However, I'm not sure if LAA handles all cases exhaustively. 

The reason I started with this sub-patch is that when the stored value is not a varying memory access from within the loop (that's what this blob of code is really trying to do - see `isLoopInvariant` and `hasLoopInvariantOperands`), we don't need to reason about whether LAA handles all memory conflict detection.

When the stored value can be any variant value, we need to make sure that the LAA pass handles all the memory conflicts correctly or update LAA if that isn't the case.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5766
+  bool isLoopInvariantValueStored =
+      TheLoop->isLoopInvariant(SI->getValueOperand());
   return TTI.getAddressComputationCost(ValTy) +
----------------
Ayal wrote:
> Ayal wrote:
> > Should be consistent and use the same `isLoopInvariantStoreValue()` noted above.
> This is still inconsistent with the OR-operands-are-invariant above.
will update.

Repository:
  rL LLVM

https://reviews.llvm.org/D50665