[PATCH] D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address

Sun Aug 26 09:29:40 PDT 2018

Ayal added a comment.

In https://reviews.llvm.org/D50665#1212872, @anna wrote:

> This is what the langref states for scatter intrinsic (https://llvm.org/docs/LangRef.html#id1792):
>
>   . The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.

Yes, this was intentional, precisely to support vectorization of (possibly) self-overwriting stores.

> ...
>  This is the code that gets generated for uniform stores on skylake with AVX-512 support once I fixed the bug in this patch
>  ...

LGTM.
Indeed, care must be taken to avoid using more than one masked scatter to the same invariant address; but LAA should flag such non-self cross-iteration dependencies.

================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:570
+  bool hasVariantStoreToLoopInvariantAddress() const {
+    return VariantStoreToLoopInvariantAddress;
   }
----------------
Could rename `VariantStoreToLoopInvariantAddress` to `HasVariantStoreToLoopInvariantAddress`.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1871
+      return false;
+    return TheLoop->hasLoopInvariantOperands(cast<Instruction>(StoreVal));
+  };
----------------
anna wrote:
> Ayal wrote:
> > anna wrote:
> > > Ayal wrote:
> > > > anna wrote:
> > > > > Ayal wrote:
> > > > > > Again, something LICM may have missed?
> > > > > yes, LICM misses this as well - see added test case in `inv_val_store_to_inv_address_conditional_inv`. 
> > > > Ahh, but a phi of invariant values is invariant iff the compares that decide which predecessor will reach the phi, are also invariant. In `inv_val_store_to_inv_address_conditional_inv` this holds because there `%cmp` determines which predecessor it'll be, and `%cmp` is invariant. In general Divergence Analysis is designed to provide this answer, as in D50433's `isUniform()`.
> > > yes, that's right. Note that this patch handles phis of invariant values based on either an invariant condition or a variant condition (see `inv_val_store_to_inv_address_conditional_diff_values_ic` where the phi result is based on a varying condition). 
> > > 
> > > The improved codegen and cost model handling is for predicated stores, where the block containing the invariant store is to be predicated. Today, we just handle this as a "predicated store" cost and generate the code gen accordingly. 
> > So the suggested `isLoopInvariantStoreValue` name is incorrect, as the store value may be variant. What's special about these variant values - why not handle any store value?
> > 
> > Yes, conditional stores to an invariant address will end up scalarized and predicated, i.e., with a branch-and-store per lane, which is quite inefficient. A masked scatter may work better there, until optimized by a single branch-and-store if any lane is masked-on (invariant stored value) or single branch-and-store of last masked-on lane (in/variant stored value).
> I am currently adding the support for *any variant* stores to invariant address. The unsafe cross iteration dependencies are identified through `LAA: unsafe dependent memory operations in loop`.  This was identified without any changes required from my side to the LAA memory conflict detection. However, I'm not sure if LAA handles all cases exhaustively. 
> 
> The reason I started with this sub-patch is that when the stored value is not a varying memory access from within the loop (that's what this blob of code is really trying to do - see `isLoopInvariant` and `hasLoopInvariantOperands`), we don't need to reason about whether LAA handles all memory conflict detection.
> 
> When the stored value can be any variant value, we need to make sure that the LAA pass handles all the memory conflicts correctly or update LAA if that isn't the case.
It is conceivable that stores of invariant values to invariant addresses can participate in a subset of unsafe scenarios, which may be easier for LAA to detect, and thus start by treating only stores of invariant values to invariant addresses. But storing a variant phi whose "dominating" compares are not all invariant, could conceptually produce arbitrary variant values and dependencies, despite having invariant values for all other operands of the phi; e.g., 0 and -1. Presumably, this case does not differ, from LAA perspective, from stores of any variant value to invariant address.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5766
+  bool isLoopInvariantValueStored =
+      TheLoop->isLoopInvariant(SI->getValueOperand());
   return TTI.getAddressComputationCost(ValTy) +
----------------
anna wrote:
> anna wrote:
> > Ayal wrote:
> > > Ayal wrote:
> > > > Should be consistent and use the same `isLoopInvariantStoreValue()` noted above.
> > > This is still inconsistent with the OR-operands-are-invariant above.
> > will update.
> actually, this is correct. We don't need to update it to the "incorrectly named" lambda above. 
> 
> We need to do an extract if the value is not invariant: example case:
> ```
> for.body:                                         ; preds = %for.body, %entry
>   %i = phi i64 [ %i.next, %latch ], [ 0, %entry ]
>   %tmp1 = getelementptr inbounds i32, i32* %b, i64 %i
>   %tmp2 = load i32, i32* %tmp1, align 8
>   %varying_cmp = icmp eq i32 %tmp2, %k
>   store i32 %ntrunc, i32* %tmp1
>   br i1 %varying_cmp, label %cond_store, label %cond_store_k
> 
> cond_store:
>   br label %latch
> 
> cond_store_k:
>   br label %latch
> 
> latch:
>   %storeval = phi i32 [ %ntrunc, %cond_store ], [ %k, %cond_store_k ]
>   store i32 %storeval, i32* %a <-- uniform store
> ```
> storeval's operands are invariant, but the value being chosen in each iteration of the loop varies based on `%varying_cmp`. In this case, we need an extract and then the scalar store. That's exactly what we do as well. 
Agreed. Misled by the erroneous isLoopInvariantStoreValue() name.

Repository:
  rL LLVM

https://reviews.llvm.org/D50665