[PATCH] D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address

Fri Aug 24 10:31:35 PDT 2018

hsaito added a comment.

In https://reviews.llvm.org/D50665#1212597, @anna wrote:

> One more interesting thing I noticed while adding predicated invariant stores to X86 (for -mcpu=skylake-avx512): it supports masked scatter for non-uniform stores.
>  But we need to add support for uniform stores along with this patch. Today, it just generates incorrect code (no predication whatsoever).
>  For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient, and avoided unless -force-vector-width=X is given).

In general, a self output dependence is fine to vectorize (whether the store address is uniform or random), as long as the (masked) scatter, or its emulation, performs the stores from lower elements to higher elements. Intel's scatter instruction is implemented that way, and so is CodeGenPrepare's serialization of the masked scatter intrinsic. When we check TTI-based availability/cost, we need to ensure that the HW scatter support satisfies this ordering requirement, since some scatter implementations may not.
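
To make the ordering requirement concrete, here is a minimal standalone sketch (plain C++, not LLVM code) of a masked scatter emulated by a per-lane mask test and store. The names VF, Memory, scalarLoop, emulatedScatter, uniformStoreResult and scalarStoreResult are hypothetical and chosen only for this illustration. It assumes a uniform store address and a mask with the last lane disabled, and compares the final memory value against the original scalar loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4; // hypothetical vectorization factor

using Memory = std::array<int, 8>;
using Addrs = std::array<std::size_t, VF>;
using Vals = std::array<int, VF>;
using Mask = std::array<bool, VF>;

// The scalar loop being vectorized: if (mask[i]) mem[addr[i]] = val[i];
void scalarLoop(Memory &mem, const Addrs &addr, const Vals &val,
                const Mask &mask) {
  for (std::size_t i = 0; i < VF; ++i) {
    assert(addr[i] < mem.size());
    if (mask[i])
      mem[addr[i]] = val[i];
  }
}

// Emulated masked scatter: test each lane's mask bit and store if it is set.
// lowToHigh selects the order in which lanes are visited.
void emulatedScatter(Memory &mem, const Addrs &addr, const Vals &val,
                     const Mask &mask, bool lowToHigh) {
  for (std::size_t k = 0; k < VF; ++k) {
    const std::size_t lane = lowToHigh ? k : VF - 1 - k;
    assert(addr[lane] < mem.size());
    if (mask[lane])
      mem[addr[lane]] = val[lane];
  }
}

// Illustrative inputs: every lane stores to the same (uniform) address 3,
// and the last lane is masked off.
const Addrs kAddr = {3, 3, 3, 3};
const Vals kVal = {10, 20, 30, 40};
const Mask kMask = {true, true, true, false};

// Value left at the uniform address by the emulated scatter.
int uniformStoreResult(bool lowToHigh) {
  Memory mem{};
  emulatedScatter(mem, kAddr, kVal, kMask, lowToHigh);
  return mem[3];
}

// Value left at the uniform address by the original scalar loop.
int scalarStoreResult() {
  Memory mem{};
  scalarLoop(mem, kAddr, kVal, kMask);
  return mem[3];
}
```

With these inputs the scalar loop leaves the value from the last active iteration (lane 2) in memory. The low-to-high emulation leaves the same value, while a high-to-low order would leave lane 0's value instead, which is why the ordering guarantee has to be part of the legality/cost check.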

Repository:
  rL LLVM

https://reviews.llvm.org/D50665