[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

Mon Aug 26 12:16:33 PDT 2019

My intent was to present the expected behavior of an "ivdep" pragma 
implementation rather than dive into the implementation details; those 
seem better suited to another thread.

That said, trying to predict in the front end which dependence edges 
will eventually cause difficulties for automatic vectorization does 
seem problematic. Generally "ivdep" is an assist to automatic 
vectorization; for older Cray compilers, that basically means the 
front end does nothing but pass along the "ivdep" property, and 
dependency analysis for vectorization uses that property directly.

One thing to remember is that it is perfectly valid for an "ivdep" loop 
nest to still be rejected as a vector candidate for any reason, so 
support for an "ivdep" pragma could be implemented in stages if desired.

Terry

On 8/19/2019 2:33 PM, Michael Kruse wrote:
> I think some of the semantics could be implemented using the
> "llvm.mem.parallel_loop_access" annotation we already have, modulo the
> difficulties mentioned below.
>
> On Thu, Aug 15, 2019 at 15:06, Terry Greyzck via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>     * Primarily, ivdep allows ambiguous dependencies to be ignored, for example:
>>         *  p[i] = q[j]
>>         *  a[ix[i]] = b[iy[i]]
>>         *  a[ix[i]] += 1.0
> "ambiguous dependencies" is very vague. Does it mean the compiler has
> to do some analysis to detect non-ambiguous dependencies?
>
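
To make "ambiguous" concrete for the quoted a[ix[i]] += 1.0 case: the 
loop carries a dependence only if ix repeats an index, which cannot be 
known at compile time. The sketch below simulates in plain C what a 
vectorizer might emit once that dependence is ignored (a gather, an 
add, and a scatter over VL lanes). VL and both function names are 
hypothetical.

```c
#include <stddef.h>

#define VL 4  /* hypothetical vector length */

/* Scalar semantics of: for (i) a[ix[i]] += 1.0; */
void scatter_add_scalar(double *a, const int *ix, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[ix[i]] += 1.0;
}

/* What a vectorizer may emit once the ambiguous dependence is ignored:
 * gather VL elements, add, then scatter them back. */
void scatter_add_vector(double *a, const int *ix, size_t n)
{
    size_t i = 0;
    for (; i + VL <= n; i += VL) {
        double lanes[VL];
        for (size_t l = 0; l < VL; ++l)   /* gather */
            lanes[l] = a[ix[i + l]];
        for (size_t l = 0; l < VL; ++l)   /* add */
            lanes[l] += 1.0;
        for (size_t l = 0; l < VL; ++l)   /* scatter: later lane wins on conflicts */
            a[ix[i + l]] = lanes[l];
    }
    for (; i < n; ++i)                    /* scalar remainder */
        a[ix[i]] += 1.0;
}
```

With distinct indices both versions agree. With ix = {0, 0, 1, 1} the 
scalar loop leaves a[0] == 2.0, but the simulated vector loop leaves 
a[0] == 1.0; that difference is exactly the dependence an ivdep 
assertion tells the compiler it may ignore.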
> When using "llvm.mem.parallel_loop_access", the front end would have
> to detect which accesses are non-ambiguous and not annotate them.
> However, the annotation applies to individual accesses, not to
> dependencies. Both "p[i]" and "q[j]" look non-ambiguous individually,
> but the vectorizer would have to add a runtime check and loop
> versioning to ensure that they do not alias.
>
>
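
A minimal sketch of the runtime check and loop versioning described 
above. The thread does not say how j relates to i, so this assumes, 
for illustration only, that j advances in step with i (j = j0 + i); 
ranges_disjoint, copy_noalias, and copy_versioned are hypothetical 
names. The restrict-qualified helper stands in for the vectorized 
loop, since restrict lets the compiler assume no overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* True when [p, p+n) and [src, src+n) do not overlap. The pointers are
 * compared as integers because relational comparison of pointers into
 * different objects is undefined in C (assumes a flat address space). */
int ranges_disjoint(const double *p, const double *src, size_t n)
{
    uintptr_t pb = (uintptr_t)p, pe = (uintptr_t)(p + n);
    uintptr_t sb = (uintptr_t)src, se = (uintptr_t)(src + n);
    return pe <= sb || se <= pb;
}

/* Fast version: restrict promises no overlap, so the compiler may
 * vectorize this loop freely. */
void copy_noalias(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Versioned form of: for (i) p[i] = q[j0 + i]; */
void copy_versioned(double *p, const double *q, size_t j0, size_t n)
{
    const double *src = q + j0;
    if (ranges_disjoint(p, src, n)) {
        copy_noalias(p, src, n);        /* runtime check passed */
    } else {
        for (size_t i = 0; i < n; ++i)  /* original scalar loop */
            p[i] = src[i];
    }
}
```

For example, copying r[0..4) into r[1..5) of the same array fails the 
check and takes the scalar path, which reproduces the original 
element-by-element semantics.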
>>     * ivdep still requires automatic detection of reductions, including
>>       multiple homogeneous reductions on a single variable, for example:
>>         *  x = x + a[i]
>>         *  x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }
> We could omit the "llvm.mem.parallel_loop_access" annotation on the
> LoadInst and StoreInst of the reduction variable, assuming detected
> reductions are limited to scalar variables. However, mem2reg/sroa
> would remove those memory accesses anyway, including their annotation,
> requiring the LoopVectorizer to detect that the resulting PHINode is a
> reduction. Mem2reg/sroa/LICM would do the same with non-reductions and
> with array elements that are promoted to registers during the execution
> of the loop, such that the loop would not be vectorizable.
>
>
>
> Michael
>
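
The quoted reduction examples are the part ivdep does not relieve the 
vectorizer of. A minimal C sketch of the usual strategy, assuming a 
VL-lane vector (VL and both function names are hypothetical): keep one 
partial sum per lane, turn the condition into a select, and combine 
the lanes after the loop. Both updates of x feed the same partial sums, 
which is what lets them be treated as one homogeneous reduction.

```c
#include <stddef.h>

#define VL 4  /* hypothetical vector length */

/* Scalar form of: x = x + a[i]; if (c[i] > 0.0) { x = x + b[i]; } */
double reduce_scalar(const double *a, const double *b, const double *c, size_t n)
{
    double x = 0.0;
    for (size_t i = 0; i < n; ++i) {
        x = x + a[i];
        if (c[i] > 0.0)
            x = x + b[i];
    }
    return x;
}

/* Vectorized shape: one partial sum per lane, the condition becomes a
 * select (adding 0.0 when false), and lanes are combined after the loop. */
double reduce_partial_sums(const double *a, const double *b, const double *c, size_t n)
{
    double part[VL] = {0.0};
    size_t i = 0;
    for (; i + VL <= n; i += VL)
        for (size_t l = 0; l < VL; ++l)
            part[l] += a[i + l] + (c[i + l] > 0.0 ? b[i + l] : 0.0);
    double x = 0.0;
    for (size_t l = 0; l < VL; ++l)   /* horizontal combine */
        x += part[l];
    for (; i < n; ++i) {              /* scalar remainder */
        x = x + a[i];
        if (c[i] > 0.0)
            x = x + b[i];
    }
    return x;
}
```

Because the partial sums reassociate floating-point additions, the 
result can differ from the scalar loop in the last bits; compilers 
generally need permission to reassociate before vectorizing such a 
loop. The test values here are small integers, so both versions agree 
exactly.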