[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

Fri Nov 22 17:11:11 PST 2019

Hello Michael, a very good morning to you.

On Wed, Nov 20, 2019 at 10:58 PM Michael Kruse <llvm at meinersbur.de> wrote:

> On Wed, 20 Nov 2019 at 10:21, HAPPY Mahto
> <cs17btech11018 at iith.ac.in> wrote:
> >> #pragma clang loop vectorize_assume_alignment(32)
> >> for(int i = 0;i < n; i++){
> >> a[i] = b[i] + i*i;
> >> }
> >
> > With this, all accesses inside the loop will be treated as 32-byte aligned,
> > e.g. in this IR:
> >>
> >> for.cond:                                         ; preds = %for.inc, %entry
> >>   %5 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %6 = load i32, i32* %n, align 32, !llvm.access.group !2
> >>   %cmp = icmp slt i32 %5, %6
> >>   br i1 %cmp, label %for.body, label %for.end
> >>
> >> for.body:                                         ; preds = %for.cond
> >>   %7 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %8 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %idxprom = sext i32 %8 to i64
> >>   %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom
> >>   store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2
> >>   br label %for.inc
> >>
> >> for.inc:                                          ; preds = %for.body
> >>   %9 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %inc = add nsw i32 %9, 1
> >>   store i32 %inc, i32* %i, align 32, !llvm.access.group !2
> >>   br label %for.cond, !llvm.loop !3
> >
> > You will not need to create pointers for every array (or operand you want
> > to perform the operation on).
>
> IMHO it is better if the programmer has to. It is not always obvious
> which arrays are used in the loop. Also, the information can be used
> by other optimizations than the vectorizer.
>
We wrote this pragma keeping in mind that the arrays inside important loops
are the ones used for vectorization, and it is good for them to be aligned.
for (int i = 0; i < n; i++) {
  a[i] = b[i] + c[i];
  d[i] = e[i] + i*i;
}
If more than 4-5 arrays are used inside the loop, it is extra effort to
declare all of them with __builtin_assume_aligned, so we are thinking of
letting the pragma handle that itself. It would be very helpful for us if
other people from the community could give their views on this.
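For comparison, the approach Michael suggests can be sketched for the five-array loop above with the GCC/Clang builtin `__builtin_assume_aligned`. `ASSUME_ALIGNED32` and `kernel` are hypothetical names introduced only for illustration. The macro adds a debug-build `assert`, because the builtin is just a promise to the compiler: passing a pointer that is not actually 32-byte aligned is undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: in debug builds, check that p is 32-byte aligned,
   then tell the compiler it may assume that alignment. */
#define ASSUME_ALIGNED32(T, p) \
    (assert(((uintptr_t)(p) % 32) == 0), (T *)__builtin_assume_aligned((p), 32))

/* The loop from the message, with one assumption per array. */
void kernel(int *a, const int *b, const int *c, int *d, const int *e, int n) {
    int *pa = ASSUME_ALIGNED32(int, a);
    const int *pb = ASSUME_ALIGNED32(const int, b);
    const int *pc = ASSUME_ALIGNED32(const int, c);
    int *pd = ASSUME_ALIGNED32(int, d);
    const int *pe = ASSUME_ALIGNED32(const int, e);

    for (int i = 0; i < n; i++) {
        pa[i] = pb[i] + pc[i];
        pd[i] = pe[i] + i * i;
    }
}
```

With a helper like this the per-array cost is one line, and each assumption stays attached to a pointer the programmer knows to be aligned, rather than to every access in the loop.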

> >>
> >> void mult(float* x, int size, float factor){
> >>   float* ax = (float*)__builtin_assume_aligned(x, 64);
> >>   for (int i = 0; i < size; ++i)
> >>      ax[i] *= factor;
> >> }
>
> https://godbolt.org/z/Fd6HMe
>
> > the alignment is assumed, whereas with the #pragma it is set to the number
> > specified.
>
> Semantically, it is the same.
>
> I wonder how you expect the assembly output to change. The
> __builtin_assume_aligned will be picked up by the backend and result
> in movaps being used instead of movups.
>
>
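The contract behind `mult` can be made explicit with a caller that really provides 64-byte-aligned storage. This is a minimal sketch assuming a C11 environment with `aligned_alloc`; `alloc_floats64` is a hypothetical helper. If a misaligned pointer reached `mult`, the assumption would be violated, and aligned instructions such as `movaps` may then fault at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Michael's example: the compiler may assume x is 64-byte aligned. */
void mult(float *x, int size, float factor) {
    float *ax = (float *)__builtin_assume_aligned(x, 64);
    for (int i = 0; i < size; ++i)
        ax[i] *= factor;
}

/* Hypothetical helper: allocate n floats on a 64-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so round the byte count up to the next multiple of 64. */
float *alloc_floats64(size_t n) {
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 63) / 64 * 64;
    if (bytes == 0)
        bytes = 64;
    float *p = aligned_alloc(64, bytes);
    assert(p == NULL || (uintptr_t)p % 64 == 0);
    return p;
}
```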

> > It'll be easier, and having a pragma for this will help, as one is
> > provided in OpenMP and the Intel compilers.
>
> This is a compiler-specific extension. It does not have an influence
> on what other compilers do. Even with clang, if you try to do
>
> #pragma clang loop vectorize_assume_alignment(32)
> #pragma omp simd
> for (int i = 0; i < size; ++i)
>
> clang will silently swallow the vectorize_assume_alignment.
>
>
> Michael
>
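For reference, the OpenMP feature alluded to above is the `aligned` clause of `#pragma omp simd` (OpenMP 4.0 and later), which names the pointers that may be assumed aligned. A minimal sketch, with `kernel_omp` a hypothetical function: the pragma only takes effect when compiled with `-fopenmp` or `-fopenmp-simd`; otherwise the loop still compiles and runs as ordinary scalar code (possibly with an unknown-pragma warning). Like the builtin, the clause is a promise, so the sketch checks it with `assert` in debug builds.

```c
#include <assert.h>
#include <stdint.h>

void kernel_omp(int *restrict a, const int *restrict b,
                const int *restrict c, int n) {
    /* The aligned() clause below is a promise; verify it in debug builds. */
    assert((uintptr_t)a % 32 == 0 && (uintptr_t)b % 32 == 0 &&
           (uintptr_t)c % 32 == 0);
    /* Each listed pointer may be assumed 32-byte aligned inside the loop. */
#pragma omp simd aligned(a, b, c : 32)
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Note that the clause still lists the arrays explicitly, so it does not remove the per-array effort discussed earlier; it only moves the annotation onto the loop.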