[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

Wed Nov 20 09:27:27 PST 2019

On Wed, Nov 20, 2019 at 10:21, HAPPY Mahto
<cs17btech11018 at iith.ac.in> wrote:
>> #pragma clang loop vectorize_assume_alignment(32)
>> for(int i = 0;i < n; i++){
>> a[i] = b[i] + i*i;
>> }
>
> For this, all accesses inside the loop will be aligned to 32 bytes,
> e.g. the IR:
>>
>> for.cond:                                         ; preds = %for.inc, %entry
>>   %5 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %6 = load i32, i32* %n, align 32, !llvm.access.group !2
>>   %cmp = icmp slt i32 %5, %6
>>   br i1 %cmp, label %for.body, label %for.end
>>
>> for.body:                                         ; preds = %for.cond
>>   %7 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %8 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %idxprom = sext i32 %8 to i64
>>   %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom
>>   store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2
>>   br label %for.inc
>>
>> for.inc:                                          ; preds = %for.body
>>   %9 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %inc = add nsw i32 %9, 1
>>   store i32 %inc, i32* %i, align 32, !llvm.access.group !2
>>   br label %for.cond, !llvm.loop !3
>
> You will not need to create pointers for every array (or operand you want to perform the operation on).

IMHO it is better if the programmer has to. It is not always obvious
which arrays are used in the loop. Also, the information can be used
by optimizations other than the vectorizer.
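
To make the per-array approach concrete, here is a rough sketch of the
loop from the top of this mail with the alignment stated separately
for each array. The function name fill and the debug assert are my own
illustration and not part of the patch; it assumes the caller really
does pass 32-byte-aligned buffers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): a[i] = b[i] + i*i.
   Assumes the caller guarantees that a and b are 32-byte aligned. */
void fill(int *a, const int *b, int n) {
  /* Debug-build check only; if the assumption below is violated,
     the behavior is undefined. */
  assert((uintptr_t)a % 32 == 0 && (uintptr_t)b % 32 == 0);

  /* State the alignment once per array, explicitly. */
  int *aa = (int *)__builtin_assume_aligned(a, 32);
  const int *ab = (const int *)__builtin_assume_aligned(b, 32);

  for (int i = 0; i < n; i++)
    aa[i] = ab[i] + i * i;
}
```

Every array that carries the assumption is named in the source, so a
reader does not have to work out which accesses a loop-wide pragma
would cover.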

>>
>> void mult(float* x, int size, float factor){
>>   float* ax = (float*)__builtin_assume_aligned(x, 64);
>>   for (int i = 0; i < size; ++i)
>>      ax[i] *= factor;
>> }

https://godbolt.org/z/Fd6HMe

> the alignment is assumed whereas in #pragma it is set to the number specified.

Semantically, it is the same.

How do you expect the assembly output to change? The
__builtin_assume_aligned will be picked up by the backend and result
in movaps being used instead of movups.

> It'll be easier, and having a pragma for doing this will help as it's provided in OpenMP and Intel compilers.

This is a compiler-specific extension. It does not have an influence
on what other compilers do. Even with clang, if you try to do

#pragma clang loop vectorize_assume_alignment(32)
#pragma omp simd
for (int i = 0; i < size; ++i)

clang will silently swallow the vectorize_assume_alignment.
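
If the loop is going to sit under #pragma omp simd anyway, OpenMP has
its own aligned clause for this. A minimal sketch, assuming OpenMP 4.0
or later and compiling with -fopenmp-simd or -fopenmp (without either
flag the pragma is simply ignored). mult_omp is a hypothetical variant
of the mult function above, and the assert is my addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of mult(): the alignment is declared through
   the OpenMP simd directive instead of a clang loop pragma.
   Assumes the caller guarantees that x is 64-byte aligned. */
void mult_omp(float *x, int size, float factor) {
  /* Debug-build check of the alignment promise made below. */
  assert((uintptr_t)x % 64 == 0);

#pragma omp simd aligned(x : 64)
  for (int i = 0; i < size; ++i)
    x[i] *= factor;
}
```

As with __builtin_assume_aligned, the alignment is given per pointer
rather than for every access in the loop.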

Michael