[llvm-dev] Automatic Insertion of OpenACC/OpenMP directives

Jonathan Roelofs via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 3 10:02:18 PST 2017



On 1/3/17 10:19 AM, Fernando Magno Quintao Pereira wrote:
> Dear Jonathan and Mehdi,
>
>      you are right. To extend the ability of the OpenACC-compliant
> compiler, we use a technique called restrictification to disambiguate
> pointers at runtime. So, if you use the program below as input in
> http://cuda.dcc.ufmg.br/dawn:
>
> void saxpy(float a, float *x, float *y, int n) {
>   int j = 0;
>   for (int i = 0; i < n; ++i) {
>     y[j] = a*x[i] + y[j];
>     ++j;
>   }
> }

I think I'm still not clearly getting my point across. Let me explain 
with a simpler example. Suppose this were the input:

void ascend1(int *y, int n) {
   int j = 0;
   for (int i = 0; i < n; ++i) {
     y[j] = j;
     j++;
   }
}

One would expect the entries of the array y to end up as: y[0] = 0, 
y[1] = 1, y[2] = 2, etc. However, if one were to annotate that loop 
with (in OpenMP, as I don't know the semantics of OpenACC):

void ascend2(int *y, int n) {
   int j = 0;
   #pragma omp parallel for
   for (int i = 0; i < n; ++i) {
     y[j] = j;
     j++;
   }
}

now there's a race, because the value of j in each iteration of the 
loop depends on its value from the previous iteration. Now compare that to:

void ascend3(int *y, int n) {
   for (int i = 0; i < n; ++i) {
     y[i] = i;
   }
}

this one does not have the loop-carried dependence on the variable used 
for indexing, so it is indeed safe to parallelize. Mehdi's comment was 
that the LLVM IR for `ascend1()` and `ascend3()` will look almost the 
same (at least after optimization), so you'll need some way to recover 
the information telling you that `ascend1()`'s source is *not* 
parallelizable with the annotations, but that `ascend3()`'s *is*.
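
For concreteness, here's roughly what the source would have to look 
like before the annotation becomes safe. This is only a sketch (the 
name `ascend1_rewritten` is mine), but it expresses j as a pure 
function of i, which is more or less the rewrite the optimizer already 
performs on the IR:

void ascend1_rewritten(int *y, int n) {
   #pragma omp parallel for
   for (int i = 0; i < n; ++i) {
     int j = i;   /* j is recomputed from i, so there is no loop-carried
                     dependence and each iteration is independent */
     y[j] = j;
   }
}

The rewrite has to happen in the *source* for the pragma to be valid 
there, even if the optimized IR of `ascend1()` already looks like this.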


Jon
>
>      then the final program that you obtain is this one:
>
> void saxpy(float a, float *x, float *y, int n) {
>    int j = 0;
>    long long int AI1[6];
>    AI1[0] = n + -1;
>    AI1[1] = 4 * AI1[0];
>    AI1[2] = AI1[1] + 4;
>    AI1[3] = AI1[2] / 4;
>    AI1[4] = (AI1[3] > 0);
>    AI1[5] = (AI1[4] ? AI1[3] : 0);
>    char RST_AI1 = 0;
>    RST_AI1 |= !((x + 0 > y + AI1[5]) || (y + 0 > x + AI1[5])); // Restrictification test!
>    #pragma acc data pcopyin(x[0:AI1[5]]) pcopy(y[0:AI1[5]]) if(!RST_AI1)
>    #pragma acc kernels if(!RST_AI1) // See the test being used here!
>    for (int i = 0; i < n; ++i) {
>      y[j] = a * x[i] + y[j];
>      ++j;
>    }
> }
>
>      The input kernel, saxpy, is, in the absence of aliasing, a map.
> Thus, it is parallelizable (assuming no aliasing between the arrays x
> and y). So, how does DawnCC handle aliasing? Our tool handles it via a
> restrictification check. In the code above, this check is encoded in
> the variable RST_AI1, which is the result of a test ensuring that the
> address ranges accessed through x and y do not overlap. LLVM itself
> uses a similar form of restrictification to carry out vectorization.
> Our checks follow the approach described in the paper:
>
> Pericles Rafael Alves, Fabian Gruber, Johannes Doerfert, Alexandros
> Labrineas, Tobias Grosser, Fabrice Rastello and Fernando Magno Quintão
> Pereira. Runtime Pointer Disambiguation. OOPSLA'2015.
>
>      In this example, aliasing does not occur as long as x + n <= y
> (or vice versa). DawnCC can create restrictification checks as long as
> the access expressions involved are affine expressions of the loop
> induction variables.
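>
>      Stripped of the AI1[] bookkeeping, the check boils down to
> something like the sketch below (simplified by hand; the names len,
> may_alias and saxpy_checked are illustrative, not actual DawnCC
> output):
>
> void saxpy_checked(float a, float *x, float *y, int n) {
>    /* Number of elements accessed through x and y; this is what
>       AI1[5] computes above. */
>    long long len = (n > 0) ? n : 0;
>    /* Conservative overlap test, mirroring RST_AI1: true when the
>       ranges [x, x+len) and [y, y+len) may overlap. */
>    char may_alias = !((x > y + len) || (y > x + len));
>    #pragma acc data pcopyin(x[0:len]) pcopy(y[0:len]) if(!may_alias)
>    #pragma acc kernels if(!may_alias)
>    for (int i = 0; i < n; ++i)
>      y[i] = a * x[i] + y[i];
> }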
>
> Regards,
>
> Fernando

-- 
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded


