[llvm-dev] Automatic Insertion of OpenACC/OpenMP directives
Jonathan Roelofs via llvm-dev
llvm-dev at lists.llvm.org
Tue Jan 3 10:02:18 PST 2017
On 1/3/17 10:19 AM, Fernando Magno Quintao Pereira wrote:
> Dear Jonathan and Mehdi,
>
> you are right. To extend the ability of the OpenACC compliant
> compiler, we use a technique called restrictification to disambiguate
> pointers at runtime. So, if you use this program below as input in
> http://cuda.dcc.ufmg.br/dawn:
>
> float saxpy(float a, float *x, float *y, int n) {
>   int j = 0;
>   for (int i = 0; i < n; ++i) {
>     y[j] = a*x[i] + y[j];
>     ++j;
>   }
> }
I think I'm still not clearly getting my point across. Let me explain
with a simpler example. Suppose this were the input:
void ascend1(int *y, int n) {
  int j = 0;
  for (int i = 0; i < n; ++i) {
    y[j] = j;
    j++;
  }
}
One would expect that the entries in the array y would contain: y[0] =
0, y[1] = 1, y[2] = 2, etc. However, if one were to annotate that with
(in OpenMP, as I don't know the semantics of OpenACC):
void ascend2(int *y, int n) {
  int j = 0;
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    y[j] = j;
    j++;
  }
}
Now there's a race: j is shared between the threads, and its value in
each iteration of the loop depends on the value left behind by the
previous iteration. Now compare that to:
void ascend3(int *y, int n) {
  for (int i = 0; i < n; ++i) {
    y[i] = i;
  }
}
This one does not have the loop-carried dependence on the variable used
for indexing, so it is indeed safe to parallelize. Mehdi's comment was
that the LLVM IR for `ascend1()` and `ascend3()` will look almost the
same (at least after optimization), so you'll need some way to recover
the information telling you that `ascend1()`'s source is *not*
parallelizable with the annotations, but that `ascend3()`'s *is*.
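
To make the distinction concrete, here is a minimal sketch of why the two
forms compute the same thing sequentially even though only one of them can
take the annotation as written. In `ascend1()`, j equals i at the top of
every iteration, so substituting that closed form yields `ascend3()`. The
helper `same_result()` and the bound `MAX_N` are hypothetical names used
only for this illustration; they are not part of any tool discussed here.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 64 /* hypothetical bound, for this demo only */

/* Original form: j is carried from one iteration to the next, so
 * annotating this loop as written would race on j. */
void ascend1(int *y, int n) {
  int j = 0;
  for (int i = 0; i < n; ++i) {
    y[j] = j;
    j++;
  }
}

/* Rewritten form: j has been replaced by its closed form (j == i).
 * Each iteration now touches only y[i], so a "#pragma omp parallel for"
 * placed on this loop would be safe. */
void ascend3(int *y, int n) {
  for (int i = 0; i < n; ++i) {
    y[i] = i;
  }
}

/* Returns 1 if both forms fill the first n entries identically when
 * run sequentially, 0 otherwise. */
int same_result(int n) {
  int a[MAX_N], b[MAX_N];
  assert(n >= 0 && n <= MAX_N);
  memset(a, 0xFF, sizeof a); /* sentinel so untouched entries compare equal */
  memset(b, 0xFF, sizeof b);
  ascend1(a, n);
  ascend3(b, n);
  return memcmp(a, b, sizeof a) == 0;
}
```

Calling `same_result(16)` should return 1. The point is that sequential
equivalence is not enough: a tool that emits source-level annotations
also has to emit the rewritten loop, or otherwise privatize j, before the
pragma is valid.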
Jon
>
> then the final program that you obtain is this one:
>
> float saxpy(float a, float *x, float *y, int n) {
>   int j = 0;
>   long long int AI1[6];
>   AI1[0] = n + -1;
>   AI1[1] = 4 * AI1[0];
>   AI1[2] = AI1[1] + 4;
>   AI1[3] = AI1[2] / 4;
>   AI1[4] = (AI1[3] > 0);
>   AI1[5] = (AI1[4] ? AI1[3] : 0);
>   char RST_AI1 = 0;
>   // Restrictification test!
>   RST_AI1 |= !((x + 0 > y + AI1[5]) || (y + 0 > x + AI1[5]));
>   #pragma acc data pcopyin(x[0:AI1[5]]) pcopy(y[0:AI1[5]]) if(!RST_AI1)
>   #pragma acc kernels if(!RST_AI1) // See the test being used here!
>   for (int i = 0; i < n; ++i) {
>     y[j] = a * x[i] + y[j];
>     ++j;
>   }
> }
>
> The input kernel, saxpy, is, in the absence of aliasing, a map.
> Thus, it is parallelizable (assuming no aliasing between arrays x and
> y). So, how does dawn-cc handle aliasing? Our tool does it via a
> restrictification check. In the code above, this restrictification
> check is encoded in the variable RST_AI1. This variable is the result
> of a test that ensures that the limits of arrays x and y do not
> overlap. LLVM itself uses some form of restrictification to carry out
> vectorization. Our checks are based on the paper:
>
> Pericles Rafael Alves, Fabian Gruber, Johannes Doerfert, Alexandros
> Labrineas, Tobias Grosser, Fabrice Rastello and Fernando Magno Quintão
> Pereira. Runtime Pointer Disambiguation. OOPSLA'2015.
>
> In this example, aliasing does not occur as long as x + n <= y
> (or vice versa, y + n <= x). Dawn-cc can create restrictification
> checks as long as all the involved variables encode affine expressions
> of induction variables.
>
> Regards,
>
> Fernando
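
For reference, the kind of runtime overlap test described in the quoted
text can be sketched as below. This is only an illustration under stated
assumptions, not the code dawn-cc emits: the names `ranges_disjoint()`
and `saxpy_checked()` are hypothetical, and the comparison goes through
`uintptr_t` because relational comparison of pointers into different
objects is not defined by the C standard. It uses the non-strict form
(x + n <= y); the generated check quoted above uses strict comparisons,
which is slightly more conservative.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the n-element ranges starting at x and y do not overlap. */
int ranges_disjoint(const float *x, const float *y, size_t n) {
  uintptr_t xb = (uintptr_t)x;
  uintptr_t yb = (uintptr_t)y;
  uintptr_t len = (uintptr_t)(n * sizeof(float));
  return (xb + len <= yb) || (yb + len <= xb);
}

/* Computes y[i] = a * x[i] + y[i]. Returns 1 if the ranges were found
 * disjoint (the path where an annotation could be enabled), else 0. */
int saxpy_checked(float a, const float *x, float *y, int n) {
  if (n <= 0)
    return 0;
  if (ranges_disjoint(x, y, (size_t)n)) {
    /* No aliasing: a parallel annotation could be placed on this loop. */
    for (int i = 0; i < n; ++i)
      y[i] = a * x[i] + y[i];
    return 1;
  }
  /* Possible aliasing: keep the loop strictly sequential. */
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
  return 0;
}
```

Note that this check only rules out aliasing between x and y. It says
nothing about a loop-carried index such as j, which is the separate
problem raised above.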
--
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded