[llvm-dev] Automatic Insertion of OpenACC/OpenMP directives

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 3 09:13:00 PST 2017


> On Jan 3, 2017, at 7:17 AM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:
> 
> 
> 
> On 12/31/16 12:37 PM, Fernando Magno Quintao Pereira via llvm-dev wrote:
>> Dear Mehdi,
>> 
>>    I've changed your example a little bit:
>> 
>> float saxpy(float a, float *x, float *y, int n) {
>> int j = 0;
>> for (int i = 0; i < n; ++i) {
>>   y[j] = a*x[i] + y[I]; // Change 'I' into 'j'?
>>   ++j;
>> }
>> }
>> 
>> I get this code below, once I replace 'I' with 'j'. We are copying n
>> positions of both arrays, 'x' and 'y':
>> 
>> float saxpy(float a, float *x, float *y, int n) {
>>  int j = 0;
>> 
>>  long long int AI1[6];
>>  AI1[0] = n + -1;
>>  AI1[1] = 4 * AI1[0];
>>  AI1[2] = AI1[1] + 4;
>>  AI1[3] = AI1[2] / 4;
>>  AI1[4] = (AI1[3] > 0);
>>  AI1[5] = (AI1[4] ? AI1[3] : 0);
>>  #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
>>  #pragma acc kernels
>>  for (int i = 0; i < n; ++i) {
>>    y[j] = a * x[i] + y[j];
>>    ++j;
>>  }
>> }
> 
> I'm not familiar with OpenACC, but doesn't this still have a loop carried dependence on j, and therefore isn't correctly parallelizable as written?

That was my original concern as well, but I had forgotten that OpenACC pragmas do not necessarily tell the compiler that the loop is parallel:

 #pragma acc kernels

only tells the compiler to “try” to parallelize the loop if it can prove it safe, but:

 #pragma acc parallel loop

bypasses the compiler checks and forces parallelization.
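
As a rough sketch of the distinction (an illustration only, not output from the tool; the function names and data clauses are made up here, and the second variant assumes the combined "parallel loop" directive as the programmer-asserted form): in the first function the compiler must still discover that j always equals i before it can parallelize, while in the second the induction variable is removed by hand so the iterations really are independent. Both assume x and y do not overlap.

```c
/* Loop as quoted above: j is carried from one iteration to the next.
 * With "kernels" the compiler decides whether parallelizing is safe. */
void saxpy_carried(float a, const float *restrict x, float *restrict y, int n) {
    int j = 0;
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[j] = a * x[i] + y[j];
        ++j;
    }
}

/* j always equals i, so it can be replaced by i. Only after that
 * rewrite is it safe for the programmer to assert parallelism. */
void saxpy_independent(float a, const float *restrict x, float *restrict y, int n) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenACC support ignores the pragmas, and both functions then compute the same result sequentially.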

The tool takes care of figuring out the sizes of the arrays, AFAIK (I haven’t read the paper yet to understand the novelty in the approach here).
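
Reading the quoted code, the AI1[] chain appears to compute the number of elements touched, clamped at zero. The sketch below mirrors that arithmetic step by step and compares it with the simpler expression it seems to reduce to. The function names are hypothetical, the constant 4 is assumed to be sizeof(float), and the subtraction is widened to long long here to avoid int overflow.

```c
/* Mirrors the AI1[0..5] computation from the quoted code. */
long long tool_extent(int n) {
    long long last_index  = (long long)n - 1;     /* AI1[0] = n + -1      */
    long long last_offset = 4 * last_index;       /* AI1[1]: byte offset  */
    long long byte_count  = last_offset + 4;      /* AI1[2]: bytes used   */
    long long elem_count  = byte_count / 4;       /* AI1[3]: elements     */
    return (elem_count > 0) ? elem_count : 0;     /* AI1[4], AI1[5]       */
}

/* What the chain seems to reduce to: max(n, 0). */
long long simple_extent(int n) {
    return (n > 0) ? (long long)n : 0;
}
```

For example, n = 5 gives 5 elements, while n = 0 and any negative n give 0, so an empty range is copied when the loop never runs.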

— 
Mehdi
