<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 3, 2017, at 7:17 AM, Jonathan Roelofs <<a href="mailto:jonathan@codesourcery.com" class="">jonathan@codesourcery.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">On 12/31/16 12:37 PM, Fernando Magno Quintao Pereira via llvm-dev wrote:</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; 
orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">Dear Mehdi,<br class=""><br class=""> I've changed your example a little bit:<br class=""><br class="">float saxpy(float a, float *x, float *y, int n) {<br class="">int j = 0;<br class="">for (int i = 0; i < n; ++i) {<br class=""> y[j] = a*x[i] + y[I]; // Change 'I' into 'j'?<br class=""> ++j;<br class="">}<br class="">}<br class=""><br class="">Once I replace 'I' with 'j', I get the code below. It copies n<br class="">elements of both arrays, 'x' and 'y':<br class=""><br class="">float saxpy(float a, float *x, float *y, int n) {<br class=""> int j = 0;<br class=""><br class=""> long long int AI1[6];<br class=""> AI1[0] = n + -1;<br class=""> AI1[1] = 4 * AI1[0];<br class=""> AI1[2] = AI1[1] + 4;<br class=""> AI1[3] = AI1[2] / 4;<br class=""> AI1[4] = (AI1[3] > 0);<br class=""> AI1[5] = (AI1[4] ? 
AI1[3] : 0);<br class=""> #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])<br class=""> #pragma acc kernels<br class=""> for (int i = 0; i < n; ++i) {<br class=""> y[j] = a * x[i] + y[j];<br class=""> ++j;<br class=""> }<br class="">}<br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I'm not familiar with OpenACC, but doesn't this still have a loop-carried dependence on j, which means it isn't correctly parallelizable as written?</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div><div>That was my original concern as well, but I had forgotten that OpenACC pragmas do not necessarily tell the compiler that the loop is parallel:</div><div><br class=""></div><div> #pragma acc kernels</div><div><br class=""></div><div>only tells the compiler to “try” to parallelize the loop, which it does only if it can prove that doing so is safe, whereas:</div><div><br class=""></div><div> #pragma acc parallel loop</div><div><br class=""></div><div>bypasses the compiler's dependence checks and forces parallelization.</div><div><br class=""></div><div>AFAIK, the tool takes care of figuring out the sizes of the arrays 
(I haven’t read the paper yet, so I don't yet understand what is novel in the approach here).</div><div><br class=""></div><div>— </div><div>Mehdi</div><div><br class=""></div></div></body></html>