<div dir="ltr"><div><div>Oh, I see. Yes, this works:<br><br><span style="font-family:monospace,monospace">__declspec(noalias) <br>void f1( double c[restrict DIM][DIM], <br> const double a[restrict DIM][DIM], <br> const double b[restrict DIM][DIM] )<br>{<br><br>#pragma clang loop unroll_count(UNROLL_DIM)<br> for( int i=0;i<DIM;i++)<br><br>#pragma clang loop unroll_count(UNROLL_DIM)<br> for( int j=0;j<DIM;j++)<br><br>#pragma clang loop unroll_count(UNROLL_DIM)<br> for( int k=0;k<DIM;k++) {<br> c[i][k] = c[i][k] + a[i][j]*b[j][k];<br> }<br>}<br><br></span></div><span style="font-family:monospace,monospace"><font face="arial,helvetica,sans-serif">...works as in the invariants are optimized. Thanks.<br><br></font></span></div><span style="font-family:monospace,monospace"><font face="arial,helvetica,sans-serif">Phil<br></font></span><div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 18, 2016 at 3:29 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Phil,<br>
<br>
I'm not sure whether we do anything with __declspec(noalias), but if I had to guess, when you used restrict, you did not do it correctly. You can see <a href="http://en.cppreference.com/w/c/language/restrict" rel="noreferrer" target="_blank">http://en.cppreference.com/w/<wbr>c/language/restrict</a> for some additional usage examples.<br>
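<br>
For reference, here is a minimal sketch of where the C99 restrict keyword goes. The warning quoted below presumably comes from spelling it as __declspec(restrict), which annotates a function's return value, whereas the keyword qualifies each pointer parameter itself. The names scale_add and scale_add_ptr and the size DIM 4 are made up for illustration and are not from the original code.<br>

```c
#define DIM 4

/* restrict inside the first [] of an array parameter qualifies the
   pointer that the parameter decays to (C99 and later). */
void scale_add(double out[restrict DIM], const double in[restrict DIM], double s)
{
    for (int i = 0; i < DIM; i++)
        out[i] += s * in[i];   /* compiler may assume this store never changes in[] */
}

/* The same promise written with explicit pointers. */
void scale_add_ptr(double *restrict out, const double *restrict in, double s)
{
    for (int i = 0; i < DIM; i++)
        out[i] += s * in[i];
}
```

For a two-dimensional parameter the qualifier still goes in the first bracket pair, e.g. double c[restrict DIM][DIM].<br>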
<br>
-Hal<br>
<span class=""><br>
----- Original Message -----<br>
> From: "Phil Tomson via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
> To: "Ashutosh Nema" <<a href="mailto:Ashutosh.Nema@amd.com">Ashutosh.Nema@amd.com</a>><br>
> Cc: "llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
> Sent: Friday, November 18, 2016 12:00:58 PM<br>
> Subject: Re: [llvm-dev] Loop invariant not being optimized<br>
><br>
> I tried changing 'noalias' to 'restrict' in the code and I get:<br>
><br>
> fma.c:17:12: warning: 'restrict' attribute only applies to return<br>
> values that are pointers<br>
><br>
> It seems like 'noalias' would be the correct attribute here, from the<br>
> article you linked:<br>
><br>
</span>> "if a function is annotated as noalias , the optimizer can assume<br>
<div><div class="h5">> that, in addition to the parameters themselves, only first-level<br>
> indirections of pointer parameters are referenced or modified inside<br>
> the function. The visible global state is the set of all data that<br>
> is not defined or referenced outside of the compilation scope, and<br>
> their address is not taken."<br>
><br>
> Phil<br>
><br>
> On Thu, Nov 17, 2016 at 9:50 PM, Nema, Ashutosh <<br>
> <a href="mailto:Ashutosh.Nema@amd.com">Ashutosh.Nema@amd.com</a> > wrote:<br>
><br>
> If I understood it correctly, __declspec(noalias) is not the same as<br>
> specifying restrict on each parameter.<br>
><br>
><br>
><br>
> It means that f1 in the example does not modify or reference any<br>
> global state, but a, b and c are still free to alias one another.<br>
><br>
><br>
><br>
> You could specify restrict on each one to indicate that they do not<br>
> alias each other.<br>
><br>
><br>
><br>
> For more details, refer to:<br>
> <a href="https://msdn.microsoft.com/en-us/library/k649tyc7.aspx" rel="noreferrer" target="_blank">https://msdn.microsoft.com/en-<wbr>us/library/k649tyc7.aspx</a><br>
><br>
><br>
><br>
> Regards,<br>
><br>
> Ashutosh<br>
><br>
><br>
><br>
><br>
><br>
><br>
> From: llvm-dev [mailto: <a href="mailto:llvm-dev-bounces@lists.llvm.org">llvm-dev-bounces@lists.llvm.<wbr>org</a> ] On Behalf<br>
> Of Phil Tomson via llvm-dev<br>
> Sent: Friday, November 18, 2016 12:23 AM<br>
> To: LLVM Developers Mailing List < <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a> ><br>
> Subject: [llvm-dev] Loop invariant not being optimized<br>
><br>
> I've got an example where I think that there should be some<br>
> loop-invariant optimization happening, but it's not. Here's the C<br>
> code:<br>
><br>
> #define DIM 8<br>
> #define UNROLL_DIM DIM<br>
> typedef double InArray[DIM][DIM];<br>
><br>
> __declspec(noalias) void f1( InArray c, const InArray a, const<br>
> InArray b )<br>
> {<br>
><br>
> #pragma clang loop unroll_count(UNROLL_DIM)<br>
> for( int i=0;i<DIM;i++)<br>
> #pragma clang loop unroll_count(UNROLL_DIM)<br>
> for( int j=0;j<DIM;j++)<br>
> #pragma clang loop unroll_count(UNROLL_DIM)<br>
> for( int k=0;k<DIM;k++) {<br>
> c[i][k] = c[i][k] + a[i][j]*b[j][k];<br>
> }<br>
> }<br>
><br>
> The "a[i][j]" there is invariant in that inner loop. I've unrolled<br>
> the loops with the unroll pragma to make the assembly easier to<br>
> read, here's what I see (LLVM 3.9, compiling with: clang<br>
> -fms-compatibility -funroll-loops -O3 -c fma.c -o fma.o )<br>
><br>
><br>
> 0000000000000000 <f1>:<br>
> 0: 29580c0000000000 load r3,r0,0x0,64<br>
> 8: 2958100200000000 load r4,r1,0x0,64 #r4 <- a[0][0]<br>
> 10: 2958140400000000 load r5,r2,0x0,64<br>
> 18: c0580c0805018000 fmaf r3,r4,r5,r3,64<br>
> 20: 79b80c0000000000 store r3,r0,0x0,64<br>
> 28: 2958100000000008 load r4,r0,0x8,64<br>
> 30: 2958140200000000 load r5,r1,0x0,64 #r5 <- a[0][0]<br>
> 38: 2958180400000008 load r6,r2,0x8,64<br>
> 40: c058100a06020000 fmaf r4,r5,r6,r4,64<br>
> 48: 79b8100000000008 store r4,r0,0x8,64<br>
> 50: 2958140000000010 load r5,r0,0x10,64<br>
> 58: 2958180200000000 load r6,r1,0x0,64 #r6 <- a[0][0]<br>
> 60: 29581c0400000010 load r7,r2,0x10,64<br>
> 68: c058140c07028000 fmaf r5,r6,r7,r5,64<br>
> 70: 79b8140000000010 store r5,r0,0x10,64<br>
> 78: 2958180000000018 load r6,r0,0x18,64<br>
> 80: 29581c0200000000 load r7,r1,0x0,64 #r7 <- a[0][0]<br>
> 88: 2958200400000018 load r8,r2,0x18,64<br>
> 90: c058180e08030000 fmaf r6,r7,r8,r6,64<br>
> ...<br>
><br>
> (fmaf semantics are: fmaf r1,r2,r3,r4, SIZE r1 <- r2*r3+r4 )<br>
><br>
><br>
> (load semantics are: load r1,r2,imm, SIZE r1<- mem[r2+imm] )<br>
><br>
><br>
><br>
> All three operands are loaded on every inner-loop iteration, but only two<br>
> (c[i][k] and b[j][k]) need to be reloaded there. I added the 'noalias' declspec in the<br>
> C code above thinking that it would indicate that the pointers going<br>
> into the function are not aliased and that that would allow the<br>
> optimization, but it didn't make any difference.<br>
><br>
> Of course it's easy to rewrite the example code to avoid this extra<br>
> load/inner loop, but I would have thought this would be a fairly<br>
> straightforward optimization for the optimizer. Am I missing<br>
> something?<br>
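<br>
The manual rewrite mentioned above might look like the following sketch, which reads a[i][j] into a local before the inner loop. The function name f1_hoisted and the local a_ij are hypothetical, and the unroll pragmas are left out for brevity. Note that this is only equivalent to the original when c does not overlap a, which is exactly what the compiler cannot prove without restrict.<br>

```c
#define DIM 8
typedef double InArray[DIM][DIM];

/* Hand-hoisted variant of f1: a[i][j] is read once per (i, j) pair
   instead of once per inner-loop iteration. */
void f1_hoisted(InArray c, const InArray a, const InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            const double a_ij = a[i][j];   /* invariant in the k loop */
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a_ij * b[j][k];
        }
}
```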
><br>
> Phil<br>
><br>
</div></div>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Hal Finkel<br>
Lead, Compiler Technology and Programming Languages<br>
Leadership Computing Facility<br>
Argonne National Laboratory<br>
</font></span></blockquote></div><br></div>