[llvm-dev] Loop invariant not being optimized

Fri Nov 18 15:29:06 PST 2016

Hi Phil,

I'm not sure whether we do anything with __declspec(noalias), but if I had to guess, when you used restrict, you did not do it correctly. You can see http://en.cppreference.com/w/c/language/restrict for some additional usage examples.
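
For illustration, here is a minimal sketch of restrict applied to each pointer parameter of f1. It assumes a C99 compiler, and `Row` is a hypothetical helper typedef that does not appear in the thread. Each qualifier promises that, inside f1, memory written through one parameter is not accessed through another. That promise is what would permit keeping a[i][j] in a register across the k loop:

```c
#define DIM 8

/* Hypothetical helper typedef: one row of a DIM x DIM matrix. */
typedef double Row[DIM];

/* Sketch of f1 with restrict on each pointer parameter (C99).
   Each parameter points at the first row of its matrix. The restrict
   qualifiers promise that, within f1, c, a and b do not overlap, so the
   compiler may load a[i][j] once per (i, j) instead of once per k. */
void f1(Row *restrict c, const Row *restrict a, const Row *restrict b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
}
```

The optimizer is permitted, but not required, to act on this promise. Whether the load is actually hoisted depends on the compiler version and target.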

 -Hal

----- Original Message -----
> From: "Phil Tomson via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Ashutosh Nema" <Ashutosh.Nema at amd.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, November 18, 2016 12:00:58 PM
> Subject: Re: [llvm-dev] Loop invariant not being optimized
> 
> I tried changing 'noalias' to 'restrict' in the code and I get:
> 
> fma.c:17:12: warning: 'restrict' attribute only applies to return
> values that are pointers
> 
> It seems like 'noalias' would be the correct attribute here. From the
> article you linked:
> 
> "if a function is annotated as noalias, the optimizer can assume
> that, in addition to the parameters themselves, only first-level
> indirections of pointer parameters are referenced or modified inside
> the function. The visible global state is the set of all data that
> is not defined or referenced outside of the compilation scope, and
> their address is not taken."
> 
> Phil
> 
> On Thu, Nov 17, 2016 at 9:50 PM, Nema, Ashutosh
> <Ashutosh.Nema at amd.com> wrote:
> 
> If I understood it correctly, __declspec(noalias) is not the same as
> specifying restrict on each parameter.
> 
> It means that, in your example, f1 does not modify or reference any
> global state beyond what it reaches through a, b and c, but those
> parameters are still free to alias one another.
> 
> You could specify restrict on each one to indicate that they do not
> alias each other.
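
Restrict can also be written directly in the array-parameter syntax the original code uses, so the parameters do not have to be rewritten as pointers. The following is a minimal sketch that assumes a C99 compiler; `f1_restrict` is a hypothetical name. The qualifier cannot be attached through the InArray typedef itself. A qualifier applied to an array type applies to its element type, and restrict on a double is invalid:

```c
#define DIM 8

/* Hypothetical variant of f1 using C99 array-parameter syntax. A qualifier
   written inside the first pair of brackets applies to the pointer that the
   parameter is adjusted to, so `double c[restrict DIM][DIM]` has type
   `double (*restrict)[DIM]`. */
void f1_restrict(double c[restrict DIM][DIM],
                 const double a[restrict DIM][DIM],
                 const double b[restrict DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
}
```

Callers are unaffected, because the parameter types are the same pointer-to-row types as before, now carrying the no-overlap promise.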
> 
> For more details refer:
> https://msdn.microsoft.com/en-us/library/k649tyc7.aspx
> 
> Regards,
> 
> Ashutosh
> 
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf
> Of Phil Tomson via llvm-dev
> Sent: Friday, November 18, 2016 12:23 AM
> To: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] Loop invariant not being optimized
> 
> I've got an example where I think that there should be some
> loop-invariant optimization happening, but it's not. Here's the C
> code:
> 
> #define DIM 8
> #define UNROLL_DIM DIM
> typedef double InArray[DIM][DIM];
>
> __declspec(noalias) void f1(InArray c, const InArray a, const InArray b)
> {
> #pragma clang loop unroll_count(UNROLL_DIM)
>     for (int i = 0; i < DIM; i++)
> #pragma clang loop unroll_count(UNROLL_DIM)
>         for (int j = 0; j < DIM; j++)
> #pragma clang loop unroll_count(UNROLL_DIM)
>             for (int k = 0; k < DIM; k++) {
>                 c[i][k] = c[i][k] + a[i][j] * b[j][k];
>             }
> }
> 
> The "a[i][j]" there is invariant in the inner loop. I've unrolled the
> loops with the unroll pragma to make the assembly easier to read.
> Here's what I see (LLVM 3.9, compiling with: clang -fms-compatibility
> -funroll-loops -O3 -c fma.c -o fma.o):
> 
> 0000000000000000 <f1>:
> 0: 29580c0000000000 load r3,r0,0x0,64
> 8: 2958100200000000 load r4,r1,0x0,64 #r4 <- a[0][0]
> 10: 2958140400000000 load r5,r2,0x0,64
> 18: c0580c0805018000 fmaf r3,r4,r5,r3,64
> 20: 79b80c0000000000 store r3,r0,0x0,64
> 28: 2958100000000008 load r4,r0,0x8,64
> 30: 2958140200000000 load r5,r1,0x0,64 #r5 <- a[0][0]
> 38: 2958180400000008 load r6,r2,0x8,64
> 40: c058100a06020000 fmaf r4,r5,r6,r4,64
> 48: 79b8100000000008 store r4,r0,0x8,64
> 50: 2958140000000010 load r5,r0,0x10,64
> 58: 2958180200000000 load r6,r1,0x0,64 #r6 <- a[0][0]
> 60: 29581c0400000010 load r7,r2,0x10,64
> 68: c058140c07028000 fmaf r5,r6,r7,r5,64
> 70: 79b8140000000010 store r5,r0,0x10,64
> 78: 2958180000000018 load r6,r0,0x18,64
> 80: 29581c0200000000 load r7,r1,0x0,64 #r7 <- a[0][0]
> 88: 2958200400000018 load r8,r2,0x18,64
> 90: c058180e08030000 fmaf r6,r7,r8,r6,64
> ...
> 
> (fmaf semantics: fmaf r1,r2,r3,r4,SIZE computes r1 <- r2*r3 + r4)
>
> (load semantics: load r1,r2,imm,SIZE computes r1 <- mem[r2 + imm])
> 
> All three operands are loaded on every inner-loop iteration, but only
> two of them (c[i][k] and b[j][k]) need to be reloaded there. I added
> the 'noalias' declspec in the C code above thinking that it would
> indicate that the pointers passed into the function are not aliased,
> and that this would allow the optimization, but it didn't make any
> difference.
> 
> Of course it's easy to rewrite the example code to avoid this extra
> load in the inner loop, but I would have thought this would be a fairly
> straightforward optimization for the optimizer. Am I missing
> something?
> 
> Phil
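
The following plain C sketch shows why the compiler keeps the reload and what the manual rewrite looks like. `f1_reload` and `f1_hoisted` are hypothetical names. const is dropped from a and b so that the same array can be passed as both c and a without a cast. The two functions agree whenever c and a do not overlap. When they do overlap, the store to c[i][k] can change a[i][j] partway through the k loop, so hoisting the load is legal only if the compiler can prove there is no overlap (for example, through restrict). __declspec(noalias) does not promise that c and a are distinct:

```c
#define DIM 8

/* The loop as written: a[i][j] is re-read on every k iteration. This is
   what the compiler must emit when c and a might refer to the same memory. */
void f1_reload(double c[DIM][DIM], double a[DIM][DIM], double b[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
}

/* Manual rewrite: a[i][j] is loaded once per (i, j) and kept in a local.
   Equivalent to f1_reload only if storing to c[i][k] cannot change a[i][j]. */
void f1_hoisted(double c[DIM][DIM], double a[DIM][DIM], double b[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            const double aij = a[i][j];
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + aij * b[j][k];
        }
}
```

Consider a call with the same array x as both c and a, where x[0][0] = 1 and b[0][0] = b[0][1] = 1. The k = 0 store sets x[0][0] to 2. f1_reload then reads that new value as a[0][0], so x[0][1] ends up 2. f1_hoisted still uses the cached 1, so x[0][1] ends up 1.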
> 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory