[llvm-dev] Loop invariant not being optimized

Fri Nov 18 16:36:01 PST 2016

Oh, I see. Yes, this works:

__declspec(noalias)
void f1(       double c[restrict DIM][DIM],
         const double a[restrict DIM][DIM],
         const double b[restrict DIM][DIM] )
{
#pragma clang loop unroll_count(UNROLL_DIM)
    for( int i=0;i<DIM;i++)
#pragma clang loop unroll_count(UNROLL_DIM)
        for( int j=0;j<DIM;j++)
#pragma clang loop unroll_count(UNROLL_DIM)
            for( int k=0;k<DIM;k++) {
                c[i][k] = c[i][k] + a[i][j]*b[j][k];
            }
}

...works in the sense that the loop-invariant load of a[i][j] is now optimized.  Thanks.
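
For anyone finding this thread later, here is a minimal sketch of why the restrict qualifiers matter. Without them the compiler has to assume that a store to c[i][k] may change a[i][j], so hoisting the load would change the result whenever c and a overlap. The names f1_reload, f1_hoisted, aliased_result, distinct_results_match and the ConstRows typedef are made up for this illustration and are not from the code above; the unroll pragmas are left out to keep it short.

```c
#include <string.h>

#define DIM 8

/* Hypothetical helper type: pointer to a row of const doubles. */
typedef const double (*ConstRows)[DIM];

/* No restrict: a store to c[i][k] may modify a[i][j], so a[i][j]
   has to be re-read on every k iteration. */
void f1_reload(double c[DIM][DIM], const double a[DIM][DIM],
               const double b[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
}

/* Hand-hoisted version: a[i][j] is read once per (i, j).
   Equivalent to f1_reload only when c does not overlap a. */
void f1_hoisted(double c[DIM][DIM], const double a[DIM][DIM],
                const double b[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            const double aij = a[i][j];
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + aij * b[j][k];
        }
}

/* Runs one variant with c and a being the SAME array; returns m[0][1]. */
double aliased_result(int use_hoisted)
{
    double m[DIM][DIM] = {{0}};
    double b[DIM][DIM] = {{0}};
    m[0][0] = 1.0;
    b[0][0] = 1.0;
    b[0][1] = 1.0;
    if (use_hoisted)
        f1_hoisted(m, (ConstRows)m, (ConstRows)b);
    else
        f1_reload(m, (ConstRows)m, (ConstRows)b);
    return m[0][1];
}

/* Runs both variants on distinct arrays; returns 1 if the outputs agree. */
int distinct_results_match(void)
{
    double a[DIM][DIM], b[DIM][DIM];
    double c1[DIM][DIM] = {{0}}, c2[DIM][DIM] = {{0}};
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }
    f1_reload(c1, (ConstRows)a, (ConstRows)b);
    f1_hoisted(c2, (ConstRows)a, (ConstRows)b);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

With distinct arrays the two variants agree. When c and a are the same array they do not: m[0][1] ends up as 2.0 with the reload and 1.0 with the hoist. That is why the hoist is only legal once restrict rules the overlapping case out.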

Phil

On Fri, Nov 18, 2016 at 3:29 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> Hi Phil,
>
> I'm not sure whether we do anything with __declspec(noalias), but if I had
> to guess, when you used restrict, you did not do it correctly. You can see
> http://en.cppreference.com/w/c/language/restrict for some additional
> usage examples.
>
>  -Hal
>
> ----- Original Message -----
> > From: "Phil Tomson via llvm-dev" <llvm-dev at lists.llvm.org>
> > To: "Ashutosh Nema" <Ashutosh.Nema at amd.com>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> > Sent: Friday, November 18, 2016 12:00:58 PM
> > Subject: Re: [llvm-dev] Loop invariant not being optimized
> >
> > I tried changing 'noalias' to 'restrict' in the code and I get:
> >
> > fma.c:17:12: warning: 'restrict' attribute only applies to return
> > values that are pointers
> >
> > It seems like 'noalias' would be the correct attribute here, from the
> > article you linked:
> >
> > "if a function is annotated as noalias , the optimizer can assume
> > that, in addition to the parameters themselves, only first-level
> > indirections of pointer parameters are referenced or modified inside
> > the function. The visible global state is the set of all data that
> > is not defined or referenced outside of the compilation scope, and
> > their address is not taken."
> >
> > Phil
> >
> > On Thu, Nov 17, 2016 at 9:50 PM, Nema, Ashutosh <
> > Ashutosh.Nema at amd.com > wrote:
> >
> > If I understood it correctly, __declspec(noalias) is not the same as
> > specifying restrict on each parameter.
> >
> > In the example mentioned, it means the function does not modify or
> > reference any global state, but a, b and c are still free to alias
> > one another.
> >
> > You could specify restrict on each one to indicate that they do not
> > alias each other.
> >
> > For more details, refer to:
> > https://msdn.microsoft.com/en-us/library/k649tyc7.aspx
> >
> > Regards,
> >
> > Ashutosh
> >
> > From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On Behalf
> > Of Phil Tomson via llvm-dev
> > Sent: Friday, November 18, 2016 12:23 AM
> > To: LLVM Developers Mailing List < llvm-dev at lists.llvm.org >
> > Subject: [llvm-dev] Loop invariant not being optimized
> >
> > I've got an example where I think that there should be some
> > loop-invariant optimization happening, but it's not. Here's the C
> > code:
> >
> > #define DIM 8
> > #define UNROLL_DIM DIM
> > typedef double InArray[DIM][DIM];
> >
> > __declspec(noalias) void f1( InArray c, const InArray a, const InArray b )
> > {
> > #pragma clang loop unroll_count(UNROLL_DIM)
> >     for( int i=0;i<DIM;i++)
> > #pragma clang loop unroll_count(UNROLL_DIM)
> >         for( int j=0;j<DIM;j++)
> > #pragma clang loop unroll_count(UNROLL_DIM)
> >             for( int k=0;k<DIM;k++) {
> >                 c[i][k] = c[i][k] + a[i][j]*b[j][k];
> >             }
> > }
> >
> > The "a[i][j]" there is invariant in that inner loop. I've unrolled
> > the loops with the unroll pragma to make the assembly easier to
> > read. Here's what I see (LLVM 3.9, compiling with: clang
> > -fms-compatibility -funroll-loops -O3 -c fma.c -o fma.o ):
> >
> > 0000000000000000 <f1>:
> > 0: 29580c0000000000 load r3,r0,0x0,64
> > 8: 2958100200000000 load r4,r1,0x0,64 #r4 <- a[0][0]
> > 10: 2958140400000000 load r5,r2,0x0,64
> > 18: c0580c0805018000 fmaf r3,r4,r5,r3,64
> > 20: 79b80c0000000000 store r3,r0,0x0,64
> > 28: 2958100000000008 load r4,r0,0x8,64
> > 30: 2958140200000000 load r5,r1,0x0,64 #r5 <- a[0][0]
> > 38: 2958180400000008 load r6,r2,0x8,64
> > 40: c058100a06020000 fmaf r4,r5,r6,r4,64
> > 48: 79b8100000000008 store r4,r0,0x8,64
> > 50: 2958140000000010 load r5,r0,0x10,64
> > 58: 2958180200000000 load r6,r1,0x0,64 #r6 <- a[0][0]
> > 60: 29581c0400000010 load r7,r2,0x10,64
> > 68: c058140c07028000 fmaf r5,r6,r7,r5,64
> > 70: 79b8140000000010 store r5,r0,0x10,64
> > 78: 2958180000000018 load r6,r0,0x18,64
> > 80: 29581c0200000000 load r7,r1,0x0,64 #r7 <- a[0][0]
> > 88: 2958200400000018 load r8,r2,0x18,64
> > 90: c058180e08030000 fmaf r6,r7,r8,r6,64
> > ...
> >
> > (fmaf semantics are: fmaf r1,r2,r3,r4, SIZE r1 <- r2*r3+r4 )
> >
> > (load semantics are: load r1,r2,imm, SIZE r1<- mem[r2+imm] )
> >
> > All three values are loaded on every iteration of the inner loop, but
> > only two of them need to be reloaded there. I added the 'noalias'
> > declspec in the C code above, thinking it would indicate that the
> > pointers going into the function are not aliased and so allow the
> > optimization, but it didn't make any difference.
> >
> > Of course it's easy to rewrite the example code to avoid this extra
> > load in the inner loop, but I would have thought this would be a fairly
> > straightforward optimization for the optimizer. Am I missing
> > something?
> >
> > Phil
> >
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>