[llvm-dev] Loop invariant not being optimized
Phil Tomson via llvm-dev
llvm-dev at lists.llvm.org
Thu Nov 17 10:52:38 PST 2016
I've got an example where I would expect loop-invariant code motion to
happen, but it doesn't. Here's the C code:
#define DIM 8
#define UNROLL_DIM DIM
typedef double InArray[DIM][DIM];

__declspec(noalias) void f1( InArray c, const InArray a, const InArray b )
{
    #pragma clang loop unroll_count(UNROLL_DIM)
    for( int i=0;i<DIM;i++)
        #pragma clang loop unroll_count(UNROLL_DIM)
        for( int j=0;j<DIM;j++)
            #pragma clang loop unroll_count(UNROLL_DIM)
            for( int k=0;k<DIM;k++) {
                c[i][k] = c[i][k] + a[i][j]*b[j][k];
            }
}
The "a[i][j]" there is invariant in the inner (k) loop. I've unrolled the
loops with the unroll pragma to make the assembly easier to read. Here's
what I see (LLVM 3.9, compiled with: clang -fms-compatibility
-funroll-loops -O3 -c fma.c -o fma.o):
0000000000000000 <f1>:
0: 29580c0000000000 load r3,r0,0x0,64
8: 2958100200000000 load r4,r1,0x0,64 #r4 <- a[0][0]
10: 2958140400000000 load r5,r2,0x0,64
18: c0580c0805018000 fmaf r3,r4,r5,r3,64
20: 79b80c0000000000 store r3,r0,0x0,64
28: 2958100000000008 load r4,r0,0x8,64
30: 2958140200000000 load r5,r1,0x0,64 #r5 <- a[0][0]
38: 2958180400000008 load r6,r2,0x8,64
40: c058100a06020000 fmaf r4,r5,r6,r4,64
48: 79b8100000000008 store r4,r0,0x8,64
50: 2958140000000010 load r5,r0,0x10,64
58: 2958180200000000 load r6,r1,0x0,64 #r6 <- a[0][0]
60: 29581c0400000010 load r7,r2,0x10,64
68: c058140c07028000 fmaf r5,r6,r7,r5,64
70: 79b8140000000010 store r5,r0,0x10,64
78: 2958180000000018 load r6,r0,0x18,64
80: 29581c0200000000 load r7,r1,0x0,64 #r7 <- a[0][0]
88: 2958200400000018 load r8,r2,0x18,64
90: c058180e08030000 fmaf r6,r7,r8,r6,64
...
(fmaf semantics: fmaf r1,r2,r3,r4,SIZE  means  r1 <- r2*r3+r4)
(load semantics: load r1,r2,imm,SIZE    means  r1 <- mem[r2+imm])
All three addresses are loaded in every iteration of the inner loop, but
only two of them (c[i][k] and b[j][k]) need to be reloaded there; a[i][j]
could be loaded once per (i, j) pair. I added the 'noalias' declspec in the
C code above thinking it would indicate that the pointers going into the
function are not aliased, and that this would allow the optimization, but
it didn't make any difference.
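For comparison, standard C (C99 and later) states a no-overlap promise per
parameter with the restrict qualifier. The following is only a sketch of how
the same function could be declared that way, assuming the three arrays never
overlap. The name f1_restrict is hypothetical, and whether this changes the
code generated for this target has not been checked here.

```c
#define DIM 8

/* Same computation as f1 above. "restrict" inside the outermost brackets
 * qualifies the pointer each array parameter decays to, so the caller
 * promises that c, a and b do not overlap. (Hypothetical variant, not
 * part of the original test case.) */
void f1_restrict(double c[restrict DIM][DIM],
                 const double a[restrict DIM][DIM],
                 const double b[restrict DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
}
```

Calling this with overlapping arrays would be undefined behavior, so it is
only usable where the caller can actually guarantee the promise.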
Of course it's easy to rewrite the example code to avoid this extra
load in the inner loop, but I would have thought this would be a fairly
straightforward optimization for the optimizer. Am I missing something?
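The manual rewrite mentioned above could look roughly like the sketch below,
which reads a[i][j] into a local before the k loop. The names f1_ref and
f1_hoisted are hypothetical, and the pragmas and declspec are left out to keep
it short. Note that the two versions compute the same result only if c does
not overlap a: in the original form a store to c[i][k] could change a[i][j]
between iterations, while the hoisted form keeps using the value it read
first.

```c
#define DIM 8
typedef double InArray[DIM][DIM];

/* Original formulation: a[i][j] is read again in every k iteration. */
void f1_ref(InArray c, const InArray a, const InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
}

/* Hand-hoisted variant: a[i][j] is read once per (i, j) pair.
 * Matches f1_ref only when c does not overlap a. */
void f1_hoisted(InArray c, const InArray a, const InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            const double aij = a[i][j];   /* invariant in the k loop */
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + aij * b[j][k];
        }
}
```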
Phil