<div dir="ltr"><div><div>Hi Sam, thanks for your help.<br><br></div><div>I hadn't noticed opt before. I tried running it on the bitcode, but I still get the same code.<br></div><div>FYI, here is what I did:<br>$ clang -c -m32 -O3 -emit-llvm ex11.c -o ex11.bc<br></div><div>$ opt -S -gvn ex11.bc > ex11.ll<br></div><div>$ llc -march=bfin ex11.ll<br></div><div>Is there anything I'm missing?<br><br><br></div><div>And this is what I did before:<br>$ clang -S -m32 -emit-llvm -O3 file.c -o file.ll<br>$ llc -march=bfin file.ll<br></div><br></div><div>Original C source file:<br><br>typedef struct state {<br> int V[8][8];<br> int *offset[8];<br>} state_t;<br><br>void foo(state_t* state, int ch, int *buffer)<br>{<br> int *offset = state->offset[ch];<br><br> int idx, i;<br> for (i = 0, idx = 0; i < 100; i++, idx += 5) {<br> //long long tmp = 0;<br> int tmp = 0;<br> for (int j = 0; j < 2; j++) {<br> tmp += state->V[ch][offset[i]+2*j+0]*buffer[idx + j];<br> tmp += state->V[ch][offset[i]+2*j+1]*buffer[idx + j];<br> }<br><br> // volatile store keeps the loop from being optimized away<br> //volatile long long ret = tmp;<br> volatile int ret = tmp;<br> }<br>}<br><br></div><div>.ll file after running opt on the .bc file:<br>; ModuleID = 'ex11.bc'<br>target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"<br>target triple = "i386-apple-macosx10.10.0"<br><br>%struct.state = type { [8 x [8 x i32]], [8 x i32*] }<br><br>; Function Attrs: nounwind ssp<br>define void @foo(%struct.state* nocapture readonly %state, i32 %ch, i32* nocapture readonly %buffer) #0 {<br>entry:<br> %ret = alloca i32, align 4<br> %arrayidx = getelementptr inbounds %struct.state* %state, i32 0, i32 1, i32 %ch<br> %0 = load i32** %arrayidx, align 4, !tbaa !2<br> br label %for.cond3.preheader<br><br>for.cond3.preheader: ; preds = %for.cond3.preheader, %entry<br> %i.052 = phi i32 [ 0, %entry ], [ %inc27, %for.cond3.preheader ]<br> %idx.051 = phi i32 [ 0,
%entry ], [ %add28, %for.cond3.preheader ]<br> %arrayidx6 = getelementptr inbounds i32* %0, i32 %i.052<br> %1 = load i32* %arrayidx6, align 4, !tbaa !6<br> %arrayidx9 = getelementptr inbounds %struct.state* %state, i32 0, i32 0, i32 %ch, i32 %1<br> %2 = load i32* %arrayidx9, align 4, !tbaa !6<br> %arrayidx11 = getelementptr inbounds i32* %buffer, i32 %idx.051<br> %3 = load i32* %arrayidx11, align 4, !tbaa !6<br> %add17 = add nsw i32 %1, 1<br> %arrayidx20 = getelementptr inbounds %struct.state* %state, i32 0, i32 0, i32 %ch, i32 %add17<br> %4 = load i32* %arrayidx20, align 4, !tbaa !6<br> %tmp = add i32 %4, %2<br> %tmp48 = mul i32 %tmp, %3<br> %add.1 = add nsw i32 %1, 2<br> %arrayidx9.1 = getelementptr inbounds %struct.state* %state, i32 0, i32 0, i32 %ch, i32 %add.1<br> %5 = load i32* %arrayidx9.1, align 4, !tbaa !6<br> %add10.1 = add nuw nsw i32 %idx.051, 1<br> %arrayidx11.1 = getelementptr inbounds i32* %buffer, i32 %add10.1<br> %6 = load i32* %arrayidx11.1, align 4, !tbaa !6<br> %add17.1 = add nsw i32 %1, 3<br> %arrayidx20.1 = getelementptr inbounds %struct.state* %state, i32 0, i32 0, i32 %ch, i32 %add17.1<br> %7 = load i32* %arrayidx20.1, align 4, !tbaa !6<br> %tmp.1 = add i32 %7, %5<br> %tmp48.1 = mul i32 %tmp.1, %6<br> %add24.1 = add i32 %tmp48.1, %tmp48<br> store volatile i32 %add24.1, i32* %ret, align 4<br> %inc27 = add nuw nsw i32 %i.052, 1<br> %add28 = add nuw nsw i32 %idx.051, 5<br> %exitcond53 = icmp eq i32 %inc27, 100<br> br i1 %exitcond53, label %for.end29, label %for.cond3.preheader<br><br>for.end29: ; preds = %for.cond3.preheader<br> ret void<br>}<br><br>attributes #0 = { nounwind ssp "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }<br><br>!llvm.module.flags = !{!0}<br>!llvm.ident = !{!1}<br><br>!0 = !{i32 1, !"PIC Level", i32 2}<br>!1 = !{!"clang version 3.6.0 
(tags/RELEASE_360/final)"}<br>!2 = !{!3, !3, i64 0}<br>!3 = !{!"any pointer", !4, i64 0}<br>!4 = !{!"omnipotent char", !5, i64 0}<br>!5 = !{!"Simple C/C++ TBAA"}<br>!6 = !{!7, !7, i64 0}<br>!7 = !{!"int", !4, i64 0}<br><br></div><div>And the generated .s file:<br><br> .text<br> .macosx_version_min 10, 10<br> .file "ex11.ll"<br> .globl foo<br> .align 4<br> .type foo,@function<br>foo: // @foo<br>// BB#0: // %entry<br> link 16;<br> [fp - 4] = r4;<br> [fp - 8] = r5;<br> [fp - 12] = r6;<br> r3 = r1 << 2;<br> r4 = r0 + r3;<br> r3 = 0 (x);<br> r2 += 4;<br> p0 = r4;<br> r4 = [p0 + 256];<br> p0 = r2;<br>LBB0_1: // %for.cond3.preheader<br> // =>This Inner Loop Header: Depth=1<br> r2 = r1 << 5;<br> r2 = r0 + r2;<br> r5 = r4 + r3;<br> p1 = r5;<br> r5 = [p1];<br> r5 = r5 << 2;<br> r2 = r2 + r5;<br> p1 = r2; <--------------- first copy<br> r5 = [p1];<br> p1 = r2; <--------------- redundant copy<br> r6 = [p1 + 4];<br> r5 = r6 + r5;<br> r6 = [p0 + -4];<br> r5 *= r6;<br> p1 = r2; <--------------- redundant copy<br> r6 = [p1 + 8];<br> p1 = r2; <--------------- redundant copy<br> r2 = [p1 + 12];<br> r2 = r2 + r6;<br> r6 = [p0];<br> r2 *= r6;<br> r2 = r2 + r5;<br> [fp - 16] = r2;<br> r2 = p0;<br> r2 += 20;<br> r3 += 4;<br> r5 = 400 (z);<br> cc = r3 == r5;<br> p0 = r2;<br> if !cc jump LBB0_1;<br> jump LBB0_2;<br>LBB0_2: // %for.end29<br> r6 = [fp - 12];<br> r5 = [fp - 8];<br> r4 = [fp - 4];<br> unlink;<br> rts;<br>Ltmp0:<br> .size foo, Ltmp0-foo<br><br><br>Huang<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 22, 2015 at 12:24 AM, Samuel Crow <span dir="ltr">&lt;<a href="mailto:samueldcrow@gmail.com" target="_blank">samueldcrow@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>
On May 21, 2015, at 7:21 AM, zan jyu Wong wrote:<br>
<br>
> Hi,<br>
><br>
> I've been working on a Blackfin backend (llvm-3.6.0) based on the previous one that was removed in llvm-3.1.<br>
> llc generates code like this:<br>
><br>
> 29 p1 = r2;<br>
> 30 r5 = [p1];<br>
> 31 p1 = r2;<br>
> 32 r6 = [p1 + 4];<br>
> 33 r5 = r6 + r5;<br>
> 34 r6 = [p0 + -4];<br>
> 35 r5 *= r6;<br>
> 36 p1 = r2;<br>
> 37 r6 = [p1 + 8];<br>
> 38 p1 = r2;<br>
><br>
> p1 and r2 are in different register classes.<br>
> A p* register can be used as the address register to load/store values from memory, while an r* register cannot.<br>
><br>
> As we can see, lines 31, 36 and 38 can be deleted. How can I configure llc to do this, or do I have to write a custom pass for this optimization? Any suggestions are welcome.<br>
><br>
> Thanks,<br>
><br>
> Huang<br>
<br>
</div></div>Hello Huang,<br>
<br>
Silly as this may sound, did you run opt on the bitcode before using llc?<br>
<br>
Cheers,<br>
<br>
Sam</blockquote></div><br></div>
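On the question in the original message of whether a custom pass is needed: the core of such a pass is a forward scan over a basic block that remembers which register currently holds a copy of which other register, and deletes a copy whose destination already holds that source. The sketch below is plain C, not LLVM's API: the register enum, the Insn struct and remove_redundant_copies() are hypothetical names for illustration, and the model assumes a single basic block with no calls or other implicit register writes. In a real backend the same logic would operate on MachineInstrs after register allocation. LLVM also has a target-independent MachineCopyPropagation pass in lib/CodeGen that removes redundant copies, so it may be worth checking why it does not catch these before writing a new pass.

```c
#include <assert.h>

/* Hypothetical, simplified Blackfin register model (not LLVM's). */
enum { R0, R1, R2, R3, R4, R5, R6, R7, P0, P1, P2, P3, P4, P5, NREGS };
#define NONE (-1)

/* Hypothetical instruction record: writes at most one register. */
typedef struct {
    int is_copy;      /* 1 if this is a plain register copy "dst = src;" */
    int dst;          /* register written, or NONE (e.g. a store) */
    int src;          /* copy source; only meaningful when is_copy == 1 */
    const char *text; /* assembly text, kept for readability */
} Insn;

/* Forward scan over ONE basic block: drop a copy "dst = src" when dst is
   already known to hold src. Indices of kept instructions go to keep[];
   the return value is how many were kept. Assumes no calls or implicit
   register writes, which a real pass would have to account for. */
static int remove_redundant_copies(const Insn *code, int n, int *keep)
{
    int holds[NREGS]; /* holds[d] == s means register d equals register s */
    int kept = 0;

    for (int r = 0; r < NREGS; r++)
        holds[r] = NONE;

    for (int i = 0; i < n; i++) {
        const Insn *in = &code[i];

        if (in->is_copy && holds[in->dst] == in->src)
            continue; /* redundant: dst already holds src, so skip it */

        keep[kept++] = i;

        if (in->dst != NONE) {
            assert(in->dst >= 0 && in->dst < NREGS);
            /* Writing dst invalidates every fact that mentions dst. */
            for (int r = 0; r < NREGS; r++)
                if (r == in->dst || holds[r] == in->dst)
                    holds[r] = NONE;
            /* A copy records the new fact dst == src. */
            if (in->is_copy)
                holds[in->dst] = in->src;
        }
    }
    return kept;
}

/* Lines 29-38 of the llc output quoted in the original message. */
static const Insn email_seq[] = {
    {1, P1, R2,   "p1 = r2;"},
    {0, R5, NONE, "r5 = [p1];"},
    {1, P1, R2,   "p1 = r2;"},
    {0, R6, NONE, "r6 = [p1 + 4];"},
    {0, R5, NONE, "r5 = r6 + r5;"},
    {0, R6, NONE, "r6 = [p0 + -4];"},
    {0, R5, NONE, "r5 *= r6;"},
    {1, P1, R2,   "p1 = r2;"},
    {0, R6, NONE, "r6 = [p1 + 8];"},
    {1, P1, R2,   "p1 = r2;"},
};
```

Run on those ten instructions, the sketch keeps seven and drops the copies at lines 31, 36 and 38. If r2 or p1 were written between two copies, the fact would be invalidated and the second copy kept.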