[LLVMdev] MI scheduler produces bad code with inlined function

Andrew Trick atrick at apple.com
Mon Oct 14 21:38:32 PDT 2013


On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote:

> Hi all,
> I ran into this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched.
> 
> When the small function is compiled on its own, it is scheduled well, but if opt inlines it, the inlined copy is scheduled badly.

A bug for this is welcome. Pretty soon, I’ll be verifying A9 performance and changing the default scheduler. When I do this, I’ll be using the new machine model:

(-mllvm) -sched-itins=false

However, some scheduler changes are required for that mode to fully enforce pipeline hazards.

> So I wrote a simple test case (foo.c, linked below) and compiled it in two different ways:
> 
> method A:
> $clang -O3 foo.c -static -S -o foo.s -mllvm -enable-misched  -mllvm -unroll-count=4 --target=arm -mfloat-abi=hard -mcpu=cortex-a9 -fno-vectorize -fno-slp-vectorize
> 
> and
> 
> method B:
> $clang foo.c -S -emit-llvm -o foo.bc --target=arm -mfloat-abi=hard -mcpu=cortex-a9
> $opt foo.bc -O3 -unroll-count=4 -o foo.opt.bc
> $llc foo.opt.bc -o foo.opt.s -march=arm -mcpu=cortex-a9 -enable-misched

You can try “clang -O3 -mllvm -disable-llvm-optzns …”. clang should generate the same bitcode, but skip the “opt” step.

If that doesn’t work, it can be a nightmare trying to decompose the compilation steps with fidelity. You can try:
- clang -### … 
- clang -mllvm -print-options …
- Passing a full triple to all tools with -mtriple
- Debug the TargetOptions fields
- -print-after-all to see which phase is different
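
One way to use the -print-after-all output from the last item is to pull the dump that follows a given pass out of each log and diff the two. A minimal shell sketch, assuming each dump is introduced by a banner line containing "IR Dump After" and that the output (which goes to stderr) was captured to a file; dump_after, the log file names, and the pass name are hypothetical:

```shell
# dump_after LOG PASS: print every dump in LOG whose banner mentions PASS.
# Assumption: each dump starts with a banner line containing "IR Dump After".
dump_after() {
    awk -v pass="$2" '
        # At each banner, start or stop printing depending on the pass name.
        /IR Dump After/ { printing = (index($0, pass) > 0) }
        printing
    ' "$1"
}

# Hypothetical usage, with clang.log and llc.log captured from stderr
# (2> file) of the two pipelines run with -print-after-all:
#   dump_after clang.log "Some Pass" > a.txt
#   dump_after llc.log   "Some Pass" > b.txt
#   diff a.txt b.txt
```

Walking the passes in order until the diff is non-empty should narrow down where the two pipelines diverge.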

Even if you get all the options right, the process of serializing and rereading the IR can affect the optimizations.

Sorry. I’ve been trying to think of a way to improve this situation.

-Andy

> (P.S. I checked with debug-pass=structure, so I think they are equivalent.)
> 
> But the result is different:
> In LBB1_4 of foo.s, the code always reuses the same register for computation, but LBB1_4 of foo.opt.s does not.
> 
> My question is: how can I use clang alone (method A) to get the result of method B?
> Or am I missing something here?
> 
> I really appreciate any help and suggestions.
> Thanks
> 
> Kuan-Hsu
> 
> ------- file link -------
> foo.c: http://goo.gl/nVa2K0
> foo.s: http://goo.gl/ML9eNj
> foo.opt.s: http://goo.gl/31PCnf
