<div dir="ltr">Hi Matt,<div><br></div><div>Thank you so much for the reply!</div><div><br></div><div>I've tried llvm-mca, and it is helpful.</div><div>I was wondering whether llvm-mca supports assembly code for ARM?</div><div><br></div><div>I cross-compile the test file for ARM like this: clang test.c -O2 -target arm-linux-gnueabihf -static -S -o test.s</div><div><br></div><div>If I want to check the performance using llvm-mca, is there a "-mcpu" option for ARM?<br></div><div><br></div><div><br></div><div>Thanks,<br></div><div>Yin</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-05-07 21:52 GMT-04:00 <span dir="ltr"><<a href="mailto:Matthew.Davis@sony.com" target="_blank">Matthew.Davis@sony.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Yin,<br>
<br>
From: llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org">llvm-dev-bounces@lists.llvm.<wbr>org</a>> On Behalf Of Yin Liu via llvm-dev<br>
<span class="">> Hello,<br>
> <br>
> As is well known, there is a relationship between a program's instructions and its execution time. In other words, we can estimate the execution time based on the number of program instructions.<br>
> <br>
> I'm curious about the relationship between IR instructions and execution time. I know that the number of program instructions and the execution time are highly dependent on the<br>
> platform and architecture, while IR is target-independent and intermediate. But, intuitively, there may be some relationship between IR instructions and execution time.<br>
> <br>
> Would it be possible to give me some advice about it?<br>
<br>
</span>Which instructions the compiler finally emits is highly dependent on the specified target. As you pointed out, IR is relatively abstract, and can at best only yield a "rough" timing estimate. Maybe that loss of fidelity is acceptable in your case. Be aware that there are also target-specific optimizations that operate after the IR is lowered to a target-friendly representation. Any early approximation of IR performance will be less accurate after target-specific optimization passes are run. For more accurate results, you will need to wait until the IR is lowered to the target architecture and emitted as assembly or object code. But it seems that might be too late for what you are looking for. In any case, if you do want to analyze the assembly code, then look no further than LLVM's Machine Code Analyzer (MCA). This tool takes assembly code as input and reports throughput and latency information. For more details see: <a href="https://llvm.org/docs/CommandGuide/llvm-mca.html" rel="noreferrer" target="_blank">https://llvm.org/docs/<wbr>CommandGuide/llvm-mca.html</a><br>
<br>
-Matt<br>
</blockquote></div><br></div>