<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">While reading the generated assembly is one way to diagnose the problem, another is to use the optimization remarks framework, following the instructions here: <a href="https://llvm.org/docs/Remarks.html" class="">https://llvm.org/docs/Remarks.html</a></div><div class=""><br class=""></div><div class="">Basically, it prints messages reporting when an optimization was missed, and tells you which part of the code each message refers to.</div><div class=""><br class=""></div><div class="">Best,</div><div class="">Min<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Aug 23, 2020, at 10:53 AM, Riyaz Puthiyapurayil via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="WordSection1" style="page: WordSection1; caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class="">I am analyzing a clang 10.0.0 vs gcc 7.3 performance difference that I can reproduce in the following test.<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, 
sans-serif;" class=""><b class=""><span style="font-size: 14pt; font-family: Consolas;" class="">unsigned</span></b><span style="font-size: 14pt; font-family: Consolas;" class=""><span class="Apple-converted-space"> </span>foo(<b class="">unsigned</b><span class="Apple-converted-space"> </span>t1,<span class="Apple-converted-space"> </span><b class="">unsigned</b><span class="Apple-converted-space"> </span>t2,<span class="Apple-converted-space"> </span><b class="">int</b><span class="Apple-converted-space"> </span>count,<span class="Apple-converted-space"> </span><b class="">int</b><span class="Apple-converted-space"> </span>step) {<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> <span class="Apple-converted-space"> </span><b class="">unsigned</b><span class="Apple-converted-space"> </span>tmp = 0;<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> <span class="Apple-converted-space"> </span><b class="">int</b><span class="Apple-converted-space"> </span>state = 0;<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> <b class="">for</b><span class="Apple-converted-space"> </span>(<b class="">int</b><span class="Apple-converted-space"> </span>i = 0 ; i < count ; i += step) {<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> state++;<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; 
font-family: Consolas;" class=""> <span class="Apple-converted-space"> </span><b class="">if</b><span class="Apple-converted-space"> </span>(state > 5)<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> state = 0;<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> <span class="Apple-converted-space"> </span><b class="">if</b><span class="Apple-converted-space"> </span>(state == 3)<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> tmp += t2;<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> }<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""> <span class="Apple-converted-space"> </span><b class="">return</b> tmp;<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class="">}<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class="">Clang output is about 40% slower when the function is called with t2=7, 
count=2000000000, step=3 (t1 is unimportant in this case as it is unused here). The attached screenshot shows the `perf report` annotated assembly code from clang and gcc (clang is on the left). Gcc generated code takes 0.512 sec vs clang’s 0.731 sec. The machine I am running is a Broadwell… Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class="">The code generated by gcc runs consistently faster for all values for `step` I tried; in some cases, the performance difference is worse than 40% seen with the aforementioned parameter values to `foo`. The code generated by clang is a direct result of simplifycfg that eliminates the inner branches and replaces them with `select` which is then lowered to the two `cmov` instructions.<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class="">The code generated by clang takes far fewer branches but executes more instructions. `perf` reports 32.76% front-end cycles idle with the clang code compared to 24.20% for gcc generated code. Clang generated code seems to perform worse in branch-miss and icache events (as reported by `perf`). 
But it is not clear why. Are the two back-to-back cmove instructions the reason? Any comments on this?<o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span id="cid:image002.png@01D67897.72235000"><image002.png></span><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 14pt; font-family: Consolas;" class=""><o:p class=""> </o:p></span></div></div><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">_______________________________________________</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); 
font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">LLVM Developers mailing list</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><a href="mailto:llvm-dev@lists.llvm.org" style="color: rgb(149, 79, 114); text-decoration: underline; font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">llvm-dev@lists.llvm.org</a><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" style="color: rgb(149, 79, 114); text-decoration: underline; font-family: Helvetica; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" 
class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></div></blockquote></div><br class=""></div></body></html>