<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Kristof,<div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Apr 6, 2017, at 6:53 AM, Kristof Beyls &lt;<a href="mailto:kristof.beyls@arm.com" class="">kristof.beyls@arm.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
I've been digging a little bit deeper into the biggest performance regressions I've observed.
<div class=""><br class="">
</div>
<div class="">What I've observed so far is:</div>
<div class="">* A lot of the biggest regressions are caused by unnecessarily moving floating point values through general purpose registers. I've raised <a href="http://bugs.llvm.org/show_bug.cgi?id=32550" class="">http://bugs.llvm.org/show_bug.cgi?id=32550</a> for
this. I think this one definitely needs fixing before enabling GlobalISel by default at -O0.</div></div></div></blockquote><div><br class=""></div><div>I commented in the PR. This is a known problem and we have a solution. Since this is an optimization, in the sense that it does not affect the correctness of the program, we didn’t push for fixing it now.</div><div><br class=""></div><div>For O0 we wanted to focus on generating correct code. Unless the regressions you are seeing are preventing debugging/running of the program, I wouldn’t block the flip of the switch on that.</div><div><br class=""></div><div>What do you think? </div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">* FastISel seems to transform division-by-constant-power-of-2 into right shift (see <a href="https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468" class="">https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468</a>).
GlobalISel does not. It seems to me that at -O0 there may be reasons not to perform this transformation, but maybe there is a good reason why FastISel does this?</div></div></div></blockquote><div><br class=""></div><div>I think FastISel tries to generate the best code it can no matter what. For GISel O0, however, not doing this optimization sounds sensible to me.</div><div>Now, I would say that the same remark as for the previous bullet point applies: we shouldn’t do it unless it gets in the way of running/debugging the program.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
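<div class="">As a rough illustration of the division-by-power-of-2 transformation discussed above (a sketch only, not the FastISel code): an unsigned division by 2**k equals a right shift by k, whereas for signed values a plain arithmetic right shift rounds toward negative infinity and only matches a truncating division when the division is exact. The helper names below are hypothetical.</div>
<pre class="">
```python
def udiv_pow2(x, k):
    """Unsigned divide by 2**k using a right shift (x must be non-negative)."""
    return x >> k


def sdiv_trunc(x, d):
    """Signed division that rounds toward zero, like C's / on integers."""
    q = abs(x) // abs(d)
    return q if (x >= 0) == (d >= 0) else -q


def ashr(x, k):
    """Arithmetic right shift; on Python ints this rounds toward -infinity."""
    return x >> k
```
</pre>
<div class="">With these definitions, sdiv_trunc(-7, 4) is -1 while ashr(-7, 2) is -2, so the shift is only a drop-in replacement for inputs such as -8 where no remainder is discarded.</div>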
<div class="">* FastISel doesn’t\ seem to handle functions with switch statements, so it falls back to DAGISel. DAGISel produces code that's a lot better than GlobalISel for switch statement at -O0. I'm not sure if we need to do something here before enabling
GlobalISel by default. I'm thinking we may need to add a smarter way to lower switch statements rather than just a cascaded sequence of conditional branches.</div></div></div></blockquote><div><br class=""></div><div>Sounds optimization-ish to me. Same remark.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
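<div class="">To make the switch-lowering point above concrete, here is a small model (an assumption-laden sketch, not how any LLVM selector is implemented) that counts how many case values are probed by a cascaded compare chain versus a binary search over sorted case values. The function names and the "default" label are hypothetical.</div>
<pre class="">
```python
def lower_cascade(cases, value):
    """Cascaded compares: test each case in order. Returns (target, probes)."""
    probes = 0
    for case_value, target in cases:
        probes += 1
        if value == case_value:
            return target, probes
    return "default", probes


def lower_binary_search(cases, value):
    """Binary search over case values sorted ascending. Returns (target, probes)."""
    ordered = sorted(cases)
    lo, hi = 0, len(ordered) - 1
    probes = 0
    while hi >= lo:
        mid = (lo + hi) // 2
        case_value, target = ordered[mid]
        probes += 1
        if value == case_value:
            return target, probes
        if value > case_value:
            lo = mid + 1
        else:
            hi = mid - 1
    return "default", probes
```
</pre>
<div class="">For a switch with 16 dense cases, the cascade probes up to 16 case values while the binary search probes at most 5; a jump table would be another option for dense case ranges.</div>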
<div class=""><br class="">
</div>
<div class="">I'll try to add the above content to the document Diana created at <font color="#1155cc" face="arial, sans-serif" class=""><span style="font-size: 12.8px; orphans: 2; widows: 2; background-color: rgb(255, 255, 255);" class=""><a href="https://goo.gl/IS2Bdw" class="">https://goo.gl/IS2Bdw</a></span></font> too.</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class=""><br class="">
</div>
<div class="">Kristof</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 3 Apr 2017, at 17:10, Kristof Beyls <<a href="mailto:Kristof.Beyls@arm.com" class="">Kristof.Beyls@arm.com</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
I've kicked off a run to compare "-O0 -g" versus "-O0 -g -mllvm -global-isel -mllvm -global-isel-abort=2".
<div class="">I've selected the test-suite (albeit a version which is a couple of months old now) and a few short-running proprietary benchmarks to get data back quickly for an initial feel of where things are.</div>
<div class="">This was running on Cortex-A57 AArch64 Linux.</div>
<div class=""><br class="">
</div>
<div class="">I saw one assertion failure in GlobalISel, see <a href="http://bugs.llvm.org/show_bug.cgi?id=32471" class="">http://bugs.llvm.org/show_bug.cgi?id=32471</a>. This is in a program compiled at -O2 (my out-dated test-suite still overrides -O0 and
instead uses -O for that program). The root cause of the failure seems to be due to LowLevelType not supporting vectors of pointers. I think this demonstrates that for correctness, we should be trying to test more than -O0, or even more than just LLVM-IR produced
by clang, as other front-ends could run into this even at -O0.</div>
<div class=""><br class="">
</div>
<div class="">Due to this assertion failure and the infrastructure I used, the numbers below do not include test-suite/MultiSource/Benchmarks results.</div>
<div class=""><br class="">
</div>
<div class="">On the non-correctness aspects, LNT tells me that:</div>
<div class="">- The programs that report execution time, on geomean are about 17% slower.</div>
<div class="">- The programs that report scores, on geomean are about 21% slower.</div>
<div class="">- Code size is up on geomean about 11%.</div>
<div class="">I'm afraid I don't have compile time numbers, nor any feel for debug info quality.</div>
<div class=""><br class="">
</div>
<div class="">I'll need quite a bit more time to dig into the details to come up with something actionable, although the fact that LowLevelType doesn't support vectors of pointers is already actionable.</div>
<div class="">Nevertheless, I thought to share what I see as is, to see if others see similar results so far.</div>
<div class=""><br class="">
</div>
<div class="">I thought Diana was going to look into fallback rate on the test-suite on AArch64 linux?</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class=""><br class="">
</div>
<div class="">Kristof</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class="">On 30 Mar 2017, at 10:54, Renato Golin <<a href="mailto:renato.golin@linaro.org" class="">renato.golin@linaro.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">On 30 March 2017 at 00:27, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class="">
<blockquote type="cite" class="">On iOS we are at 100% pass rate in 00 g for the LLVM test suite, standard<br class="">
benchmarks and unit tests. In about 5% of all functions GlobalIsel falls<br class="">
back to SDIsel.<br class="">
(Kristof Beyls would have the linux numbers.)<br class="">
The self host compiler correctly builds and runs the LLVM test suite in O0.<br class="">
</blockquote>
<br class="">
Having done no tests at all on my side, I think we need to have<br class="">
similar numbers on Linux to be able to flip across the board.<br class="">
<br class="">
I don't want to flip it only for Darwin and not Linux, as that will<br class="">
fragment the effort too much.<br class="">
<br class="">
I'll check with Diana and Kristof to know what's the best way forward,<br class="">
but it should be reasonably quick.<br class="">
<br class="">
cheers,<br class="">
--renato<br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div></blockquote></div><br class=""></div></body></html>