<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 9, 2020 at 10:47 AM Adrian Prantl <<a href="mailto:aprantl@apple.com">aprantl@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

<br>

> On Apr 8, 2020, at 2:04 PM, Mircea Trofin via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

> <br>

> TL;DR; We can improve compiler optimizations driven by heuristics by replacing those heuristics with machine-learned policies (ML models). Policies are trained offline and ship as part of the compiler. Determinism is maintained because models are fixed when the compiler is operating in production. Fine-tuning or regressions may be handled by incorporating the interesting cases in the ML training set, retraining the compiler, and redeploying it. <br>

> For a first milestone, we chose inlining for size (-Oz) on X86-64. We were able to train an ML model to produce binaries 1.5-6% smaller than -Oz of tip-of-tree. The trained model appears to generalize well over a diverse set of binaries. Compile time is increased marginally (under 5%). The model also happens to produce slightly better - performing code under SPEC2006 - total score improvement by 1.75%. As we only wanted to verify there is no significant regression in SPEC, and given the milestone goals, we haven’t dug any deeper into the speed results.<br>

> <br>

<br>

How generally applicable are the results? I find it easy to believe that if, for example, you train a model on SPEC2006, you will get fantastic results when compiling SPEC2006, but does that translate well to other inputs? </blockquote><div><br></div><div>The current model was trained on an internal corpus of ~25K modules, obtained from an internal "search" binary. Only ~20 or so are shared with llvm, for example. In particular, no SPEC modules were used for training. So the results appear to generalize reasonably well. </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Or is the idea to train the compiler on the exact input program to be compiled, (perhaps in a CI of sorts), so you end up with a specialized compiler for each input program?<br></blockquote><div><br></div><div>That's the anti-goal :) The hope is to have reference policies, trained from a representative corpus, which are 'good enough' for general use (just like manual heuristics today), while having a systematic methodology for interested parties to go the extra step and fine-tune (retrain) to their specifics. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

-- adrian</blockquote></div></div>