<div dir="ltr"><span id="m_3635758059184399817m_-2486097331090736440m_-7921475858397876043m_7265285882339294355gmail-docs-internal-guid-507d949f-7fff-0140-47bc-3a9dae61dc70"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:3pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap">TL;DR;</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap"> We can improve compiler optimizations driven by heuristics by replacing those heuristics with machine-learned policies (ML models). Policies are trained offline and ship as part of the compiler. Determinism is maintained because models are fixed when the compiler is operating in production. Fine-tuning or regressions may be handled by incorporating the interesting cases in the ML training set, retraining the compiler, and redeploying it. </span><br></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">For a first milestone, we chose inlining for size (-Oz) on X86-64. We were able to train an ML model to produce binaries 1.5-6% smaller than -Oz of tip-of-tree. The trained model appears to generalize well over a diverse set of binaries. Compile time is increased marginally (under 5%). The model also happens to produce slightly better - performing code under SPEC2006 - total score improvement by 1.75%. As we only wanted to verify there is no significant regression in SPEC, and given the milestone goals, we haven’t dug any deeper into the speed results.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">We see these results as promising, and as a reasonable point for contributing our current work as a build-time opt-in to LLVM to benefit the community, in the hope of fostering collaboration and learning from the community’s feedback, as we try to better understand the trade-offs such an approach entails, and as we work on expanding the depth and breadth of applying these techniques to compiler optimization problems.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><a href="https://reviews.llvm.org/D77752">https://reviews.llvm.org/D77752</a><br></span></p><h1 dir="ltr" style="line-height:1.38;margin-top:20pt;margin-bottom:6pt"><span style="font-size:14pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Motivation</span></h1><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Optimization problems, such as inlining or register allocation, are hard: we don’t know of algorithms that are guaranteed to produce the optimum solution in all cases in a timely fashion. Instead, we use heuristics: algorithms that we expect to produce some approximation of the optimum, with some expected degree of generality, in a practical amount of time. </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Heuristics have some common characteristics. Taking inlining as a case study, it traverses the problem space in some way (bottom-up traversal of the SCC graph), extracts some properties (let’s call them “features”) of the program being optimized, and combine them with some weights (“tune”), to produce a cost (InlineCost), which allows for trade-off analysis. We validate the effectiveness of a heuristic over some accepted set of benchmarks. Over time, we react to regressions or pathological cases observed in the field, by manually analyzing such cases, figuring out an enhancement to the heuristic, and then re-validating over that set of benchmarks (maybe augmented by adding the newly found cases).</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Because heuristics are code that needs to be maintained, there is pressure to reduce complexity: adding more features means we need to reason about the interactions between the old and new features, which scales combinatorially. Re-tuning because of the new features adds a similar kind of complexity. Potentially, we miss out on optimization improvements as a result.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Because tuning is manual, there is pressure to keep the number of benchmarks that can be studied in depth to a humanly-manageable size, which potentially affects the generality of a heuristic or heuristic tuning.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The main advantage of manual heuristics is arguably their relatively lower overhead: no additional dependencies and more transparent to human analysis and improvement.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Machine learning, in particular reinforcement learning, can address the tensions found in manual heuristics: once features are extracted from the program, the way they are combined and tuned can easily be scaled up through automation, improving effectiveness and generality. A major drawback, at least at this point in time, of machine learning, is that we don’t yet have a fully developed systematic approach for improving policy effectiveness.</span></p><h1 dir="ltr" style="line-height:1.38;margin-top:20pt;margin-bottom:6pt"><span style="font-size:14pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">High level design</span></h1><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">We identify two scenarios for a compiler using ML policies: development and release.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The release scenario is equivalent to the regular compilation we have today - the only difference is that it uses a pre-trained model (trained in the development scenario beforehand) to make decisions instead of the heuristics. Determinism is guaranteed since the model in the release scenario is fixed. We imagine teams wishing to fine tune the effectiveness of the optimization to their scenarios would train a different model.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The decision previously evaluated using a human-crafted heuristic is optionally replaced by:</span></p><ul style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">a compiler-specific component, extracting features from IR (i.e. a vector of values)</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">an evaluation of an ML model using those features, to obtain a result. In ML nomenclature, this is referred to using the model for inference (as opposed to training it)</span></p></li></ul><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">For example, when we replaced the decision of whether to inline a callsite, the ML model produces a boolean (inline/don’t inline) based on a features vector characterizing the call site and some broader module-wide context.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Training/development is more complicated, and happens offline - akin to how, today, attempts to improve an optimizing pass also happen offline. A description of the high level design and the specifics we used for the current scope are given in Appendix.</span></p><h1 dir="ltr" style="line-height:1.38;margin-top:20pt;margin-bottom:6pt"><span style="font-size:14pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Current Scope</span></h1><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The goal of our first milestone was to evaluate end to end an integration of ML with LLVM, and get a first promising result. To that end, we chose inlining for size (-Oz) as a stepping stone, as we perceived it to be more likely to require a simpler evaluation setup than performance-oriented optimizations might. At this point, we only train whether a call site may be inlined or not, leaving the SCC traversal order as-is.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">We are proposing an initial change demonstrating the inference mechanism using a pre-trained model, as a build-time opt-in to llvm. The compiler components needed to perform training are also included in this first change. Subsequent changes would include more training-related components.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">At a high level, the changes we are proposing consist of:</span></p><ol style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">a new module analysis pass, InliningAdvisor. By default, its implementation does nothing.</span></p></li><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">minimal hooks into Inliner.cpp.</span></p></li><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">the implementation of InliningAdvisor, activated when we opt-in ML. This is available in Analysis/ML, together with:</span></p></li><ol style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:lower-alpha;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Rel/Dev specific ML model handing, also under Analysis/ML</span></p></li><li dir="ltr" style="list-style-type:lower-alpha;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">a pre-trained model for inlining for size (Analysis/ML/models/inlining)</span></p></li><li dir="ltr" style="list-style-type:lower-alpha;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">a pre-trained model for predicting native size from IR (Analysis/ML/models/ir_2_native_x86_64), used in Dev mode only.</span></p></li></ol><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Some refactorings in PassBuilder, to allow opting into running mandatory inlining first - some compilation speedup for the ML case, minimal, noise-like size effect. Also simplifies testing (these would be introduced as a preliminary patch).</span></p></li></ol><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">We discuss ‘rel’ mode here, and ‘dev’ mode in the Appendix, as it is more involved.</span></p><h2 dir="ltr" style="line-height:1.38;margin-top:18pt;margin-bottom:6pt"><span style="font-size:12pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Inference Opt-In Mechanism</span></h2><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The feature is primarily controlled by the cmake flag </span><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">LLVM_USE_ML_POLICY={“Rel”|”Dev”}.</span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> Each has different dependencies. The “Rel”ease case requires specifying the location of the pip tensorflow package (currently, that’s </span><span style="font-size:11pt;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><font face="monospace">tf_nightly</font></span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">, and it should soon be available in </span><span style="font-size:11pt;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><font face="monospace">tensorflow</font></span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">)</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">To opt in the ‘Rel’ case:</span></p><ol style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">install tensorflow pip package</span></p></li></ol><p dir="ltr" style="line-height:1.38;margin-left:36pt;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">pip3 install tf_nightly --user</span></p><ol style="margin-top:0px;margin-bottom:0px" start="2"><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">configure llvm build</span></p></li></ol><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">     cmake ../llvm -DLLVM_USE_ML_POLICY=Rel \</span></p><p dir="ltr" style="line-height:1.2;text-indent:36pt;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">  -DLLVM_TF_AOT_RUNTIME=~/.local/lib/python3.7/site-packages/tensorflow \</span></p><p dir="ltr" style="line-height:1.2;text-indent:36pt;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> {-DLLVM_TF_AOT_COMPILER=<</span><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:italic;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">path to saved_model_cli tool, if needed - it’s usually in the path>}</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:3pt"><b style="font-weight:normal"><br></b></p><ol style="margin-top:0px;margin-bottom:0px" start="3"><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">build llvm as usual.</span></p></li><li dir="ltr" style="list-style-type:decimal;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">pass </span><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">-mllvm -enable-ml-inliner -mllvm -mandatory-inlinings-first </span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">to clang.</span></p></li></ol><h3 dir="ltr" style="line-height:1.38;margin-top:16pt;margin-bottom:4pt"><span style="font-size:12pt;font-family:Arial;color:rgb(67,67,67);background-color:transparent;font-weight:400;font-style:italic;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Details</span></h3><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The ML model is captured as a TensorFlow ‘saved model’. When building llvm, we use TensorFlow’s  XLA native compiler (</span><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">saved_model_cli</span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">) to compile the saved model into a native static library and a header file. Insofar as LLVM is concerned, there are minimal additional runtime requirements, packaged as source within the pip package: C++ wrappers around the compiled model. These will also be statically linked in the LLVM target. The compiled code is otherwise just a complex arithmetical computation, with no special requirements - it is single threaded and runs natively on the targeted architecture. Together with the aforementioned runtime dependencies, it adds ~115KB to the clang binary (0.08% increase)</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Runtime-wise, we observed a ~10% increase in the time spent in the inliner, for a large (33MB) binary IR module; inlining typically consumes ~10-15% of total compilation time, so the overall compile time overhead of the approach is arguably negligible. This cost is almost in entirety attributable to feature extraction.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Memory-wise, the precompiled model has a fixed size buffer for its inputs, and performs a fixed amount of computations, so the memory overhead inherent to our approach is independent from the program being optimized. Using a small example to avoid effects such as memory use differences due to different inlinings, we observed an 300KB increase in the maximum resident size.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">A table showing effect on -Oz compiled binaries’ size is given in Appendix.</span></p><h1 dir="ltr" style="line-height:1.38;margin-top:20pt;margin-bottom:6pt"><span style="font-size:14pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Next directions</span></h1><p dir="ltr" style="line-height:1.656;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Our next milestone has two main high level goals: detailing a systematic approach to driving policy effectiveness; and exploring in depth the type of features and training algorithms most appropriate for compiler problems, or at least problems like inlining. For the latter, we expect embedding more of the call graph structure to play an important role, as well as, potentially, delegating the problem space traversal to the ML model.</span></p><p dir="ltr" style="line-height:1.656;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">We plan to include inlining for speed as part of our work on these goals.</span></p><h1 dir="ltr" style="line-height:1.38;margin-top:20pt;margin-bottom:6pt"><span style="font-size:14pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Appendix</span></h1><h2 dir="ltr" style="line-height:1.38;margin-top:18pt;margin-bottom:6pt"><span style="font-size:12pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Training - High Level</span></h2><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">We use Reinforcement Learning (RL) to train the Inline-for-size model. At a high level, it is composed of 3 parts: training data collection, model training, and iterative data collection/model training. We use TensorFlow as our ML framework.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Related, we also needed to learn a separate model to evaluate the native size of a function, given its IR, in order to calculate a more precise reward for the reinforcement learning algorithm (“IR2Native”). We evaluated ‘just counting IR’ and TargetTransformInfo, but they appeared to provide too noisy of a signal for the reward, insofar as the RL training algorithm for the inlining model was concerned. This model is only used during training.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">RL - Training data collection:</span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> the training data we need to feed into a reinforcement learning algorithm are sequences consisting of: state of the problem (i.e. features); action (inline/not inline), and reward (native size shrinkage after inline/not inline, using ir2native). To collect the sequences, we hook the logging infrastructure into LLVM Inliner that is able to produce logs after the inline optimization pass. </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">RL - Model training: </span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">We use DQN (Deep Q-Network) to train our inlining-for-size ML policy. On a high level, the DQN algorithm trains a neural network to predict the value of different actions --- the DQN policy then chooses to take the action with the highest predicted value. In our scenario, we have two actions: 1) inline; 2) not inline, so the neural network predicts the size reduction of these two actions based on features, and then decides to conduct inlining if the neural network believes doing inlining leads to higher size reduction. After the training finishes, it produces a TensorFlow SavedModel that takes features as input and generates inline decisions (whether to inline or not) as output.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The choice of the features and reward are essential in model training. The features are chosen with the consideration of being helpful in making the decision --- the input to the cost model is a good starting point. Ideally, the reward in the inline-for-size problem is the native size shrinkage after inline/not inline. It is difficult to obtain precisely, so we use the estimate as an alternative. This means that, for training, we also need a model (not necessarily machine learned) for estimating rewards. </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">RL - Iterative data collection/model training:</span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> Reinforcement learning is ideally an iterative model/policy improvement process that: 1) the trained model is deployed to the field to collect new data; 2) newly collected data are used to update the model. Thus, we need to do iterative data collection/model training. To facilitate data collection (avoid complex build dependencies and time spent before/after the pass being trained), we isolate out IR modules captured right before the optimization we are interested in, and iterate on them with opt. We bootstrap the training from the current heuristic, and stop the process once we are happy with the outcome.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">IR2Native</span><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">: We collect IR features (different from the ones used for inlining) for each function at the end of inlining, right before we perform function simplification passes, and right after. This means we have two IR ‘shapes’ of the same function, and we know no further inlinings will be performed, so whatever changes happen are based on that IR. We then extract the native size at the end of compilation. Together, this data forms two records per function that can be used in supervised learning - the features are those extracted from IR, and the label is the native size. Training IR2Native happens independently from the training of the inliner model.</span></p><h2 dir="ltr" style="line-height:1.38;margin-top:18pt;margin-bottom:6pt"><span style="font-size:12pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Training support for the current scope</span></h2><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">The initial change includes the logging mechanism, an ir2native model trained for x86-64, and the means to rapidly iterate over development ML models. For the components that will be included in subsequent changes, the rest of this section describes the mechanisms we employed. These components are detailed further below.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">To build LLVM with the ML policy in ‘Dev’ mode, we need the </span><a href="https://www.tensorflow.org/install/lang_c" style="text-decoration:none" target="_blank"><span style="font-size:11pt;font-family:Arial;color:rgb(17,85,204);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap">tensorflow C API library</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">. We then configure the build:</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">cmake .. -DLLVM_USE_ML_POLICY=Dev \</span></p><p dir="ltr" style="line-height:1.2;text-indent:36pt;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">-DLLVM_TF_C_LIB=<</span><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:italic;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">path to unarchived package</span><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">> \</span></p><p dir="ltr" style="line-height:1.2;text-indent:36pt;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">{-DCMAKE_INSTALL_RPATH_USE_LINK_PATH=True, to copy tensorflow shared library over, if it’s not on LD_LIBRARY_PATH}</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:3pt"><b style="font-weight:normal"><br></b></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">To extract IR right before inlining, we hacked on top of the ThinLTO infrastructure, interrupting its pre-link pipeline right before inlining. This means clang produces binary IR files captured at that stage. We then built a large target, obtaining a corpus of ~25K modules. We intend to provide a clean mechanism in a subsequent change.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">To obtain features/labels for training this “IR to Native Size” model, we had to make some changes to the AsmPrinter (to get real native sizes) and llvm-readobj, as well as add an analysis pass for extracting the IR features for this model. We plan on upstreaming these changes subsequently.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Finally, the infrastructure driving the policy training is currently built on proprietary APIs, as it benefits from a distributed computing infrastructure. We are currently evaluating options for open sourcing it. In the meantime, we are presenting the high level implementation details.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">To train a new model, the infrastructure performs 2 steps: extracting the logs, and using them in a training algorithm.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Log extraction is highly parallelizable: for each IR module in the training corpus, we need to run opt once (or a few times, when we explore improvements). Specifically, each run is this invocation:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">opt -passes=scc-oz-module-inliner -ml-inliner-ir2native-model=<path to ir2native> -training-log=<path to training log output> -enable-ml-inliner -mandatory-inlinings-first -o <output> <module.o></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Then collect the logs, and pass them to the next step.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:3pt"><b style="font-weight:normal"><br></b></p><h1 dir="ltr" style="line-height:1.38;margin-top:20pt;margin-bottom:6pt"><span style="font-size:14pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Experimental results</span></h1><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Experimental results are available as follows:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:3pt"></p><ul style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vQv0bAsUlgnG114zMYy_zKR6x-lTjcXVNt7VEeSwq2-pDr5oTxdASCscPRRg6L7iYLu2AVJ44G2xEkp/pubhtml?gid=1870752756&single=true" style="text-decoration:none" target="_blank"><span style="font-size:11pt;font-family:Arial;color:rgb(17,85,204);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap">SPEC2006</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> binary sizes (-Oz) and ‘test run’ scores.</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:10pt"><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vQv0bAsUlgnG114zMYy_zKR6x-lTjcXVNt7VEeSwq2-pDr5oTxdASCscPRRg6L7iYLu2AVJ44G2xEkp/pubhtml?gid=935959003&single=true" style="text-decoration:none" target="_blank"><span style="font-size:11pt;font-family:Arial;color:rgb(17,85,204);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap">Size report</span></a><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> from some internal benchmarks and binaries, including opt and clang</span></p></li></ul></span></div>