[llvm-dev] RFC: a practical mechanism for applying Machine Learning for optimization policies in LLVM

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Thu Apr 9 10:25:19 PDT 2020


On Thu, Apr 9, 2020 at 10:01 AM Mircea Trofin <mtrofin at google.com> wrote:

> Sorry, I wasn't aware of that.
>
> I can make the google doc view-only, keeping the current comments.
>


Copy the discussion in the doc here? :)


> I'll wait a bit (few hrs) to see if there's any pushback to that.
>
> On Thu, Apr 9, 2020 at 9:57 AM Xinliang David Li <xinliangli at gmail.com>
> wrote:
>
>> One suggestion: should we consolidate the discussion into the main
>> thread? I know some folks are not willing to comment in Google docs.
>>
>> David
>>
>> On Wed, Apr 8, 2020 at 7:06 PM Mircea Trofin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> +Yundi Qian <yundi at google.com> +Eugene Brevdo <ebrevdo at google.com> ,
>>> our team members from the ML side.
>>>
>>> To avoid formatting issues, here is a link to the RFC
>>> <https://docs.google.com/document/d/1BoSGQlmgAh-yUZMn4sCDoWuY6KWed2tV58P4_472mDE/edit?usp=sharing>,
>>> open to comments.
>>>
>>> Thanks!
>>>
>>> On Wed, Apr 8, 2020 at 2:34 PM Mircea Trofin <mtrofin at google.com> wrote:
>>>
>>>> Unfortunately I needed to update the links. Here they are, hopefully
>>>> these work correctly.
>>>>
>>>> SPEC2006
>>>> <https://docs.google.com/spreadsheets/d/e/2PACX-1vQNAcTDfyQvh6Jq7IRdCvK_fuluUFrzrsGL_75Ile29hX3caBSfT6_jHulxeCJ5MXIHp5SB--A_goEi/pubhtml?gid=987260531&single=true>
>>>>
>>>> Internal benchmarks, clang, opt
>>>> <https://docs.google.com/spreadsheets/d/e/2PACX-1vQNAcTDfyQvh6Jq7IRdCvK_fuluUFrzrsGL_75Ile29hX3caBSfT6_jHulxeCJ5MXIHp5SB--A_goEi/pubhtml?gid=0&single=true>
>>>>
>>>> Thanks!
>>>>
>>>> On Wed, Apr 8, 2020 at 2:23 PM Mircea Trofin <mtrofin at google.com>
>>>> wrote:
>>>>
>>>>> It turns out it's me, sorry. Let me see how I can sort this out. In
>>>>> the meantime, here is the csv:
>>>>>
>>>>> SPEC2006 data:
>>>>>
>>>>> binary,base -Oz size,ML -Oz size,ML size shrink by,,perf: base -Oz scores,perf: ML -Oz scores,ML improvement by
>>>>> 400.perlbench,2054200,2086776,-1.59%,,2.9,2.9,0.00%
>>>>> 401.bzip2,1129976,1095544,3.05%,,6.4,6.2,-3.13%
>>>>> 403.gcc,4078488,4130840,-1.28%,,11.6,11.7,0.86%
>>>>> 429.mcf,1089368,1054552,3.20%,,2.3,2.4,4.35%
>>>>> 433.milc,1161336,1129592,2.73%,,0.5,0.5,0.00%
>>>>> 444.namd,1324824,1290968,2.56%,,4.4,4.3,-2.27%
>>>>> 445.gobmk,5003096,4992472,0.21%,,3.9,4.1,5.13%
>>>>> 447.dealII,1003376,975024,2.83%,,162.1,197.8,22.02%
>>>>> 450.soplex,1359416,1326008,2.46%,,0.1,0.1,0.00%
>>>>> 453.povray,1921432,1952280,-1.61%,,32.9,32.7,-0.61%
>>>>> 456.hmmer,1210744,1184632,2.16%,,1.5,1.5,0.00%
>>>>> 458.sjeng,1185976,1155320,2.58%,,812.9,840.2,3.36%
>>>>> 462.libquantum,1101144,1066712,3.13%,,12.3,12.2,-0.81%
>>>>> 464.h264ref,1557176,1593272,-2.32%,,0.3,0.3,0.00%
>>>>> 470.lbm,1091352,1056664,3.18%,,2.3,2.5,8.70%
>>>>> 471.omnetpp,1045496,1046664,-0.11%,,27.9,28.4,1.79%
>>>>> 473.astar,1106680,1071992,3.13%,,2.4,2.6,8.33%
>>>>> 482.sphinx3,1216600,1182680,2.79%,,16.5,16.3,-1.21%
>>>>> 483.xalancbmk,4666936,4669112,-0.05%,,9.7,9.6,-1.03%
>>>>> ,,,,,,,
>>>>> SIZE SUM,34307616,34061104,0.72%,TOTAL SCORES,5.7,5.8,1.75%
>>>>>
>>>>>
>>>>> various benchmarks:
>>>>>
>>>>> binary,base -Oz size,ML -Oz size,ML shrink over base
>>>>> llvm:opt,66318624,62124256,6.32%
>>>>> math_1,6181552,5884144,4.81%
>>>>> event_management_1,4998728,4802696,3.92%
>>>>> llvm:lcalsBRaw,1270168,1222168,3.78%
>>>>> llvm:lcalsBLambda,1270232,1222232,3.78%
>>>>> llvm:lcalsARaw,1276248,1228248,3.76%
>>>>> llvm:lcalsALambda,1276440,1228504,3.76%
>>>>> llvm:lcalsCRaw,1278936,1231000,3.75%
>>>>> llvm:lcalsCLambda,1279064,1231256,3.74%
>>>>> llvm:Blur,1236888,1191192,3.69%
>>>>> image_processing_diffusion,1236120,1190488,3.69%
>>>>> llvm:Interpolation,1237208,1191576,3.69%
>>>>> llvm:harris,1237208,1191768,3.67%
>>>>> llvm:Dither,1238232,1192792,3.67%
>>>>> event_management_2,4741096,4568584,3.64%
>>>>> hashing_1,5189360,5004176,3.57%
>>>>> compression_2,5329392,5140496,3.54%
>>>>> math_2,6072336,5858992,3.51%
>>>>> crypto_1,4721576,4556008,3.51%
>>>>> math_4,27720208,26755568,3.48%
>>>>> math_3,27732496,26767920,3.48%
>>>>> infra_1,30673232,29630464,3.40%
>>>>> hashing_2,5834544,5637168,3.38%
>>>>> image_processing_entropy,5854960,5657008,3.38%
>>>>> hashing_3,5831088,5634288,3.38%
>>>>> memory_management_1,4957960,4790632,3.37%
>>>>> hasing_4,5816048,5619888,3.37%
>>>>> data_storage_2,5852976,5655792,3.37%
>>>>> compression_1,5971984,5771120,3.36%
>>>>> arena_unittest,6672944,6453584,3.29%
>>>>> data_storage_3,44113232,42710960,3.18%
>>>>> data_storage_1,22858992,22132960,3.18%
>>>>> datastructures_1,6362032,6161264,3.16%
>>>>> infra_2,56828720,55053496,3.12%
>>>>> datastructures_2,164558704,160790320,2.29%
>>>>> llvm:clang,77027416,75672088,1.76%
>>>>> benchmark_infra,20221424,19944432,1.37%
>>>>> search_1,169475360,167645632,1.08%
>>>>> protobuf_3,24071504,23895440,0.73%
>>>>> protobuf_1,24071504,23895504,0.73%
>>>>> protobuf_2,24071504,23895504,0.73%
>>>>> protobuf_4,24071504,23895504,0.73%
>>>>> protobuf_5,24071504,23895504,0.73%
>>>>> protobuf_6,23982800,23809424,0.72%
>>>>>
>>>>> On Wed, Apr 8, 2020 at 2:11 PM Johannes Doerfert <
>>>>> johannesdoerfert at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Cool!
>>>>>>
>>>>>> I skipped to the end and tried to access the gdoc and the spreadsheet
>>>>>> but it did tell me I need permissions. Can you make them accessible or
>>>>>> am I the problem?
>>>>>>
>>>>>> Thanks,
>>>>>>    Johannes
>>>>>>
>>>>>> On 4/8/20 4:04 PM, Mircea Trofin via llvm-dev wrote:
>>>>>>  > TL;DR; We can improve compiler optimizations driven by heuristics by
>>>>>>  > replacing those heuristics with machine-learned policies (ML models).
>>>>>>  > Policies are trained offline and ship as part of the compiler.
>>>>>>  > Determinism is maintained because models are fixed when the compiler
>>>>>>  > is operating in production. Fine-tuning or regressions may be handled
>>>>>>  > by incorporating the interesting cases in the ML training set,
>>>>>>  > retraining the compiler, and redeploying it.
>>>>>>  >
>>>>>>  > For a first milestone, we chose inlining for size (-Oz) on X86-64. We
>>>>>>  > were able to train an ML model to produce binaries 1.5-6% smaller than
>>>>>>  > -Oz of tip-of-tree. The trained model appears to generalize well over
>>>>>>  > a diverse set of binaries. Compile time is increased marginally (under
>>>>>>  > 5%). The model also happens to produce slightly better-performing code
>>>>>>  > under SPEC2006 - a total score improvement of 1.75%. As we only wanted
>>>>>>  > to verify there is no significant regression in SPEC, and given the
>>>>>>  > milestone goals, we haven’t dug any deeper into the speed results.
>>>>>>  >
>>>>>>  > We see these results as promising, and as a reasonable point for
>>>>>>  > contributing our current work as a build-time opt-in to LLVM to
>>>>>>  > benefit the community, in the hope of fostering collaboration and
>>>>>>  > learning from the community’s feedback, as we try to better understand
>>>>>>  > the trade-offs such an approach entails, and as we work on expanding
>>>>>>  > the depth and breadth of applying these techniques to compiler
>>>>>>  > optimization problems.
>>>>>>  >
>>>>>>  > https://reviews.llvm.org/D77752
>>>>>>  > Motivation
>>>>>>  >
>>>>>>  > Optimization problems, such as inlining or register allocation, are
>>>>>>  > hard: we don’t know of algorithms that are guaranteed to produce the
>>>>>>  > optimum solution in all cases in a timely fashion. Instead, we use
>>>>>>  > heuristics: algorithms that we expect to produce some approximation of
>>>>>>  > the optimum, with some expected degree of generality, in a practical
>>>>>>  > amount of time.
>>>>>>  >
>>>>>>  > Heuristics have some common characteristics. Taking inlining as a case
>>>>>>  > study: the inliner traverses the problem space in some way (bottom-up
>>>>>>  > traversal of the SCC graph), extracts some properties (let’s call them
>>>>>>  > “features”) of the program being optimized, and combines them with
>>>>>>  > some weights (“tune”) to produce a cost (InlineCost), which allows for
>>>>>>  > trade-off analysis. We validate the effectiveness of a heuristic over
>>>>>>  > some accepted set of benchmarks. Over time, we react to regressions or
>>>>>>  > pathological cases observed in the field by manually analyzing such
>>>>>>  > cases, figuring out an enhancement to the heuristic, and then
>>>>>>  > re-validating over that set of benchmarks (maybe augmented by adding
>>>>>>  > the newly found cases).
>>>>>>  >
>>>>>>  > Because heuristics are code that needs to be maintained, there is
>>>>>>  > pressure to reduce complexity: adding more features means we need to
>>>>>>  > reason about the interactions between the old and new features, which
>>>>>>  > scales combinatorially. Re-tuning because of the new features adds a
>>>>>>  > similar kind of complexity. Potentially, we miss out on optimization
>>>>>>  > improvements as a result.
>>>>>>  >
>>>>>>  > Because tuning is manual, there is pressure to keep the number of
>>>>>>  > benchmarks that can be studied in depth to a humanly-manageable size,
>>>>>>  > which potentially affects the generality of a heuristic or heuristic
>>>>>>  > tuning.
>>>>>>  >
>>>>>>  > The main advantage of manual heuristics is arguably their relatively
>>>>>>  > low overhead: they add no additional dependencies and are more
>>>>>>  > transparent to human analysis and improvement.
>>>>>>  >
>>>>>>  > Machine learning, in particular reinforcement learning, can address
>>>>>>  > the tensions found in manual heuristics: once features are extracted
>>>>>>  > from the program, the way they are combined and tuned can easily be
>>>>>>  > scaled up through automation, improving effectiveness and generality.
>>>>>>  > A major drawback of machine learning, at least at this point in time,
>>>>>>  > is that we don’t yet have a fully developed systematic approach for
>>>>>>  > improving policy effectiveness.
>>>>>>  > High level design
>>>>>>  >
>>>>>>  > We identify two scenarios for a compiler using ML policies:
>>>>>>  > development and release.
>>>>>>  >
>>>>>>  > The release scenario is equivalent to the regular compilation we have
>>>>>>  > today - the only difference is that it uses a pre-trained model
>>>>>>  > (trained in the development scenario beforehand) to make decisions
>>>>>>  > instead of the heuristics. Determinism is guaranteed since the model
>>>>>>  > in the release scenario is fixed. We imagine teams wishing to
>>>>>>  > fine-tune the effectiveness of the optimization to their scenarios
>>>>>>  > would train a different model.
>>>>>>  >
>>>>>>  > The decision previously evaluated using a human-crafted heuristic is
>>>>>>  > optionally replaced by:
>>>>>>  >
>>>>>>  >    - a compiler-specific component, extracting features from IR (i.e.
>>>>>>  >      a vector of values)
>>>>>>  >    - an evaluation of an ML model using those features, to obtain a
>>>>>>  >      result. In ML nomenclature, this is referred to as using the
>>>>>>  >      model for inference (as opposed to training it)
>>>>>>  >
>>>>>>  > For example, when we replace the decision of whether to inline a call
>>>>>>  > site, the ML model produces a boolean (inline/don’t inline) based on a
>>>>>>  > feature vector characterizing the call site and some broader
>>>>>>  > module-wide context.
>>>>>>  >
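>>>>>>  > To make that split concrete, here is a minimal sketch of the two
>>>>>>  > pieces (all names below are hypothetical, for illustration only - they
>>>>>>  > are not the identifiers used in the patch):
>>>>>>  >
>>>>>>  > // Hypothetical sketch of the two ML-related pieces that replace the
>>>>>>  > // hand-written cost computation for one decision.
>>>>>>  > #include <cstdint>
>>>>>>  > #include "llvm/IR/InstrTypes.h" // llvm::CallBase
>>>>>>  >
>>>>>>  > struct InlineFeatures {
>>>>>>  >   int64_t CalleeBasicBlockCount = 0;
>>>>>>  >   int64_t CallSiteHeight = 0;
>>>>>>  >   int64_t ConstantParamCount = 0;
>>>>>>  >   // ... further values characterizing the call site and the module.
>>>>>>  > };
>>>>>>  >
>>>>>>  > // (1) compiler-specific feature extraction from IR.
>>>>>>  > InlineFeatures extractFeatures(const llvm::CallBase &CS);
>>>>>>  >
>>>>>>  > // (2) model inference: map the feature vector to a decision.
>>>>>>  > bool evaluateInlineModel(const InlineFeatures &F);
>>>>>>  >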
>>>>>>  > Training/development is more complicated, and happens offline - akin
>>>>>>  > to how, today, attempts to improve an optimizing pass also happen
>>>>>>  > offline. A description of the high level design, and the specifics we
>>>>>>  > used for the current scope, is given in the Appendix.
>>>>>>  > Current Scope
>>>>>>  >
>>>>>>  > The goal of our first milestone was to evaluate, end to end, an
>>>>>>  > integration of ML with LLVM, and get a first promising result. To that
>>>>>>  > end, we chose inlining for size (-Oz) as a stepping stone, as we
>>>>>>  > perceived it to be more likely to require a simpler evaluation setup
>>>>>>  > than performance-oriented optimizations might. At this point, we only
>>>>>>  > train whether a call site may be inlined or not, leaving the SCC
>>>>>>  > traversal order as-is.
>>>>>>  >
>>>>>>  > We are proposing an initial change demonstrating the inference
>>>>>>  > mechanism using a pre-trained model, as a build-time opt-in to LLVM.
>>>>>>  > The compiler components needed to perform training are also included
>>>>>>  > in this first change. Subsequent changes would include more
>>>>>>  > training-related components.
>>>>>>  >
>>>>>>  > At a high level, the changes we are proposing consist of:
>>>>>>  >
>>>>>>  >    1. a new module analysis pass, InliningAdvisor. By default, its
>>>>>>  >       implementation does nothing.
>>>>>>  >    2. minimal hooks into Inliner.cpp (a hypothetical sketch of such a
>>>>>>  >       hook is given below).
>>>>>>  >    3. the implementation of InliningAdvisor, activated when we opt in
>>>>>>  >       to ML. This is available in Analysis/ML, together with:
>>>>>>  >       1. Rel/Dev specific ML model handling, also under Analysis/ML
>>>>>>  >       2. a pre-trained model for inlining for size
>>>>>>  >          (Analysis/ML/models/inlining)
>>>>>>  >       3. a pre-trained model for predicting native size from IR
>>>>>>  >          (Analysis/ML/models/ir_2_native_x86_64), used in Dev mode
>>>>>>  >          only.
>>>>>>  >    4. some refactorings in PassBuilder, to allow opting into running
>>>>>>  >       mandatory inlining first - some compilation speedup for the ML
>>>>>>  >       case, with a minimal, noise-like size effect. This also
>>>>>>  >       simplifies testing (these would be introduced as a preliminary
>>>>>>  >       patch).
>>>>>>  >
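>>>>>>  > For item 2 above, a rough sketch of what such a minimal hook could
>>>>>>  > look like (illustrative only - the advisor type and names below are
>>>>>>  > placeholders, not the actual patch):
>>>>>>  >
>>>>>>  > #include "llvm/ADT/STLExtras.h" // llvm::function_ref
>>>>>>  > #include "llvm/IR/InstrTypes.h" // llvm::CallBase
>>>>>>  >
>>>>>>  > // Stand-in for the advice interface exposed by the InliningAdvisor
>>>>>>  > // analysis result.
>>>>>>  > struct InliningAdviceProvider {
>>>>>>  >   virtual ~InliningAdviceProvider() = default;
>>>>>>  >   virtual bool shouldInline(llvm::CallBase &CS) = 0;
>>>>>>  > };
>>>>>>  >
>>>>>>  > // Inside the inliner's per-call-site loop: if an advisor is available
>>>>>>  > // (i.e. the ML policy is enabled), ask it; otherwise keep using the
>>>>>>  > // existing InlineCost-based check.
>>>>>>  > bool shouldInlineCallSite(
>>>>>>  >     InliningAdviceProvider *Advisor, llvm::CallBase &CS,
>>>>>>  >     llvm::function_ref<bool(llvm::CallBase &)> LegacyCheck) {
>>>>>>  >   return Advisor ? Advisor->shouldInline(CS) : LegacyCheck(CS);
>>>>>>  > }
>>>>>>  >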
>>>>>>  > We discuss ‘rel’ mode here, and ‘dev’ mode in the Appendix, as it is
>>>>>>  > more involved.
>>>>>>  > Inference Opt-In Mechanism
>>>>>>  >
>>>>>>  > The feature is primarily controlled by the cmake flag
>>>>>>  > LLVM_USE_ML_POLICY={“Rel”|”Dev”}. Each has different dependencies. The
>>>>>>  > “Rel”ease case requires specifying the location of the pip tensorflow
>>>>>>  > package (currently, that’s tf_nightly; it should soon be available in
>>>>>>  > tensorflow).
>>>>>>  >
>>>>>>  > To opt in to the ‘Rel’ case:
>>>>>>  >
>>>>>>  >    1. install the tensorflow pip package:
>>>>>>  >
>>>>>>  >       pip3 install tf_nightly --user
>>>>>>  >
>>>>>>  >    2. configure the llvm build:
>>>>>>  >
>>>>>>  >       cmake ../llvm -DLLVM_USE_ML_POLICY=Rel \
>>>>>>  >         -DLLVM_TF_AOT_RUNTIME=~/.local/lib/python3.7/site-packages/tensorflow \
>>>>>>  >         {-DLLVM_TF_AOT_COMPILER=<path to saved_model_cli tool, if
>>>>>>  >         needed - it’s usually in the path>}
>>>>>>  >
>>>>>>  >    3. build llvm as usual.
>>>>>>  >
>>>>>>  >    4. pass -mllvm -enable-ml-inliner -mllvm -mandatory-inlinings-first
>>>>>>  >       to clang.
>>>>>>  >
>>>>>>  > Details
>>>>>>  >
>>>>>>  > The ML model is captured as a TensorFlow ‘saved model’. When building
>>>>>>  > LLVM, we use TensorFlow’s XLA native compiler (saved_model_cli) to
>>>>>>  > compile the saved model into a native static library and a header
>>>>>>  > file. Insofar as LLVM is concerned, there are minimal additional
>>>>>>  > runtime requirements, packaged as source within the pip package: C++
>>>>>>  > wrappers around the compiled model. These will also be statically
>>>>>>  > linked in the LLVM target. The compiled code is otherwise just a
>>>>>>  > complex arithmetical computation, with no special requirements - it is
>>>>>>  > single threaded and runs natively on the targeted architecture.
>>>>>>  > Together with the aforementioned runtime dependencies, it adds ~115KB
>>>>>>  > to the clang binary (a 0.08% increase).
>>>>>>  >
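>>>>>>  > As a rough illustration of how the inliner-side code consumes such an
>>>>>>  > ahead-of-time compiled model (the class and member names below are
>>>>>>  > placeholders, not the actual generated API): the AOT step gives us a
>>>>>>  > class with fixed-size input/output buffers and a run entry point, and
>>>>>>  > the wrapper just copies features in and reads the decision out.
>>>>>>  >
>>>>>>  > #include <algorithm>
>>>>>>  > #include <cstddef>
>>>>>>  > #include <cstdint>
>>>>>>  >
>>>>>>  > // Placeholder for the class emitted by the AOT compilation step.
>>>>>>  > class CompiledInlineModel {
>>>>>>  > public:
>>>>>>  >   int64_t *input_buffer();        // fixed-size feature buffer
>>>>>>  >   const int64_t *output_buffer(); // fixed-size result buffer
>>>>>>  >   void Run();                     // pure arithmetic, single threaded
>>>>>>  > };
>>>>>>  >
>>>>>>  > bool evaluateCompiledModel(const int64_t *Features, size_t Count) {
>>>>>>  >   static CompiledInlineModel Model;
>>>>>>  >   std::copy(Features, Features + Count, Model.input_buffer());
>>>>>>  >   Model.Run();
>>>>>>  >   return Model.output_buffer()[0] != 0; // the inline/don't-inline bit
>>>>>>  > }
>>>>>>  >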
>>>>>>  > Runtime-wise, we observed a ~10% increase in the time spent in the
>>>>>>  > inliner, for a large (33MB) binary IR module; inlining typically
>>>>>>  > consumes ~10-15% of total compilation time, so the overall compile
>>>>>>  > time overhead of the approach is arguably negligible. This cost is
>>>>>>  > almost entirely attributable to feature extraction.
>>>>>>  >
>>>>>>  > Memory-wise, the precompiled model has a fixed-size buffer for its
>>>>>>  > inputs, and performs a fixed amount of computation, so the memory
>>>>>>  > overhead inherent to our approach is independent of the program being
>>>>>>  > optimized. Using a small example to avoid effects such as memory use
>>>>>>  > differences due to different inlinings, we observed a 300KB increase
>>>>>>  > in the maximum resident size.
>>>>>>  >
>>>>>>  > A table showing the effect on the size of -Oz compiled binaries is
>>>>>>  > given in the Appendix.
>>>>>>  > Next directions
>>>>>>  >
>>>>>>  > Our next milestone has two main high level goals: detailing a
>>>>>>  > systematic approach to driving policy effectiveness; and exploring in
>>>>>>  > depth the type of features and training algorithms most appropriate
>>>>>>  > for compiler problems, or at least problems like inlining. For the
>>>>>>  > latter, we expect embedding more of the call graph structure to play
>>>>>>  > an important role, as well as, potentially, delegating the problem
>>>>>>  > space traversal to the ML model.
>>>>>>  >
>>>>>>  > We plan to include inlining for speed as part of our work on these
>>>>>>  > goals.
>>>>>>  > Appendix
>>>>>>  >
>>>>>>  > Training - High Level
>>>>>>  >
>>>>>>  > We use Reinforcement Learning (RL) to train the Inline-for-size model.
>>>>>>  > At a high level, it is composed of 3 parts: training data collection,
>>>>>>  > model training, and iterative data collection/model training. We use
>>>>>>  > TensorFlow as our ML framework.
>>>>>>  >
>>>>>>  > Relatedly, we also needed to learn a separate model to evaluate the
>>>>>>  > native size of a function, given its IR, in order to calculate a more
>>>>>>  > precise reward for the reinforcement learning algorithm (“IR2Native”).
>>>>>>  > We evaluated ‘just counting IR’ and TargetTransformInfo, but they
>>>>>>  > appeared to provide too noisy a signal for the reward, insofar as the
>>>>>>  > RL training algorithm for the inlining model was concerned. This model
>>>>>>  > is only used during training.
>>>>>>  >
>>>>>>  > RL - Training data collection: the training data we need to feed into
>>>>>>  > the reinforcement learning algorithm are sequences consisting of: the
>>>>>>  > state of the problem (i.e. features), the action (inline/not inline),
>>>>>>  > and the reward (native size shrinkage after inline/not inline,
>>>>>>  > estimated using ir2native). To collect these sequences, we hook
>>>>>>  > logging infrastructure into the LLVM inliner, so it can produce logs
>>>>>>  > after the inline optimization pass.
>>>>>>  >
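>>>>>>  > Schematically, each logged step is a (state, action, reward) tuple
>>>>>>  > along the lines of the sketch below (a hypothetical type, shown only
>>>>>>  > to illustrate what the logs need to capture):
>>>>>>  >
>>>>>>  > #include <cstdint>
>>>>>>  > #include <vector>
>>>>>>  >
>>>>>>  > // One step of a training sequence for the RL algorithm.
>>>>>>  > struct InlineTrainingStep {
>>>>>>  >   std::vector<int64_t> Features; // state: features at the call site
>>>>>>  >   bool Inlined;                  // action: inline / don't inline
>>>>>>  >   double Reward;                 // native size shrinkage, via ir2native
>>>>>>  > };
>>>>>>  >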
>>>>>>  > RL - Model training: We use DQN (Deep Q-Network) to train our
>>>>>>  > inlining-for-size ML policy. At a high level, the DQN algorithm trains
>>>>>>  > a neural network to predict the value of different actions --- the DQN
>>>>>>  > policy then takes the action with the highest predicted value. In our
>>>>>>  > scenario, we have two actions: 1) inline; 2) don’t inline. The neural
>>>>>>  > network predicts the size reduction for each action based on the
>>>>>>  > features, and the policy inlines when the predicted size reduction
>>>>>>  > from inlining is the higher of the two. After training finishes, it
>>>>>>  > produces a TensorFlow SavedModel that takes features as input and
>>>>>>  > generates inline decisions (whether to inline or not) as output.
>>>>>>  >
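>>>>>>  > Put differently, at deployment time the policy boils down to an
>>>>>>  > argmax over the network’s per-action value predictions, as in this
>>>>>>  > schematic (an illustration, not the training code):
>>>>>>  >
>>>>>>  > #include <array>
>>>>>>  >
>>>>>>  > enum class InlineAction { DontInline = 0, Inline = 1 };
>>>>>>  >
>>>>>>  > // Q[a] holds the network's predicted size reduction for action a.
>>>>>>  > InlineAction chooseAction(const std::array<double, 2> &Q) {
>>>>>>  >   return Q[1] > Q[0] ? InlineAction::Inline : InlineAction::DontInline;
>>>>>>  > }
>>>>>>  >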
>>>>>>  > The choice of features and reward is essential in model training. The
>>>>>>  > features are chosen for how helpful they are in making the decision
>>>>>>  > --- the input to the cost model is a good starting point. Ideally, the
>>>>>>  > reward in the inline-for-size problem is the native size shrinkage
>>>>>>  > after inlining/not inlining. It is difficult to obtain precisely, so
>>>>>>  > we use an estimate as an alternative. This means that, for training,
>>>>>>  > we also need a model (not necessarily machine learned) for estimating
>>>>>>  > rewards.
>>>>>>  >
>>>>>>  > RL - Iterative data collection/model training: Reinforcement learning
>>>>>>  > is ideally an iterative model/policy improvement process in which: 1)
>>>>>>  > the trained model is deployed to the field to collect new data; and 2)
>>>>>>  > the newly collected data are used to update the model. Thus, we need
>>>>>>  > to do iterative data collection/model training. To facilitate data
>>>>>>  > collection (avoiding complex build dependencies and time spent
>>>>>>  > before/after the pass being trained), we isolate out IR modules
>>>>>>  > captured right before the optimization we are interested in, and
>>>>>>  > iterate on them with opt. We bootstrap the training from the current
>>>>>>  > heuristic, and stop the process once we are happy with the outcome.
>>>>>>  >
>>>>>>  > IR2Native: We collect IR features (different from the ones used for
>>>>>>  > inlining) for each function at the end of inlining, right before we
>>>>>>  > perform function simplification passes, and again right after. This
>>>>>>  > means we have two IR ‘shapes’ of the same function, and we know no
>>>>>>  > further inlinings will be performed, so whatever changes happen are
>>>>>>  > based on that IR. We then extract the native size at the end of
>>>>>>  > compilation. Together, this data forms two records per function that
>>>>>>  > can be used in supervised learning - the features are those extracted
>>>>>>  > from IR, and the label is the native size. Training IR2Native happens
>>>>>>  > independently of the training of the inliner model.
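>>>>>>  > Schematically, each function therefore contributes two
>>>>>>  > supervised-learning records shaped roughly like the sketch below (a
>>>>>>  > hypothetical type, for illustration):
>>>>>>  >
>>>>>>  > #include <cstdint>
>>>>>>  > #include <vector>
>>>>>>  >
>>>>>>  > // Features come from the IR either right before or right after the
>>>>>>  > // function simplification passes; the label is the function's native
>>>>>>  > // size measured at the end of compilation.
>>>>>>  > struct IR2NativeRecord {
>>>>>>  >   std::vector<int64_t> IRFeatures;
>>>>>>  >   uint64_t NativeSizeInBytes; // supervised-learning label
>>>>>>  > };
>>>>>>  >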
>>>>>>  > Training support for the current scope
>>>>>>  >
>>>>>>  > The initial change includes the logging mechanism, an ir2native model
>>>>>>  > trained for x86-64, and the means to rapidly iterate over development
>>>>>>  > ML models. For the components that will be included in subsequent
>>>>>>  > changes, the rest of this section describes the mechanisms we
>>>>>>  > employed. These components are detailed further below.
>>>>>>  >
>>>>>>  > To build LLVM with the ML policy in ‘Dev’ mode, we need the tensorflow
>>>>>>  > C API library <https://www.tensorflow.org/install/lang_c>. We then
>>>>>>  > configure the build:
>>>>>>  >
>>>>>>  > cmake .. -DLLVM_USE_ML_POLICY=Dev \
>>>>>>  >   -DLLVM_TF_C_LIB=<path to unarchived package> \
>>>>>>  >   {-DCMAKE_INSTALL_RPATH_USE_LINK_PATH=True, to copy tensorflow shared
>>>>>>  >   library over, if it’s not on LD_LIBRARY_PATH}
>>>>>>  >
>>>>>>  >
>>>>>>  > To extract IR right before inlining, we hacked on top of the ThinLTO
>>>>>>  > infrastructure, interrupting its pre-link pipeline right before
>>>>>>  > inlining. This means clang produces binary IR files captured at that
>>>>>>  > stage. We then built a large target, obtaining a corpus of ~25K
>>>>>>  > modules. We intend to provide a clean mechanism in a subsequent
>>>>>>  > change.
>>>>>>  >
>>>>>>  > To obtain features/labels for training this “IR to Native Size”
>>>>>>  > model, we had to make some changes to the AsmPrinter (to get real
>>>>>>  > native sizes) and llvm-readobj, as well as add an analysis pass for
>>>>>>  > extracting the IR features for this model. We plan on upstreaming
>>>>>>  > these changes subsequently.
>>>>>>  >
>>>>>>  > Finally, the infrastructure driving the policy training is currently
>>>>>>  > built on proprietary APIs, as it benefits from a distributed computing
>>>>>>  > infrastructure. We are currently evaluating options for open sourcing
>>>>>>  > it. In the meantime, we are presenting the high level implementation
>>>>>>  > details.
>>>>>>  >
>>>>>>  > To train a new model, the infrastructure performs 2 steps: extracting
>>>>>>  > the logs, and using them in a training algorithm.
>>>>>>  >
>>>>>>  > Log extraction is highly parallelizable: for each IR module in the
>>>>>>  > training corpus, we need to run opt once (or a few times, when we
>>>>>>  > explore improvements). Specifically, each run is this invocation:
>>>>>>  >
>>>>>>  > opt -passes=scc-oz-module-inliner \
>>>>>>  >   -ml-inliner-ir2native-model=<path to ir2native> \
>>>>>>  >   -training-log=<path to training log output> \
>>>>>>  >   -enable-ml-inliner -mandatory-inlinings-first \
>>>>>>  >   -o <output> <module.o>
>>>>>>  >
>>>>>>  > Then collect the logs, and pass them to the next step.
>>>>>>  >
>>>>>>  >
>>>>>>  > Experimental results
>>>>>>  >
>>>>>>  > Experimental results are available as follows:
>>>>>>  >
>>>>>>  >    - SPEC2006
>>>>>>  >      <https://docs.google.com/spreadsheets/d/e/2PACX-1vQv0bAsUlgnG114zMYy_zKR6x-lTjcXVNt7VEeSwq2-pDr5oTxdASCscPRRg6L7iYLu2AVJ44G2xEkp/pubhtml?gid=1870752756&single=true>:
>>>>>>  >      binary sizes (-Oz) and ‘test run’ scores.
>>>>>>  >    - Size report
>>>>>>  >      <https://docs.google.com/spreadsheets/d/e/2PACX-1vQv0bAsUlgnG114zMYy_zKR6x-lTjcXVNt7VEeSwq2-pDr5oTxdASCscPRRg6L7iYLu2AVJ44G2xEkp/pubhtml?gid=935959003&single=true>:
>>>>>>  >      from some internal benchmarks and binaries, including opt and
>>>>>>  >      clang.
>>>>>>  >
>>>>>>  >
>>>
>>