[PATCH] D81515: [llvm] Release-mode ML InlineAdvisor
    Mircea Trofin via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Thu Oct 29 08:38:07 PDT 2020
    
    
  
mtrofin added a comment.
In D81515#2362125 <https://reviews.llvm.org/D81515#2362125>, @AmirJamez wrote:
> In D81515#2357592 <https://reviews.llvm.org/D81515#2357592>, @mtrofin wrote:
>
>> In D81515#2349037 <https://reviews.llvm.org/D81515#2349037>, @AmirJamez wrote:
>>
>>> In D81515#2345894 <https://reviews.llvm.org/D81515#2345894>, @gjain wrote:
>>>
>>>> In D81515#2344814 <https://reviews.llvm.org/D81515#2344814>, @mtrofin wrote:
>>>>
>>>>> In D81515#2344805 <https://reviews.llvm.org/D81515#2344805>, @AmirJamez wrote:
>>>>>
>>>>>> Would you provide scripts to load the model and see the layers?
>>>>>
>>>>> Re. second question, visualization - this is a question for Yundi, Gaurav, or Eugene (they are the ML experts). I'll venture "tensorboard" as an answer, but I'll make sure they give the authoritative one in a moment.
>>>>
>>>> You should be able to use tensorboard but you need to first import the model into tensorboard with https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/import_pb_to_tensorboard.py. Something like `python import_pb_to_tensorboard.py --model_dir=llvm/lib/Analysis/models/inliner/ --log_dir=/tmp/inliner` should work. Then you'll be able to run tensorboard on the log_dir.
>>>>
>>>> Here's a hosted visualization from tensorboard for your convenience: https://tensorboard.dev/experiment/C45o0HjZTPGRSqpOrdkbeg/#graphs
>>>
>>> Thanks.
>>>
>>> (1) May I ask what the reason was for using `tf-nightly` rather than a `tensorflow` release?
>>
>> Historic reason - at the time we started upstreaming the work, the necessary changes to the pip package were not in the release package yet.
>>
>>> (2) The `tf-nightly` version mentioned in https://github.com/google/ml-compiler-opt/blob/master/buildbot/buildbot_init.sh#L119 is no longer available in https://pypi.org/project/tf-nightly/#history :)
>>
>> Thanks for pointing it out - updated the script; one of the build bots was also having issues for this reason, so it must have been a recent change (or the bots hadn't been rebooted in a while).
>>
>>> (3) I can confirm that I was able to generate logs and subsequently visualize the model with `tensorboard 2.3.0` and `tensorflow release 2.2.0` instead. Also, in pursuit of installing packages, I ran into:
>>>
>>>   tensorboard duplicate plugins for name projector
>>>
>>> which turned out to be a common issue for tensorboard when multiple packages are installed, in this case as a result of trying tf-nightly alongside the release. Removing the duplicate tensorboard fixed the issue.
>>
>> To confirm, now that we're using the release 2.3.0 tensorflow pip package, this shouldn't be an issue anymore, correct?
>
> Yes. I confirm using `TF 2.3.0` and `Tensorboard 2.3.0`; `pip3 install tensorflow==2.3 --user` did the job.
>
>>> (4) Will you also release the training scripts for brewing the `ir2native` model here: https://github.com/google/ml-compiler-opt ?
>>
>> IR2Native is used for RL training algorithms where we want partial rewards. That's what we initially did, but then we got better characteristics with training algorithms using just the final reward (== the .text size in the native object). We abandoned partial-rewards training for the short term. We suspect it will start making sense again when we incorporate more global context than we currently do (currently, the global context is really thin - node/edge counts, uses, and a measure of the initial DAG position). So this is a long way of saying: we should probably yank out IR2Native right now, for code simplicity, but we didn't get around to doing it.
>
> I see. So there are two questions:
>
> (Q1) Could you provide a definition of IR2Native's final/optimal partial rewards? I'd assume they come from the final iteration of the model weights when training was stopped; however, what was the stop condition here?
IR2Native was trained through supervised learning: we captured features after the last inlining, then also captured the final native size of that function (at asm printing) as the label.
> (Q2) To make sense of it, let's consider:
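As a rough illustration of that kind of supervised setup (not the actual IR2Native training code), here is a minimal Python sketch. The two features (instruction count and basic block count), the sample values, and the linear model are all assumptions made for the example; the real feature set and model architecture are not described in this thread.

```python
from typing import List, Sequence, Tuple

# One training sample: (features captured after the last inlining,
#                       native size observed at asm printing).
# The feature layout [instruction_count, basic_block_count] is hypothetical.
Sample = Tuple[Sequence[float], float]


def fit_linear(samples: List[Sample]) -> List[float]:
    """Least-squares fit of size ~ w0*f0 + w1*f1 via the 2x2 normal equations."""
    a00 = sum(f[0] * f[0] for f, _ in samples)
    a01 = sum(f[0] * f[1] for f, _ in samples)
    a11 = sum(f[1] * f[1] for f, _ in samples)
    b0 = sum(f[0] * y for f, y in samples)
    b1 = sum(f[1] * y for f, y in samples)
    det = a00 * a11 - a01 * a01
    if det == 0:
        raise ValueError("features are collinear; cannot fit")
    # Cramer's rule for the symmetric 2x2 system.
    w0 = (b0 * a11 - b1 * a01) / det
    w1 = (a00 * b1 - a01 * b0) / det
    return [w0, w1]


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Estimated native size for one function's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


# Made-up feature|label tuples standing in for a captured corpus.
corpus: List[Sample] = [
    ([10, 2], 44.0),
    ([20, 3], 86.0),
    ([35, 7], 154.0),
    ([50, 4], 208.0),
]

weights = fit_linear(corpus)
print(weights, predict(weights, [5, 1]))
```

The point of the sketch is only the data flow: features and labels are captured once from a corpus, and the model is fitted offline on that fixed dataset.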
> (2-1) Training Phase:
>
> - **If models are trained together in the same pipeline**: That would mean you trained the two (IR2Native and RL) together in the same pipeline, so that when you feed IR2Native the training data, the partial rewards are fed into the RL model. If that's the case, it would be tricky, as the partial rewards change each iteration depending on the input data and only gradually converge to more accurate values (lower loss), while in the meantime you keep feeding these //inaccurate values// to the RL model being trained. I guess as long as you had a unified strategy to deal with the loss functions, this method should be tricky.
> - **If IR2Native was trained first**: Based on your reply, and since you mentioned you fixed the buckets with their final partial rewards, I assume this was the method you used: you trained IR2Native and stopped the training at a certain iteration, perhaps at a low loss value or by some other criterion. At that point, you used the final buckets of IR2Native to train RL. So, in a way, IR2Native's inference is used to train RL. Is that a correct assumption?
>
> (2-2) Inference Phase: 
> So at deployment, when an LLVM user passes `opt -passes=scc-oz-module-inliner -enable-ml-inliner=release -S`, the caller's IR2Native features are collected and one bucket is chosen as the partial reward, which is then fed into RL to decide whether or not to inline a callee?
IR2Native was trained completely separately: at one point, we captured the feature|label tuples from a corpus. Then we did supervised learning on that dataset and obtained the IR2Native model.
After that, we only used the IR2Native model in inference mode any time we wanted to train the inliner model. The IR used for the training sessions was different (same overall codebase, but unrelated points in time). We didn't retrain IR2Native before training the inliner either.
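To make the "inference mode only" point concrete, here is a small Python sketch of a frozen size estimator supplying partial rewards while some other model is being trained. It is an assumption-laden illustration, not the actual training pipeline: the linear estimator, its weights, the feature layout, and the definition of a partial reward as the estimated size reduction between consecutive inlining decisions are all hypothetical.

```python
from typing import Callable, List, Sequence

SizeEstimator = Callable[[Sequence[float]], float]


def make_frozen_estimator(weights: Sequence[float]) -> SizeEstimator:
    """Wrap already-trained weights; they are never updated afterwards."""
    frozen = tuple(weights)  # immutable copy: inference only

    def estimate(features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(frozen, features))

    return estimate


def partial_rewards(estimate: SizeEstimator,
                    feature_trace: List[Sequence[float]]) -> List[float]:
    """One reward per decision: estimated size before minus estimated size after.

    feature_trace[i] holds the caller's features before decision i, and
    feature_trace[i + 1] holds them after it (hypothetical layout).
    """
    sizes = [estimate(f) for f in feature_trace]
    return [before - after for before, after in zip(sizes, sizes[1:])]


# Hypothetical weights from an earlier, separate supervised training run.
estimate = make_frozen_estimator([4.0, 2.0])

# Caller features [instruction_count, basic_block_count] across two decisions.
trace = [[10, 2], [14, 3], [12, 3]]
rewards = partial_rewards(estimate, trace)
print(rewards)
```

A positive reward here means the estimated size went down after the decision. The estimator is only ever called, never refitted, which mirrors the separation between the two training sessions described above.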
> Thanks,
>
> - Amir
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81515/new/
https://reviews.llvm.org/D81515
    
    