[LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM
Radovan Obradovic
Radovan.Obradovic at imgtec.com
Thu Dec 19 04:21:16 PST 2013
Hal,
Here is a slightly modified iterative compilation patch that reduces amount
of boilerplate code at the points where the decisions are made.
Radovan
> ________________________________________
> From: Radovan Obradovic
> Sent: Tuesday, December 17, 2013 6:33 PM
> To: Hal Finkel
> Cc: llvmdev at cs.uiuc.edu; chandlerc; Andrew Trick; Zoran Jovanovic; Petar Jovanovic
> Subject: RE: [LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM
>
> Hal,
>
> Thank you for finding interest in our work!
> Please find my answers inlined below.
>
> >
> > ________________________________________
> > From: Hal Finkel [hfinkel at anl.gov]
> > Sent: Monday, December 16, 2013 4:26 PM
> > To: Radovan Obradovic
> > Cc: llvmdev at cs.uiuc.edu; chandlerc; Andrew Trick
> > Subject: Re: [LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM
> >
> > Radovan,
> >
> > Thanks for posting this! I would really like to have this kind of
> > functionality available. There are a lot of things to think through
> > here; comments inline...
> >
> > ----- Original Message -----
> > > From: "Radovan Obradovic" <Radovan.Obradovic at imgtec.com>
> > > To: llvmdev at cs.uiuc.edu
> > > Sent: Monday, December 16, 2013 11:31:21 AM
> > > Subject: [LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM
> > >
> > >
> > >
> > >
> > > This is a first step towards to full iterative compilation framework
> > > for
> > > Clang/LLVM. The attached patch is work in progress and the idea of
> > > sending
> > > it to the list earlier rather than later is to open up a discussion
> > > on
> > > iterative compilation in Clang/LLVM. All comments are welcome,
> > > especially
> > > those related to integration of the framework into Clang/LLVM.
> >
> > I'm glad you're doing this, and I hope that you'll be willing to
> > stick though what might be a long design discussion :)
>
> Our belief is that this approach can be beneficial, especially
> for non-orthogonal architectures (e.g. microMIPS), thus we plan
> to continue our work on this.
>
> >
> > >
> > > Current compilers have pipeline structure in which data from one
> > > phase is
> > > passed to the following one. But these compilation phases can
> > > interfere in
> > > unpredictable way so compilers use conservative heuristics on which
> > > they
> > > make decisions during the compilation process. Because of
> > > unpredictability
> > > of these interferences, compilers sometimes make decisions which can
> > > degrade
> > > code quality. Iterative compilation could partially resolve these
> > > problems
> > > by allowing compiler to explore some alternative decisions, generate
> > > code
> > > for these new decision paths and take the generated code that is the
> > > best
> > > for a particular criteria eventually.
> > >
> > > This approach is not new in the literature, we can mention
> > > Milepost/CTuning
> > > [http://ctuning.org] among other projects. In this approach, in
> > > comparison
> > > to Milepost/CTuning, the emphasis is on much finer granularity by
> > > changing
> > > arbitrary compiler decisions, and not only those based on values of
> > > command
> > > line options, at any point in translation process.
> > >
> > > This framework implements iterative approach in which in every
> > > iteration
> > > slightly different code is generated. At the end of compilation
> > > process the
> > > best emitted code is chosen.
> > >
> > > All points at which relevant decisions are made should be identified.
> > > Access to framework data structure (DecisionTree) should be provided
> > > at
> > > these points. Based on the data from DecisionTree, obtained by
> > > calling
> > > getDecision member function, at these points decisions are made
> > > whether to
> > > execute default actions (as compiler would do without this framework)
> > > or to
> > > follow alternative paths.
> > >
> > > Let us consider Inliner example:
> > >
> > > Original code:
> > >
> > > if (!IC) {
> > > DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()
> > > << ", thres=" << (IC.getCostDelta() + IC.getCost())
> > > << ", Call: " << *CS.getInstruction() << "\n");
> > > return false;
> > > }
> > >
> > > Code after instrumentalization:
> > >
> > > bool doInlinging = (bool)IC;
> > > DecisionTreeProxy *decisionTreeProxy;
> > > decisionTreeProxy =
> > > moduleDecisionTreeProxies->getCurrFnDecisionTreeProxy();
> > > int t = MaxDtPriority - abs(IC.getCostDelta());
> > > bool decision = false;
> > > if (t >= 0)
> > > decision = decisionTreeProxy->getDecision(call);
> > > if (decision)
> > > doInlinging = !doInlinging;
> > >
> > > if (!doInlinging) {
> > > DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()
> > > << ", thres=" << (IC.getCostDelta() + IC.getCost())
> > > << ", Call: " << *CS.getInstruction() << "\n");
> > > return false;
> > > }
> >
> > Talking a quick look over the patch, this seems to introduce a lot
> > of boilerplate code. Can you think of a design that simplifies this?
>
> It is necessary to access decision tree at every place where you
> want to call getDecision function and priority parameter must be
> calculated which is specific to every such place. It is only a
> few lines of code but we will consider how to improve this.
> All suggestions are welcomed.
>
> >
> > >
> > > Function getDecision returns true or false. If false is returned than
> > > this
> > > means to take default path and true means to take alternative one.
> > >
> > > Compilation process can be represented by sequence of binary values
> > > where
> > > every value determines decision made at some point. Value FALSE means
> > > to
> > > follow normal compilation process as original compiler would do
> > > without
> > > presence of iterative compilation framework. TRUE value means to take
> > > alternative path or to make opposite decision at that point. This
> > > naturally
> > > forms tree like structure which we named Decision Tree.
> > > Traversing the tree from the root to the leaf of the tree provides
> > > all the
> > > decisions made during one iteration in compilation process.
> > >
> > > At the end of each iteration, quality of generated code is estimated
> > > by
> > > executing newly introduced target dependent pass. Based on results
> > > path for
> > > the following iteration is calculated. At the moment, this has been
> > > proved
> > > for MIPS only and it is based on code size.
> >
> > I think that, in general, we ought to be able to get an overall
> > latency estimate from the instruction scheduler.
>
> Criteria to optimize for speed should be added as well. Do you
> think that latency info form scheduler can be used to estimate
> execution speed of generated code? Can we guess somehow the execution
> frequency of basic blocks?
>
> >
> > >
> > > Initially iterative compilation framework was implemented in CodeGen
> > > phase
> > > but large number of code transformations was already performed in
> > > earlier
> > > phases. This reduces possibilities to improve quality of generated
> > > code.
> > > Thus iterative compilation framework is applied to Clang driver where
> > > it can
> > > influence every single compiler phase. This relocation resulted in
> > > some
> > > implementation issues mostly related to the fact that driver and the
> > > compiler itself (-cc1) are located in different processes. This is
> > > resolved
> > > by introducing DecisionTreeProxy class that implements communication
> > > between
> > > these two processes through temporary files.
> >
> > I think that this can be solved by introducing some API for the
> > necessary data flow. There is obviously already a binding here.
>
> This probably needs to be fixed. All suggestions are welcomed.
>
> >
> > >
> > > At this point mechanism is applied to two optimizations: Inliner and
> > > Loop
> > > Invariant Code Motion. Supported platform is MIPS32r2.
> > >
> > > Despite small number of instrumented points we got improvements on
> > > some test
> > > files from CSiBE benchmark. Here are the promising results:
> > >
> > > jpeg-6b/jidctflt.c -10.2%
> > > OpenTCP-1.0.4/system.c -6.7%
> > > libmspack/test/chmd_md5.c -5.8%
> > > flex-2.5.31/regex.c -4.8%
> >
> > Very interesting. Out of curiosity, any slowdowns?
>
> jidctflt.c is optimized due to the fact that LICM moves some
> constant definitions out of the loop and increases the register
> pressure so some spills occur. So optimized version, in this
> case, should be even faster. But slowdowns are possible and we
> use option -Oz so we are telling the compiler that code size
> is the priority.
>
> >
> > >
> > > We plan to apply this framework to more compiler phases and to some
> > > other
> > > target architectures. It is expected to give better results on some
> > > non
> > > orthogonal architectures like microMIPS. Another direction is to
> > > apply
> > > machine learning based on some input program features (number of
> > > basic
> > > blocks, register pressure at some point etc.) and it can be used for
> > > selecting compilation path for the next iteration or estimating the
> > > best
> > > path for compiling program without using iterative approach (with
> > > only one
> > > iteration).
> > >
> > >
> > > o
> > > / \
> > > / \
> > > o \
> > > / \
> > > / \
> > > o o
> > > / \ /
> > > / \ /
> > > L1 o o
> > > / /
> > > / /
> > > L2 L3
> > >
> > > Figure 1: Decision Tree Example - Symbol 'o' represent decision
> > > points. If
> > > getDecision function returns FALSE the left branch is taken (default
> > > path),
> > > otherwise the right branch is taken.
> > > 'L1' compilation iteration can be represented as a string
> > > (FALSE,FALSE,FALSE), 'L2' as (FALSE,FALSE,TRUE,FALSE) and 'L3'
> > > as (TRUE, FALSE, FALSE).
> > >
> > >
> > > Clearly, we see huge potential in this direction. At the same time,
> > > we are
> > > aware this would cause compiler to be slower which everyone is
> > > against, but
> > > this penalty can be taken when we are in need for most optimized
> > > code.
> >
> > What is the overhead of the current implementation? I suspect that,
> > in order to control the large number of decision points from large
> > regions, we will want some way to limit fine-grained decisions in
> > the tree in large regions (and only including coarse ones, like the
> > inlining of relatively large functions). In small blocks, on the
> > other hand, we might even include instruction scheduling decisions
> > (for in-order cores).
>
> In the current implementation one full compilation per
> decision is needed. Plan is that, for CodeGen pass, do once all the
> passes of CodeGen per decision (to reuse bitcode). Machine learning should
> considerably reduce the need for the large number of iterations.
>
> >
> > I'd been thinking of experimenting with some iterative scheduling
> > for my in-order PPC cores (for small blocks in loops), and
> > integrating it into something like this might be even better.
> >
> > >
> > > What does everyone think?
> > > Could this approach be considered as a valid mode for Clang/LLVM?
> > >
> > > Please use this post and the attached patch (it can be applied to 3.3
> > > release - clang should be inside llvm/tools folder) as a reference
> > > and
> > > not the patch for submission.
> >
> > I see in the patch some (seemingly small) changes to the pass manager
> > setup. Can you describe exactly what you've done. Chandler is
> > rewriting the pass infrastructure, and I'd like to understand how
> > what you've done will or won't fit with the new setup.
>
> Pointer to decision tree proxy has to be passed to every place
> where getDecision is called and to the pass that calculates function
> fitness. So pointer to it is added in Pass and PassManagerBase classes.
> We will provide more detail explanation later.
>
> >
> > Thanks again,
> > Hal
> >
> > >
> > > Example command line:
> > > > clang -target mipsel-unknown-linux -Oz -S -fiterative-comp=3
> > > > example.c
> > >
> > > Thank you. Looking forward to hearing your comments on this.
> > >
> > >
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ic.patch
Type: text/x-patch
Size: 58658 bytes
Desc: ic.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131219/0a1278aa/attachment.bin>
More information about the llvm-dev
mailing list