[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization

Tue Mar 16 10:46:51 PDT 2021

I think one thing we can all agree upon is that the community wants a good
balance between velocity and quality (ensured by proper reviews). I believe
doing some preliminary restructuring and cleanup can improve not only
quality but velocity as well. A good structure serves the purpose of
'self-documentation' and will greatly help code reviewers be more
effective.

thanks,

David

On Tue, Mar 16, 2021 at 4:24 AM Andrey Bokhanko <andreybokhanko at gmail.com>
wrote:

> Let me add my modest +1 vote to committing BOLT as it is, and *then*
> restructuring it as a part of LLVM development process -- with proper
> reviews, etc.
>
> This is how flang and the OpenMP runtime were added to the LLVM project.
> This is a sure way to start things going; otherwise we may end up with
> a project preparing for inclusion into LLVM ad infinitum.
>
> Yours,
> Andrey
>
>
>
>
> On Tue, Mar 16, 2021 at 7:16 AM Xinliang David Li <xinliangli at gmail.com>
> wrote:
> >
> >
> >
> > On Fri, Mar 12, 2021 at 11:57 AM Rafael Auler <rafaelauler at fb.com>
> wrote:
> >>
> >> Chris, the approach of living under /bolt sounds reasonable to me.
> >>
> >>
> >>
> >> Mehdi and David, the difference of doing things in-tree vs out-of-tree
> is that, currently, BOLT out-of-tree has
> >>
> >>   (1) different legal requirements for accepting contributions
> (external contributions require devs to sign a CLA). So I agree with Mehdi
> that the same forks will get broken as we refactor code, but once BOLT is
> in the llvm monorepo, at least they will have the chance to upstream it
> with different legal requirements. If they don’t want to upstream it,
> that’s fine too, but I would like to give them a chance.
> >>   (2) a different development workflow that is less open than LLVM’s.
> Because we want the input of the community on a refactoring that reflects
> how they want to use the libraries too, it would be more natural for this
> to happen inside in-tree LLVM.
> >>
> >>
> >>
> >> David, if we try to coordinate this refactoring happening in both repos
> (library part in LLVM while the client part in our separate repo), that
> will be challenging to do because we wouldn’t be able to easily test the
> LLVM’s diffs – a problem we are already facing with upstreaming our changes
> to LLVM without BOLT being there to easily show devs how our changes are
> actually used and tested. Moreover, other contributors who don’t have easy
> access to our github repo will have a hard time working with us in the
> refactor as they wouldn’t be able to do work on the tool (just the open
> library).
> >
> >
> > Hi Rafael, I am not actually proposing an intermediate state where parts
> > of BOLT live in LLVM while the client lives in a separate repo. What I
> > meant is a restructuring step within BOLT before dropping it into LLVM. For
> > instance, BOLT's top directory holds lots of different things: different
> > driver programs, profile readers/writers, debug info handling, exception
> > handling code, BOLT IR/core data structures (BB, Loop, Function), pass
> > managers, etc. The Pass directory is also pretty flat. Some preliminary
> > reorganization with more tests added can reduce a lot of churn in the
> > future. WDYT?
> >
> > thanks,
> >
> > David
> >
> >
> >
> >>
> >>
> >>
> >> Mehdi, your suggestion looks good, I intend to show everyone the
> monorepo snapshot. We are making sure it is ready to be published and
> that’s why I’ve been referring to our snapshot as “imagine our github repo
> contents are under /bolt” because that is pretty much it, but I will
> present it soon.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> From: Xinliang David Li <xinliangli at gmail.com>
> >> Date: Thursday, March 11, 2021 at 11:33 PM
> >> To: Chris Lattner <clattner at nondot.org>
> >> Cc: Rafael Auler <rafaelauler at fb.com>, llvm-dev <
> llvm-dev at lists.llvm.org>, Andrey Bokhanko <andreybokhanko at gmail.com>
> >> Subject: Re: [llvm-dev] [RFC] BOLT: A Framework for Binary Analysis,
> Transformation, and Optimization
> >>
> >> Dropping Bolt to the top level directory sounds reasonable, but perhaps
> >> a hybrid approach similar to what is mentioned by Mehdi can be applied.
> >> Basically, Bolt first goes through a round of refactoring in the github
> >> upstream, with a design that is close to the future structure in LLVM, and
> >> then drops in as a monolithic piece initially. This will make future
> >> restructuring much easier. There are other benefits: 1) it is a good
> >> opportunity to clean up Bolt's internal APIs; 2) it is time to beef up
> >> unittests; 3) it makes code review easier.
> >>
> >>
> >>
> >> David
> >>
> >>
> >>
> >> On Thu, Mar 11, 2021 at 10:34 PM Chris Lattner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>
> >> On Mar 11, 2021, at 9:40 PM, Rafael Auler via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>
> >>
> >>
> >> Hi Mehdi and David,
> >>
> >>
> >>
> >> Indeed, we share similar concerns. We do intend to move functionality
> of BOLT to live as a library, but the timeline is unclear. In fact, most of
> BOLT could live in a library already, it’s just a matter of moving some
> files into separate components. Instead of the files living in
> tools/llvm-bolt, most could just be moved under lib/something, and we
> already have a llvm-bolt.cpp file that instantiates the driver that
> coordinates the binary rewriting process, which is the entry point of BOLT
> as a library. People could already leverage this to use BOLT in different
> ways (for example, I wrote some time ago a different utility that runs the
> driver for two different binaries and compares the two – this was named
> boltdiff later).
> >>
> >>
> >>
> >> My main reason for committing the project as a whole first (as a project
> >> merged into the monorepo, in the same way as flang did), though, is that
> >> BOLT has been open source for a while: it is a 6-year-old project with
> >> about 800 commits and 50K lines of code, and we know we have people who
> >> forked the project and would like to contribute to it. If I commit into
> >> LLVM a different BOLT (not just rebased), then I (a) break or make it hard
> >> for any work on top of it from other contributors, (b) lose the original
> >> history or make it harder to preserve it. That’s why I was going for a
> >> smoother transition. I, as a developer, put value in the ability to blame
> >> and to understand why things were built a certain way, and not bringing
> >> BOLT’s history (in the same way as flang did) would mean we and the
> >> community lose a lot of context on the decisions of the project. And I
> >> guess that’s also the rationale for a monorepo, to have multiple projects
> >> merged together.
> >>
> >>
> >>
> >> Because of that, I initially put bolt under /bolt, following flang’s
> model of merging the history so every developer has the right context. But
> the original location was under llvm/tools.
> >>
> >>
> >>
> >> As with others, I’m not very aware of the internal architecture of
> bolt, so take this with a grain of salt:
> >>
> >>
> >>
> >> From what I understand, I have a slight preference for starting this
> >> out as a /bolt top level “subproject”, because the code currently sounds
> >> monolithic. As the implementation logic is refactored into more reusable
> >> units, those libraries can be cleanly moved within the monorepo, e.g.
> >> under the llvm-project/llvm directory if appropriate.
> >>
> >>
> >>
> >> The advantage of doing this is that nothing in the llvm-project/llvm
> >> repo can come to depend on the bolt code unless and until it gets refactored.
> This is also how things like LLDB started out (and it would be great for
> more of the reusable libraries in LLDB to be merged into LLVM over time).
> >>
> >>
> >>
> >> Does anyone have any concerns about this approach?
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Unrelatedly, I’d also love to see the llvm repository exploded a bit
> into more top level repos, e.g. splitting support/adt out to their own
> thing.  It is also worth considering splitting the MC layer out to its own
> thing as well, LLVM IR and the mid-level optimizer into its own thing, and
> CodeGen and the targets into its own thing.
> >>
> >>
> >>
> >> The major constraint is that we want the dependences between top-level
> >> subprojects to form a strong DAG, both now and defensibly into the
> >> future, and we don’t want minor evolution of the codebase to cause
> >> libraries to have to be moved around. The benefits of splitting it up are
> >> easier enforcement of layering, encouraging LLVM developers to work
> >> across subprojects a bit more, and making it easier for subprojects to
> >> depend on slices of “the big llvm directory”.
> >>
> >>
> >>
> >> -Chris
> >>
> >>
> >>
> >>
> >>
>