[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization

Mon Mar 15 21:15:55 PDT 2021

On Fri, Mar 12, 2021 at 11:57 AM Rafael Auler <rafaelauler at fb.com> wrote:

> Chris, the approach of living under /bolt sounds reasonable to me.
>
>
>
> Mehdi and David, the difference of doing things in-tree vs out-of-tree is
> that, currently, BOLT out-of-tree has
>
>   (1) different legal requirements for accepting contributions (external
> contributions require devs to sign a CLA). So I agree with Mehdi that the
> same forks will get broken as we refactor code, but once BOLT is in the
> llvm monorepo, at least they will have the chance to upstream it with
> different legal requirements. If they don’t want to upstream it, that’s
> fine too, but I would like to give them a chance.
>   (2) a different development workflow that is less open than LLVM’s.
> Because we want the input of the community on a refactoring that reflects
> how they want to use the libraries too, it would be more natural for this
> to happen in-tree in LLVM.
>
>
>
> David, if we try to coordinate this refactoring happening in both repos
> (library part in LLVM while the client part in our separate repo), that
> will be challenging because we wouldn’t be able to easily test the
> LLVM diffs – a problem we are already facing with upstreaming our changes
> to LLVM without BOLT being there to easily show devs how our changes are
> actually used and tested. Moreover, other contributors who don’t have easy
> access to our github repo will have a hard time working with us in the
> refactor as they wouldn’t be able to do work on the tool (just the open
> library).
>

Hi Rafael, I am not actually proposing an intermediate state where part of
BOLT lives in LLVM while the client lives in a separate repo. What I meant
is a restructuring step within BOLT before dropping it into LLVM. For instance,
BOLT's top-level directory holds lots of different things:
different driver programs, profile readers/writers, debug info handling,
exception handling code, BOLT IR/core data structures (BB, Loop, Function),
pass managers, etc. The Pass directory is also pretty flat. Some
preliminary reorganization, with more tests added, can reduce a lot of churn
in the future. WDYT?

thanks,

David

>
>
> Mehdi, your suggestion looks good, I intend to show everyone the monorepo
> snapshot. We are making sure it is ready to be published and that’s why
> I’ve been referring to our snapshot as “imagine our github repo contents
> are under /bolt” because that is pretty much it, but I will present it soon.
>
>
>
>
>
>
>
> *From: *Xinliang David Li <xinliangli at gmail.com>
> *Date: *Thursday, March 11, 2021 at 11:33 PM
> *To: *Chris Lattner <clattner at nondot.org>
> *Cc: *Rafael Auler <rafaelauler at fb.com>, llvm-dev <llvm-dev at lists.llvm.org>,
> Andrey Bokhanko <andreybokhanko at gmail.com>
> *Subject: *Re: [llvm-dev] [RFC] BOLT: A Framework for Binary Analysis,
> Transformation, and Optimization
>
> Dropping Bolt to the top level directory sounds reasonable, but perhaps a
> hybrid approach similar to what is mentioned by Mehdi can be applied.
> Basically, Bolt first goes through a round of refactoring in the github
> upstream, with a design that is close to the future structure in LLVM, and
> then drops in as a monolithic piece initially. This will make future
> restructuring much easier. There are other benefits: 1) it is a good
> opportunity to clean up Bolt's internal APIs; 2) it is time to beef up
> unit tests; 3) it makes code review easier.
>
>
>
> David
>
>
>
> On Thu, Mar 11, 2021 at 10:34 PM Chris Lattner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Mar 11, 2021, at 9:40 PM, Rafael Auler via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> Hi Mehdi and David,
>
>
>
> Indeed, we share similar concerns. We do intend to move functionality of
> BOLT to live as a library, but the timeline is unclear. In fact, most of
> BOLT could live in a library already, it’s just a matter of moving some
> files into separate components. Instead of the files living in
> tools/llvm-bolt, most could just be moved under lib/something, and we
> already have a llvm-bolt.cpp file that instantiates the driver that
> coordinates the binary rewriting process, which is the entry point of BOLT
> as a library. People could already leverage this to use BOLT in different
> ways (for example, I wrote some time ago a different utility that runs the
> driver for two different binaries and compares the two – this was named
> boltdiff later).
>
>
>
> My main reason for committing the project as a whole first, in the same
> way as flang did (as a project merged into the monorepo), is
> that BOLT has already been open source for a while, and it is a 6-year-old
> project with about 800 commits and 50K lines of code, and we know we have
> people who forked the project and would like to contribute to it. If I
> commit into LLVM a different BOLT (not just rebased), then I (a) break or
> make it hard for any work on top of it from other contributors, (b) lose
> the original history or make it harder to preserve it.  That’s why I was
> going for a smoother transition. I, as a developer, put value in the
> ability to blame and to understand why things were built a certain way, and
> not bringing BOLT’s history (in the same way as flang did) would mean we
> and the community lose a lot of context on the decisions of the project.
> And I guess that’s also the rationale for a monorepo, to have multiple
> projects merged together.
>
>
>
> Because of that, I initially put bolt under /bolt, following flang’s model
> of merging the history so every developer has the right context. But the
> original location was under llvm/tools.
>
>
>
> As with others, I’m not very aware of the internal architecture of bolt,
> so take this with a grain of salt:
>
>
>
> From what I understand, I have a slight preference for starting this out
> as a /bolt top level “subproject”, because the code currently sounds
> monolithic.  As the implementation logic is refactored into more reusable
> units, those libraries can be cleanly moved within the monorepo, e.g. under
> the llvm-project/llvm directory if appropriate.
>
>
>
> The advantage of doing this is that nothing in the llvm-project/llvm repo
> can come to depend on the bolt code unless and until it gets refactored.  This
> is also how things like LLDB started out (and it would be great for more of
> the reusable libraries in LLDB to be merged into LLVM over time).
>
>
>
> Does anyone have any concerns about this approach?
>
>
>
>
>
>
>
> Unrelatedly, I’d also love to see the llvm repository exploded a bit into
> more top level repos, e.g. splitting support/adt out to their own thing.
> It is also worth considering splitting the MC layer out to its own thing as
> well, LLVM IR and the mid-level optimizer into their own thing, and CodeGen
> and the targets into their own thing.
>
>
>
> The major constraint we need is that we want the dependences between
> top-level subprojects to be a strong DAG between the subprojects now and
> defensible into the future, and we don’t want minor evolution of the
> codebase to cause libraries to have to be moved around.  The benefits of
> splitting it up are easier enforcement of layering, encouraging LLVM
> developers to work across subprojects a bit more, and making it easier for
> subprojects to depend on slices of “the big llvm directory”.
>
>
>
> -Chris
>
>
>
>
>