[PATCH] D24167: Moving to GitHub - Unified Proposal

Mehdi Amini via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 12 10:29:08 PDT 2016


> On Oct 12, 2016, at 10:18 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
> 
>> On 2016-Oct-12, at 09:48, Justin Lebar <jlebar at google.com> wrote:
>> 
>> The monorepo proposal will continue to maintain the existing
>> single-project mirrors, and their hashes.
>> 
>> Anyone who needs the hashes to be preserved can get that
>> functionality, by using the single-project mirrors -- just like they
>> could with the multirepo.
> 
> If I need checkouts of the split git mirrors in order to lookup old commits, then I'll have those checked out, and I'll never adopt monorepo.

It depends on:
1) Why you need this.
2) What you mean by “in order to lookup old commits”.

What if we provide `git llvm show <old git hash>` that would work directly with the monorepo?
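
To make the idea concrete, here is a rough Python sketch of the lookup such a wrapper might do. It assumes we record an old-hash to new-hash table somewhere (git notes would be one option); the function name, the table, and the hashes below are all made up for illustration, and nothing like this exists today.

```python
from typing import Dict


def resolve_old_hash(old_to_new: Dict[str, str], query: str) -> str:
    """Return the monorepo hash recorded for an old split-mirror hash.

    `query` may be an abbreviated hash, but it must match exactly one
    entry of the (hypothetical) old-hash -> new-hash table.
    """
    query = query.strip().lower()
    if len(query) < 4:
        raise ValueError("hash prefix too short")
    matches = [old for old in old_to_new if old.startswith(query)]
    if not matches:
        raise KeyError(f"unknown old hash: {query}")
    if len(matches) > 1:
        raise KeyError(f"ambiguous old hash prefix: {query}")
    return old_to_new[matches[0]]


# Made-up hashes, for illustration only.
EXAMPLE_TABLE = {
    "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678": "0f1e2d3c4b5a69788796a5b4c3d2e1f001234567",
    "a1b2ffff00001111222233334444555566667777": "89abcdef0123456789abcdef0123456789abcdef",
}
```

A real wrapper would then hand the resolved hash to plain `git show`; how the table gets built and stored is the open question.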

> Also, I doubt you'll be able to keep those hashes the same.  IIUC, they'll be git-svn mirrors on top of the GitHub SVN bridge... how could they possibly have the same hashes as the git-svn mirrors on top of the original llvm.org repository?
> 
>> What problem are you trying to solve by preserving hashes in the
>> monorepo itself?
> 
> Simple copy/paste from bug tracker to `git log`.  Some bugs have both, and if there's a hash, I always type `git log ha5h`.  It's a weak (not very important) goal, but something I had assumed was going to work.
> 
> Since SVN revisions are canonical, they're always in the bug tracker.  So it's sufficient to keep the git-svn-id around so that `git log --grep '@123456'` still works.

I’d even expect us to keep an integer rev number and have a wrapper script: `git llvm show r12345`.

Whatever way we implement it (empty padding commit, git notes, etc.), I don’t see an obstacle to this.
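
As a rough sketch of what the wrapper would have to do, here are two of the possible lookups in Python. The helper names are hypothetical. The first assumes commit messages keep the `git-svn-id: <url>@<rev> <uuid>` trailer that git-svn writes today; the second assumes a linear history with padding commits, so that the Nth commit is rN.

```python
import re
from typing import List, Optional

_REV_ARG = re.compile(r"^r(\d+)$")
_GIT_SVN_ID = re.compile(r"^git-svn-id: \S+@(\d+) \S+$", re.MULTILINE)


def parse_rev(arg: str) -> int:
    """Turn a user-supplied 'r12345' into the integer 12345."""
    m = _REV_ARG.match(arg.strip())
    if m is None:
        raise ValueError(f"not a revision of the form rNNN: {arg!r}")
    return int(m.group(1))


def svn_rev_from_message(message: str) -> Optional[int]:
    """Extract the SVN revision from a git-svn-id trailer, if present."""
    m = _GIT_SVN_ID.search(message)
    return int(m.group(1)) if m else None


def commit_for_rev(history_oldest_first: List[str], rev: int) -> str:
    """Padding-commit scheme: in a linear history, commit number N is rN.

    This mirrors `git rev-list --count <commit>`, which counts the commit
    itself, so the first commit has count 1.
    """
    if not 1 <= rev <= len(history_oldest_first):
        raise IndexError(f"no commit for r{rev}")
    return history_oldest_first[rev - 1]
```

Either lookup yields a commit that can be passed to `git show`; which one we would actually use depends on the scheme we pick.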

> 
>> For my part, I would much prefer to have good, clean, linear history
>> in the monorepo, and as far as I can figure out, that will necessitate
>> changing the hashes.
>> 
>> Perhaps we could associate the old hashes with the new commits via git
>> notes, just as chapuni's monorepo associates svn revision numbers with
>> commits via git notes.  This would allow you to easily look up a
>> commit in the monorepo based on its old hash (it's just a matter of
>> git grep).
> 
> Knowing nothing about git notes, this seems like a nice idea.
> 
> Are git notes efficiently searchable?  (I.e., can they also replace my `git log --grep '@123456'` workflow?  If so, dropping the git-svn-id lines from the commit messages would also be reasonable.)
> 
> ...
> 
> BTW, I don't think publishing the proposal should be blocked on working this out.  It's already late.  As long as we're adding new content, we'll continue to find new problems and new solutions.
> 
>> On Wed, Oct 12, 2016 at 9:41 AM, Duncan Exon Smith <dexonsmith at apple.com> wrote:
>>> I was thinking we'd merge the repos together (as in the second suggestion in
>>> the document).  However maybe that breaks bisection?  Haven't thought it
>>> through, but I guess I assumed they'd be preserved somehow.
>>> 
>>> One way to preserve hashes would be to merge each hash into the monorepo in
>>> commit order.  The monorepo wouldn't have a linear history, but the hashes
>>> would stay valid.   Seems flawed though...
>>> 
>>> (If preserving hashes isn't practical anyway, then preserving revisions is
>>> obvious goodness.  But "hashes not preserved" should go down as a concern
>>> about monorepo.)
>>> 
>>> -- dpnes
>>> 
>>> On Oct 12, 2016, at 07:32, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>> 
>>> How do you preserve hashes in the monorepo?
>>> 
>>> Sent from my iPhone
>>> 
>>> On Oct 12, 2016, at 6:03 AM, Duncan Exon Smith <dexonsmith at apple.com> wrote:
>>> 
>>> I find the idea of rewriting the history really scary.  I'd rather keep the
>>> same git commit hashes as the current read only mirrors somehow (and have
>>> the revision numbers in the git svn ids).  Probably contentious but worth
>>> mentioning?
>>> 
>>> -- dpnes
>>> 
>>> On Oct 12, 2016, at 00:27, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>> 
>>> Hey,
>>> 
>>> Thanks, I’m still working on incorporating your comments.
>>> 
>>> Just wanted to share something: Chandler pointed out to me a possibility in the
>>> implementation of the monorepo that is worth spelling out in the document.
>>> 
>>> We can add empty “padding commits” to the history and guarantee that the new
>>> revision number (the SVN revision when checking out through the GitHub SVN
>>> bridge, or `git rev-list --count HEAD`) exactly matches the existing SVN revision.
>>> This would:
>>> 
>>> 1) Preserve the references in all the Bugzilla entries (“fixed in r12345”).
>>> 2) Keep downstream integrators that rely on the current SVN revision number
>>> from being disrupted by overlapping revisions (otherwise the new ToT rev #
>>> would be lower than the current one).
>>> 
>>> -- Mehdi
>>> 
>>> 
>>> On Oct 11, 2016, at 11:19 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com>
>>> wrote:
>>> 
>>> 
>>> 
>>> On 2016-Oct-11, at 08:16, Mehdi AMINI <mehdi.amini at apple.com> wrote:
>>> 
>>> 
>>> mehdi_amini updated this revision to Diff 74258.
>>> 
>>> mehdi_amini added a comment.
>>> 
>>> 
>>> Try another layout: add first a description of the multirepo, then one for
>>> the monorepo, then the interleaved comparison.
>>> 
>>> 
>>> 
>>> https://reviews.llvm.org/D24167
>>> 
>>> 
>>> Files:
>>> 
>>> docs/Proposals/GitHubMove.rst
>>> 
>>> docs/index.rst
>>> 
>>> 
>>> <D24167.74258.patch>
>>> 
>>> 
>>> I like this a lot.  I think it's a good compromise to keep the workflows
>>> 
>>> separate while inlining the rest of the sections into the main variant
>>> 
>>> descriptions.
>>> 
>>> 
>>> A quick overview of the suggestions I've made inline:
>>> 
>>> 
>>> - Structural changes to the new Multirepo/Monorepo sections, adding
>>> subtitles
>>> 
>>> and reordering.
>>> 
>>> - Inline "Living Downstream" (I have specific wording suggestions).
>>> 
>>> - Major edits to the "Concerns" bullets.
>>> 
>>> - "Xrepo Proposal" => "Xrepo Variant" or "Xrepo Sub-proposal".
>>> 
>>> - Minor spelling/grammar changes.
>>> 
>>> 
>>> This one isn't inline, but:
>>> 
>>> 
>>> - s/subproject/sub-project/
>>> 
>>> s/sub-project/subproject/
>>> 
>>> (pick one; I saw instances of both, likely used them inconsistently in my
>>> own
>>> 
>>> edits, and haven't gone back to make sure they're consistent)
>>> 
>>> 
>>> Index: docs/index.rst
>>> 
>>> ===================================================================
>>> 
>>> --- docs/index.rst
>>> 
>>> +++ docs/index.rst
>>> 
>>> @@ -517,6 +517,7 @@
>>> 
>>> IRC, etc).
>>> 
>>> 
>>> :doc:`Proposals/GitHubSubMod`
>>> 
>>> +:doc:`Proposals/GitHubMove`
>>> 
>>> Proposal to move from SVN/Git to GitHub.
>>> 
>>> 
>>> 
>>> Index: docs/Proposals/GitHubMove.rst
>>> 
>>> ===================================================================
>>> 
>>> --- /dev/null
>>> 
>>> +++ docs/Proposals/GitHubMove.rst
>>> 
>>> @@ -0,0 +1,751 @@
>>> 
>>> +==============================
>>> 
>>> +Moving LLVM Projects to GitHub
>>> 
>>> +==============================
>>> 
>>> +
>>> 
>>> +Introduction
>>> 
>>> +============
>>> 
>>> +
>>> 
>>> +This is a proposal to move our current revision control system from our own
>>> 
>>> +hosted Subversion to GitHub. Below are the financial and technical
>>> arguments as
>>> 
>>> +to why we are proposing such a move and how people (and validation
>>> 
>>> +infrastructure) will continue to work with a Git-based LLVM.
>>> 
>>> +
>>> 
>>> +There will be a survey pointing at this document which we'll use to gauge
>>> the
>>> 
>>> +community's reaction and, if we collectively decide to move, the
>>> time-frame. Be
>>> 
>>> +sure to make your view count.
>>> 
>>> +
>>> 
>>> +Additionally, we will discuss this during a BoF at the next US LLVM
>>> Developer
>>> 
>>> +meeting (http://llvm.org/devmtg/2016-11/).
>>> 
>>> +
>>> 
>>> +This proposal is divided into the following parts:
>>> 
>>> +
>>> 
>>> +* Outline of the reasons to move to Git and GitHub
>>> 
>>> +* Description off the options
>>> 
>>> +* What examples of some workflows will look like (compared to currently)
>>> 
>>> +* The proposed migration plan
>>> 
>>> +
>>> 
>>> +What This Proposal is *Not* About
>>> 
>>> +=================================
>>> 
>>> +
>>> 
>>> +Changing the development policy.
>>> 
>>> +
>>> 
>>> +This proposal relates only to moving the hosting of our source-code
>>> repository
>>> 
>>> +from SVN hosted on our own servers to Git hosted on GitHub. We are not
>>> proposing
>>> 
>>> +using GitHub's issue tracker, pull-requests, or code-review.
>>> 
>>> +
>>> 
>>> +Contributers will continue to earn commit access on demand under the
>>> Developer
>>> 
>>> +Policy, except that that a GitHub account will be required instead of SVN
>>> 
>>> +username/password-hash.
>>> 
>>> +
>>> 
>>> +Why Git, and Why GitHub?
>>> 
>>> +========================
>>> 
>>> +
>>> 
>>> +Why Move At All?
>>> 
>>> +----------------
>>> 
>>> +
>>> 
>>> +This discussion began because we currently host host our own Subversion
>>> server
>>> 
>>> 
>>> s/host host/host/
>>> 
>>> 
>>> +and Git mirror in a voluntary basis. The LLVM Foundation sponsors the
>>> server and
>>> 
>>> 
>>> s/in a vol/on a vol/
>>> 
>>> 
>>> +provides limited support, but there is only so much it can do.
>>> 
>>> +
>>> 
>>> +Volunteers are not sysadmins themselves, but compiler engineers that happen
>>> 
>>> +to know a thing or two about hosting servers. We also don't have 24/7
>>> support,
>>> 
>>> +and we sometimes wake up to see that continuous integration is broken
>>> because
>>> 
>>> +the SVN server is either down or unresponsive.
>>> 
>>> +
>>> 
>>> +We should take advantage of one of the services out there (GitHub, GitLab,
>>> 
>>> +BitBucket among others) that offer that same service (24/7 stability, disk
>>> 
>>> 
>>> Should "that same service" be "better service"?  24/7 (at least) sounds
>>> better.
>>> 
>>> 
>>> +space, Git server, code browsing, forking facilities, etc) for free.
>>> 
>>> +
>>> 
>>> +Why Git?
>>> 
>>> +--------
>>> 
>>> +
>>> 
>>> +Many new coders nowadays start with Git, and a lot of people have never
>>> used
>>> 
>>> +SVN, CVS, or anything else. Websites like GitHub have changed the landscape
>>> 
>>> +of open source contributions, reducing the cost of first contribution and
>>> 
>>> +fostering collaboration.
>>> 
>>> +
>>> 
>>> +Git is also the version control many LLVM developers use. Despite the
>>> 
>>> +sources being stored in a SVN server, these developers are already using
>>> Git
>>> 
>>> +through the Git-SVN integration.
>>> 
>>> +
>>> 
>>> +Git allows you to:
>>> 
>>> +
>>> 
>>> +* Commit, squash, merge, and fork locally without touching the remote
>>> server.
>>> 
>>> +* Maintain local branches, enabling multiple threads of development.
>>> 
>>> +* Collaborate on these branches (e.g. through your own fork of llvm on
>>> GitHub).
>>> 
>>> +* Inspect the repository history (blame, log, bisect) without Internet
>>> access.
>>> 
>>> +* Maintain remote forks and branches on Git hosting services and
>>> 
>>> +  integrate back to the main repository.
>>> 
>>> +
>>> 
>>> +In addition, because Git seems to be replacing many OSS projects' version
>>> 
>>> +control systems, there are many tools that are built over Git.
>>> 
>>> +Future tooling may support Git first (if not only).
>>> 
>>> +
>>> 
>>> +Why GitHub?
>>> 
>>> +-----------
>>> 
>>> +
>>> 
>>> +GitHub, like GitLab and BitBucket, provides free code hosting for open
>>> source
>>> 
>>> +projects. Any of these could replace the code-hosting infrastructure that
>>> we
>>> 
>>> +have today.
>>> 
>>> +
>>> 
>>> +These services also have a dedicated team to monitor, migrate, improve and
>>> 
>>> +distribute the contents of the repositories depending on region and load.
>>> 
>>> +
>>> 
>>> +GitHub has one important advantage over GitLab and
>>> 
>>> +BitBucket: it offers read-write **SVN** access to the repository
>>> 
>>> +(https://github.com/blog/626-announcing-svn-support).
>>> 
>>> +This would enable people to continue working post-migration as though our
>>> code
>>> 
>>> +were still canonically in an SVN repository.
>>> 
>>> +
>>> 
>>> +In addition, there are already multiple LLVM mirrors on GitHub, indicating
>>> that
>>> 
>>> +part of our community has already settled there.
>>> 
>>> +
>>> 
>>> +On Managing Revision Numbers with Git
>>> 
>>> +-------------------------------------
>>> 
>>> +
>>> 
>>> +The current SVN repository hosts all the LLVM sub-projects alongside each
>>> other.
>>> 
>>> +A single revision number (e.g. r123456) thus identifies a consistent
>>> version of
>>> 
>>> +all LLVM sub-projects.
>>> 
>>> +
>>> 
>>> +Git does not use sequential integer revision number but instead uses a hash
>>> to
>>> 
>>> +identify each commit. (Linus mentioned that the lack of such revision
>>> number
>>> 
>>> +is "the only real design mistake" in Git [TorvaldRevNum]_.)
>>> 
>>> +
>>> 
>>> +The loss of a sequential integer revision number has been a sticking point
>>> in
>>> 
>>> +past discussions about Git:
>>> 
>>> +
>>> 
>>> +- "The 'branch' I most care about is mainline, and losing the ability to
>>> say
>>> 
>>> +  'fixed in r1234' (with some sort of monotonically increasing number)
>>> would
>>> 
>>> +  be a tragic loss." [LattnerRevNum]_
>>> 
>>> +- "I like those results sorted by time and the chronology should be
>>> obvious, but
>>> 
>>> +  timestamps are incredibly cumbersome and make it difficult to verify that
>>> a
>>> 
>>> +  given checkout matches a given set of results." [TrickRevNum]_
>>> 
>>> +- "There is still the major regression with unreadable version numbers.
>>> 
>>> +  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
>>> 
>>> +  non-trivial issue." [JSonnRevNum]_
>>> 
>>> +- "Sequential IDs are important for LNT and llvmlab bisection tool."
>>> [MatthewsRevNum]_.
>>> 
>>> +
>>> 
>>> +However, Git can emulate this increasing revision number:
>>> 
>>> +`git rev-list  --count <commit-hash>`. This identifier is unique only
>>> within a
>>> 
>>> +single branch, but this means the tuple `(num, branch-name)` uniquely
>>> identifies
>>> 
>>> +a commit.
>>> 
>>> +
>>> 
>>> +We can thus use this revision number to ensure that e.g. `clang -v` reports
>>> a
>>> 
>>> +user-friendly revision number (e.g. `master-12345` or `4.0-5321`),
>>> addressing
>>> 
>>> +the objections raised above with respect to this aspect of Git.
>>> 
>>> +
>>> 
>>> +What About Branches and Merges?
>>> 
>>> +-------------------------------
>>> 
>>> +
>>> 
>>> +In contrast to SVN, Git makes branching easy. Git's commit history is
>>> represented
>>> 
>>> +as a DAG, a departure from SVN's linear history.
>>> 
>>> +
>>> 
>>> +However, we propose to mandate making merge commits illegal in our
>>> canonical Git
>>> 
>>> +repository.
>>> 
>>> +
>>> 
>>> +Unfortunately, GitHub does not support server side hooks to enforce such a
>>> 
>>> +policy.  We must rely on the community to avoid pushing merge commits.
>>> 
>>> +
>>> 
>>> +GitHub offers a feature called `Status Checks`: a branch protected by
>>> 
>>> +`status checks` requires commits to be whitelisted before the push can
>>> happen.
>>> 
>>> +We could supply a pre-push hook on the client side that would run and check
>>> the
>>> 
>>> +history, before whitelisting the commit being pushed [statuschecks]_.
>>> 
>>> +However this solution would be somewhat fragile (how do you update a script
>>> 
>>> +installed on every developer machine?) and prevents SVN access to the
>>> 
>>> +repository.
>>> 
>>> +
>>> 
>>> +What About Commit Emails?
>>> 
>>> +-------------------------
>>> 
>>> +
>>> 
>>> +We will need a new bot to send emails for each commit. This proposal leaves
>>> the
>>> 
>>> +email format unchanged besides the commit URL.
>>> 
>>> +
>>> 
>>> +Straw Man Migration Plan
>>> 
>>> +========================
>>> 
>>> +
>>> 
>>> +STEP #1 : Before The Move
>>> 
>>> +
>>> 
>>> +1. Update docs to mention the move, so people are aware of what is going
>>> on.
>>> 
>>> +2. Set up a read-only version of the GitHub project, mirroring our current
>>> SVN
>>> 
>>> +   repository.
>>> 
>>> +3. Add the required bots to implement the commit emails, as well as the
>>> 
>>> +   umbrella repository update (if the multirepo is selected) or the
>>> read-only
>>> 
>>> +   Git views for the sub-projects (if the monorepo is selected).
>>> 
>>> +
>>> 
>>> +STEP #2 : Git Move
>>> 
>>> +
>>> 
>>> +4. Update the buildbots to pick up updates and commits from the GitHub
>>> 
>>> +   repository. Not all bots have to migrate at this point, but it'll help
>>> 
>>> +   provide infrastructure testing.
>>> 
>>> +5. Update Phabricator to pick up commits from the GitHub repository.
>>> 
>>> +6. LNT and llvmlab have to be updated: they rely on unique monotonically
>>> 
>>> +   increasing integer across branch [MatthewsRevNum]_.
>>> 
>>> +7. Instruct downstream integrators to pick up commits from the GitHub
>>> 
>>> +   repository.
>>> 
>>> +8. Review and prepare an update for the LLVM documentation.
>>> 
>>> +
>>> 
>>> +Until this point nothing has changed for developers, it will just
>>> 
>>> +boil down to a lot of work for buildbot and other infrastructure
>>> 
>>> +owners.
>>> 
>>> +
>>> 
>>> +Once all dependencies are cleared, and all problems have been solved:
>>> 
>>> +
>>> 
>>> +STEP #3: Write Access Move
>>> 
>>> +
>>> 
>>> +9. Collect developers' GitHub account information, and add them to the
>>> project.
>>> 
>>> +10. Switch the SVN repository to read-only and allow pushes to the GitHub
>>> repository.
>>> 
>>> +11. Update the documentation.
>>> 
>>> +12. Mirror Git to SVN.
>>> 
>>> +
>>> 
>>> +STEP #4 : Post Move
>>> 
>>> +
>>> 
>>> +13. Archive the SVN repository.
>>> 
>>> +14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
>>> 
>>> +    point to GitHub instead.
>>> 
>>> +
>>> 
>>> +One or Multiple Repositories?
>>> 
>>> +=============================
>>> 
>>> +
>>> 
>>> +There are two major proposals for how to structure our Git repository: The
>>> 
>>> 
>>> This is one proposal document.  I think these should be called "variants"
>>> 
>>> instead of "proposals".  If you specifically dislike "variant", another
>>> option
>>> 
>>> is "sub-proposal".
>>> 
>>> 
>>> My edits below use "variant".
>>> 
>>> 
>>> +"multirepo" and the "monorepo".
>>> 
>>> +
>>> 
>>> +Multirepo Proposal
>>> 
>>> +------------------
>>> 
>>> +
>>> 
>>> +This proposal consists into moving each SVN sub-projects into its own
>>> separate
>>> 
>>> +Git repository. It would mimic the existing official separate read-only Git
>>> 
>>> +repositories (e.g. http://llvm.org/git/compiler-rt.git), and make them the
>>> new
>>> 
>>> +canonical repositories for each sub-projects.
>>> 
>>> 
>>> Grammar suggestions: "This variant recommends moving each LLVM sub-project
>>> to a
>>> 
>>> separate Git repository.  This mimics the existing official read-only Git
>>> 
>>> repositories (e.g., http://llvm.org/git/compiler-rt.git), and creates new
>>> 
>>> canonical repositories for each sub-project."
>>> 
>>> 
>>> (or s/variant/sub-proposal/)
>>> 
>>> 
>>> (I used "LLVM" instead of "SVN" to match the text in the monorepo proposal.
>>> 
>>> I'm not sure which is better, but it should be consistent.)
>>> 
>>> 
>>> 
>>> +
>>> 
>>> +This will allow the individual subprojects to stay and live independently:
>>> a
>>> 
>>> 
>>> s/stay and live independently/remain distinct/
>>> 
>>> 
>>> +developer interested only by compiler-rt can checkout only this repository,
>>> 
>>> 
>>> s/by/in/
>>> 
>>> 
>>> +build it, and commit/push-back to the repository without checking-out any
>>> of the
>>> 
>>> +other subprojects.
>>> 
>>> 
>>> Clarity/concision suggestion:
>>> 
>>> 
>>> "... and commit/push-back to the repository without checking-out any of the
>>> 
>>> other subprojects."
>>> 
>>> =>
>>> 
>>> "... and work in isolation of the other subprojects."
>>> 
>>> 
>>> +
>>> 
>>> +A key need is to be able to check out multiple projects (i.e. lldb+llvm or
>>> 
>>> 
>>> IIRC, lldb relies on clang, so this should be "lldb+clang+llvm".
>>> 
>>> 
>>> +clang+llvm+libcxx for example) at a specific revision.
>>> 
>>> +
>>> 
>>> +A tuple of revisions (one entry per repository) is sufficient to describe
>>> the
>>> 
>>> +state across separated Git repositories/sub-projects.
>>> 
>>> 
>>> I suggest: "... (one entry per repository) accurately describes the state
>>> 
>>> across the sub-projects."
>>> 
>>> 
>>> If you don't take that: s/separated/the distinct/  (or, "the separate")
>>> 
>>> 
>>> +For example, a given version of clang would be
>>> 
>>> +*<LLVM-12345, clang-5432, libcxx-123, etc.>*.
>>> 
>>> +
>>> 
>>> 
>>> <-- Add a subtitle here: "Umbrella Repository"
>>> 
>>> 
>>> +To make this more convenient, a separate *umbrella* repository will be
>>> 
>>> +provided. This repository will be used for the sole purpose of
>>> understanding
>>> 
>>> +the sequence in which commits were pushed to the different repositories and
>>> to
>>> 
>>> +provide a single revision number.
>>> 
>>> +
>>> 
>>> +This umbrella repository will be read-only and continuously updated
>>> 
>>> +to record the above tuple. The proposed form to record this is to use Git
>>> 
>>> +[submodules]_, possibly along with a set of scripts to help check out a
>>> 
>>> +specific revision of the LLVM distribution.
>>> 
>>> +
>>> 
>>> +A regular LLVM developer does not need to interact with the umbrella
>>> repository
>>> 
>>> +-- the individual repositories can be checked out independently -- but you
>>> would
>>> 
>>> +need to use the umbrella repository to bisect multiple sub-projects at the
>>> same
>>> 
>>> +time, or to check-out old revisions of llvm plus another sub-project at a
>>> 
>>> +consistent version.
>>> 
>>> +
>>> 
>>> +This umbrella repository will be updated automatically by a bot (running on
>>> 
>>> +notice from a webhook on every push, and periodically) on a per commit
>>> basis: a
>>> 
>>> +single commit in the umbrella repository would match a single commit in a
>>> 
>>> +subproject.
>>> 
>>> +
>>> 
>>> 
>>> <-- Add a subtitle here: "Preview" (or "Multirepo Preview")
>>> 
>>> 
>>> +As a preview (disclaimer: this rough prototype, not polished and not
>>> 
>>> +representative of the final solution), you can look at the following:
>>> 
>>> +
>>> 
>>> +  * Repository: https://github.com/llvm-beanz/llvm-submodules
>>> 
>>> +  * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/
>>> 
>>> +
>>> 
>>> +Concerns
>>> 
>>> +^^^^^^^^
>>> 
>>> +
>>> 
>>> + * Because GitHub does not allow server-side hooks, and because there is no
>>> 
>>> +   "push timestamp" in Git, the umbrella reposioty sequence isn't totally
>>> exact:
>>> 
>>> 
>>> s/reposioty/repository/
>>> 
>>> 
>>> +   commits from different repositories pushed around the same time can
>>> appear
>>> 
>>> +   in different orders. However, we don't expect it to be the common case
>>> or to
>>> 
>>> +   cause serious issues in practice.
>>> 
>>> 
>>> I'd like to add a bullet here that back-references the previous one: "*
>>> Another
>>> 
>>> option is to group commits that were pushed closely enough together in the
>>> 
>>> umbrella repository that the bot sees the commits at the same time.
>>> However,
>>> 
>>> this has the potential to group too many commits together, especially if the
>>> 
>>> bot goes down and needs to catch up."
>>> 
>>> 
>>> + * You can't have a single cross-projects commit that would update both
>>> LLVM and
>>> 
>>> +   other subprojects (something that can be achieved now). It would be
>>> possible
>>> 
>>> +   to establish a protocol whereby users add a special token to their
>>> commit
>>> 
>>> +   messages that causes the umbrella repo's updater bot to group all of
>>> them
>>> 
>>> +   into a single revision.
>>> 
>>> + * This proposal relies on heavier tooling. But the current prototype shows
>>> that
>>> 
>>> +   it is not out-of-reach.
>>> 
>>> + * Submodules don't have a good reputation / are complicating the command
>>> line.
>>> 
>>> +   However, in the proposed setup, a regular developer will seldom interact
>>> with
>>> 
>>> +   submodules directly, and certainly never update them.
>>> 
>>> +
>>> 
>>> 
>>> <--- Add a bulleted list of workflows here (as links):
>>> 
>>> 
>>> Workflows
>>> 
>>> ^^^^^^^^^
>>> 
>>> * __Link to multirepo workflow 1__.
>>> 
>>> * __Link to multirepo workflow 2__.
>>> 
>>> * ...
>>> 
>>> 
>>> 
>>> +Monorepo Proposal
>>> 
>>> +-----------------
>>> 
>>> 
>>> "Variant" or "sub-proposal", as above.
>>> 
>>> 
>>> +
>>> 
>>> +This proposal consists into moving all the LLVM sub-projects into a single
>>> Git
>>> 
>>> +repository. It will mimic an export of the current SVN repository: it would
>>> 
>>> +look similar to https://github.com/llvm-project/llvm-project, where each
>>> 
>>> +sub-project has its own top-level directory.
>>> 
>>> 
>>> Grammar suggestions: "This variant recommends moving all LLVM sub-projects
>>> to a
>>> 
>>> single Git repository, similar to
>>> https://github.com/llvm-project/llvm-project.
>>> 
>>> This would mimic an export of the current SVN repository, with each
>>> sub-project
>>> 
>>> having its own top-level directory."
>>> 
>>> 
>>> (or s/variant/sub-proposal/)
>>> 
>>> 
>>> +
>>> 
>>> 
>>> I think the following paragraph should be moved down a little and given a
>>> 
>>> sub-heading.  I suggest "Read/write sub-project mirrors".
>>> 
>>> 
>>> +With the Monorepo, the existing single-subproject mirrors (e.g.
>>> 
>>> +http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
>>> 
>>> +continue to be maintained: developers would continue to be able to use the
>>> 
>>> 
>>> s/would/could/
>>> 
>>> 
>>> +existing single-subproject git repositories as they do today, with *no
>>> changes
>>> 
>>> +to workflow*. Everything (git fetch, git svn dcommit, etc.) would continue
>>> to
>>> 
>>> +work identically to how it works today.
>>> 
>>> 
>>> I disagree with "identically", since the revision numbers will likely change
>>> 
>>> (since the GitHub SVN export is unlikely to number revisions the same way
>>> that
>>> 
>>> our current SVN repository does).  I think you should weaken this slightly:
>>> 
>>> "Git fetch, git svn dcommit, etc., would continue to work the same way they
>>> do
>>> 
>>> today."
>>> 
>>> 
>>> +
>>> 
>>> 
>>> With the other paragraph moved down (with a subtitle so it isn't missed),
>>> the
>>> 
>>> following couple of paragraphs can be argued more directly.  I feel like this is the meat
>>> of
>>> 
>>> this variant (this is what's especially compelling about monorepo), so it's
>>> 
>>> important to have up-front.
>>> 
>>> 
>>> "Putting all sub-projects in a single checkout makes cross-project
>>> refactoring
>>> 
>>> naturally simple.  Moving code between sub-projects will always be done in a
>>> 
>>> single commit, designing away a common source of temporary build breakage.
>>> 
>>> Tooling based on `git grep` and `git blame` can be used natively across
>>> 
>>> sub-projects.  New sub-projects can be trivially split out for better reuse
>>> 
>>> and/or layering (e.g., to allow libSupport and/or LIT to be used by runtimes
>>> 
>>> without adding a dependency on LLVM)."
>>> 
>>> 
>>> +There are other consequences: having all the code in a single checkout
>>> 
>>> +imply that "git grep" works across sub-projects for instance. This can be
>>> used
>>> 
>>> +for example to find refactoring opportunities across projects (for example
>>> 
>>> +reusing a datastructure initially in LLDB by moving it into libSupport, or
>>> to
>>> 
>>> +decide to extract some pieces of libSupport and/or ADT to a new top-level
>>> 
>>> +*independent* library that can be reused in libcxxabi).
>>> 
>>> +Finally, having all the sources present encourages maintaining the other
>>> 
>>> +sub-projects when changing API.
>>> 
>>> +
>>> 
>>> +As another example, some developers think that the division between e.g.
>>> clang
>>> 
>>> +and clang-tools-extra is not useful. With the monorepo, we can move code
>>> around
>>> 
>>> +as we wish and preserve history.
>>> 
>>> 
>>> ^ I think I incorporated this part of the paragraph into the above.
>>> 
>>> 
>>> +With the multirepo, refactoring some functions from clang to make it part
>>> of a
>>> 
>>> +utility in libSupport to share it across sub-projects wouldn't carry the
>>> history
>>> 
>>> +of the code in the llvm repo, preventing recursively applying `git blame`
>>> for
>>> 
>>> +instance. The monorepo offers natively this ability.
>>> 
>>> 
>>> ^ This should already be handled above as well.  I removed the direct
>>> reference
>>> 
>>> to multirepo, but I think it's quite clear.
>>> 
>>> 
>>> +
>>> 
>>> +Finally, the monorepo maintains the property of the existing SVN repository
>>> that
>>> 
>>> +the sub-projects move synchronously, and a single revision number (or
>>> commit
>>> 
>>> +hash) identifies the state of the development across all projects.
>>> 
>>> 
>>> <-- This is where I want to move "Read/write sub-project mirrors"
>>> 
>>> 
>>> +
>>> 
>>> 
>>> <-- Add a subtitle: "Preview" (or "Monorepo Preview")
>>> 
>>> 
>>> +As a preview (disclaimer: this rough prototype, not polished and not
>>> 
>>> +representative of the final solution), you can look at the following:
>>> 
>>> +
>>> 
>>> +  * Full Repository: https://github.com/joker-eph/llvm-project
>>> 
>>> +  * Single subproject view with *SVN write access* to the full repo:
>>> 
>>> +    https://github.com/joker-eph/compiler-rt
>>> 
>>> +
>>> 
>>> +Concerns
>>> 
>>> +^^^^^^^^
>>> 
>>> +
>>> 
>>> + * Some concerns have been raised that having a single repository would be
>>> a
>>> 
>>> +   burden for those that have interest in only a single repository. This is
>>> 
>>> +   addressed by keeping the single-subproject Git mirrors for each project
>>> just
>>> 
>>> +   as we do today. For contributors the GitHub SVN bridge allows to
>>> contribute
>>> 
>>> +   to a single sub-project the same way it is possible today (see below
>>> 
>>> +   before/after section for more details).
>>> 
>>> 
>>> "* Using the monolithic repository may be a burden for those working on a
>>> 
>>> standalone sub-project, particularly runtimes like libcxx and compiler-rt
>>> that
>>> 
>>> don't rely on LLVM.  A fresh clone of libcxx is only 15MB (vs. 1GB for the
>>> 
>>> monorepo), and the commit rate of LLVM may cause more frequent `git push`
>>> 
>>> collisions.  Affected users can continue to use the read/write SVN mirrors
>>> 
>>> and git-svn."
>>> 
>>> 
>>> ^ This should link to the subtitle "Read/write sub-project mirrors".
>>> 
>>> 
>>> The concern about maintaining SVN should immediately follow to link the two
>>> 
>>> bullets.  I don't think "SVN disappearing" is specifically worth mentioning
>>> --
>>> 
>>> since we're not saying that we're worried about "GitHub disappearing" either
>>> --
>>> 
>>> so I've reworded this as a maintenance concern:
>>> 
>>> 
>>> "* Preservation of the existing read/write SVN-based workflows relies on the
>>> 
>>> GitHub SVN bridge, which is an extra dependency.  Maintaining this locks us
>>> 
>>> into GitHub and could restrict future workflow changes."
>>> 
>>> 
>>> I also think we should add:
>>> 
>>> 
>>> "* Not all sub-projects are used for building toolchains.  In practise, www/
>>> 
>>> and test-suite/ will probably stay out of the monorepo."
>>> 
>>> 
>>> + * Because I check out the full repository, will I build all the projects
>>> by
>>> 
>>> +   default? Nobody will be forced to compile projects they don't want to
>>> build.
>>> 
>>> +   The exact structure is TBD, but even if you use the monorepo directly,
>>> we'll
>>> 
>>> +   ensure that it's easy to set up your build to compile only a few
>>> particular
>>> 
>>> +   sub-projects.
>>> 
>>> 
>>> ^ I think there's full consensus that this is solvable.  I think you should
>>> 
>>> move it above the new read/write section and give it a subtitle:
>>> 
>>> 
>>>  Building a single sub-project
>>> 
>>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> 
>>> 
>>>  Nobody will be forced to build unnecessary projects.  The exact structure
>>> 
>>>  is TBD, but making it trivial to configure builds for a single
>>> sub-project
>>> 
>>>  (or a subset of sub-projects) is a hard requirement.
>>> 
>>> 
>>> I want this *before* read/write SVN access to make it structurally clear
>>> that
>>> 
>>> "building a single sub-project" does not depend on "using SVN/git-svn".
>>> 
>>> 
>>> + * The solution to ensure "zero-regression" and preserve existing workflow,
>>> 
>>> +   especially for developers that want to interact with a single sub-project
>>> and
>>> 
>>> +   limit the size of the clone, relies on the SVN bridge offered by
>>> GitHub.
>>> 
>>> +   We can't guarantee that GitHub will support this bridge forever, even if
>>> they
>>> 
>>> +   haven't announced any intention to discontinue support for it.
>>> 
>>> 
>>> ^ This one can be cut; I replaced it with more concise wording above that
>>> 
>>> focuses on the maintenance burden.
>>> 
>>> 
>>> +
>>> 
>>> 
>>> <--- Add a bulleted list of workflows here (as links):
>>> 
>>> 
>>> Workflows
>>> 
>>> ^^^^^^^^^
>>> 
>>> * __Link to monorepo workflow 1__.
>>> 
>>> * __Link to monorepo workflow 2__.
>>> 
>>> * ...
>>> 
>>> 
>>> 
>>> +Multi/Mono Hybrid Variant
>>> 
>>> +-------------------------
>>> 
>>> 
>>> You use "variant" here, and I think it's the best word to use.  But for
>>> 
>>> consistency, please change to "sub-proposal" if that's what you go with for
>>> 
>>> multirepo and monorepo (or reconsider that choice...).
>>> 
>>> 
>>> +
>>> 
>>> +A variant of the monorepo proposal is to group together in a single
>>> repository
>>> 
>>> +only the projects that are *rev-locked* to LLVM (clang, lld, lldb, ...) and
>>> 
>>> +leave projects like libcxx and compiler-rt in their own individual and
>>> separate
>>> 
>>> +repositories.
>>> 
>>> +
>>> 
>>> 
>>> I'd reword like this:
>>> 
>>> 
>>> This variant recommends moving only the LLVM sub-projects that are
>>> 
>>> *rev-locked* to LLVM into a monorepo (clang, lld, lldb, ...), following
>>> the
>>> 
>>> multirepo proposal for the rest.  While neither variant recommends
>>> 
>>> combining sub-projects like www/ and test-suite/ (which are completely
>>> 
>>> standalone), this goes further and keeps sub-projects like libcxx and
>>> 
>>> compiler-rt in their own distinct repositories.
>>> 
>>> 
>>> (I added wording to clarify around test-suite/ and www/.)
>>> 
>>> 
>>> +Concerns
>>> 
>>> +^^^^^^^^
>>> 
>>> +
>>> 
>>> 
>>> Add:
>>> 
>>> 
>>> * All projects that use LIT for testing are effectively rev-locked to
>>> LLVM.
>>> 
>>>   Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
>>> 
>>>   It's not clear where to draw the lines.
>>> 
>>> 
>>> + * Inconvenient of both proposals together (see above concerns).
>>> 
>>> 
>>> ^ Reword: "* This has most disadvantages of multirepo and monorepo, without
>>> 
>>> bringing many of the advantages."
>>> 
>>> 
>>> + * Downstream have to upgrade to the monorepo structure, but only
>>> partially. So
>>> 
>>> +   they will keep the infrastructure to integrate the other separate
>>> 
>>> +   sub-projects.
>>> 
>>> +
>>> 
>>> +Workflow Before/After
>>> 
>>> +=====================
>>> 
>>> +
>>> 
>>> +This section goes through a few examples of workflows, intended to
>>> illustrate
>>> 
>>> +how end-users or developers would interact with the repository for
>>> 
>>> +various use-cases.
>>> 
>>> +
>>> 
>>> +Checkout/Clone a Single Project, without Commit Access
>>> 
>>> +------------------------------------------------------
>>> 
>>> +
>>> 
>>> +Except for the URL, nothing changes. The possibilities today are::
>>> 
>>> +
>>> 
>>> +  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
>>> 
>>> +  # or with Git
>>> 
>>> +  git clone http://llvm.org/git/llvm.git
>>> 
>>> +
>>> 
>>> +After the move to GitHub, you would do either::
>>> 
>>> +
>>> 
>>> +  git clone https://github.com/llvm-project/llvm.git
>>> 
>>> +  # or using the GitHub svn native bridge
>>> 
>>> +  svn co https://github.com/llvm-project/llvm/trunk
>>> 
>>> +
>>> 
>>> +The above works for both the monorepo and the multirepo, as we'll maintain
>>> the
>>> 
>>> +existing read-only views of the individual sub-projects.
>>> 
>>> +
>>> 
>>> +Checkout/Clone a Single Project, with Commit Access
>>> 
>>> +---------------------------------------------------
>>> 
>>> +
>>> 
>>> +**Currently**
>>> 
>>> +::
>>> 
>>> +
>>> 
>>> +  # direct SVN checkout
>>> 
>>> +  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
>>> 
>>> +  # or using the read-only Git view, with git-svn
>>> 
>>> +  git clone http://llvm.org/git/llvm.git
>>> 
>>> +  cd llvm
>>> 
>>> +  git svn init https://llvm.org/svn/llvm-project/llvm/trunk
>>> --username=<username>
>>> 
>>> +  git config svn-remote.svn.fetch :refs/remotes/origin/master
>>> 
>>> +  git svn rebase -l  # -l avoids fetching ahead of the git mirror.
>>> 
>>> +
>>> 
>>> +Commits are performed using `svn commit` or with the sequence `git commit`
>>> and
>>> 
>>> +`git svn dcommit`.
>>> 
>>> +
>>> 
>>> +**Multirepo Proposal**
>>> 
>>> +
>>> 
>>> +With the multirepo proposal, nothing changes but the URL, and commits can
>>> be
>>> 
>>> +performed using `svn commit` or `git commit` and `git push`::
>>> 
>>> +
>>> 
>>> +  git clone https://github.com/llvm/llvm.git llvm
>>> 
>>> +  # or using the GitHub svn native bridge
>>> 
>>> +  svn co https://github.com/llvm/llvm/trunk/ llvm
>>> 
>>> +
>>> 
>>> +**Monorepo Proposal**
>>> 
>>> +
>>> 
>>> +With the monorepo, there are multiple possibilities to achieve this.
>>> First,
>>> 
>>> +you could just clone the full repository::
>>> 
>>> +
>>> 
>>> +  git clone https://github.com/llvm/llvm-projects.git llvm
>>> 
>>> +  # or using the GitHub svn native bridge
>>> 
>>> +  svn co https://github.com/llvm/llvm-projects/trunk/ llvm
>>> 
>>> +
>>> 
>>> +At this point you have every sub-project (llvm, clang, lld, lldb, ...),
>>> which
>>> 
>>> +**doesn't imply you have to build all of them**. You can still build only
>>> 
>>> +compiler-rt for instance. In this way it's not different from someone who
>>> would
>>> 
>>> +check out all the projects with SVN today.
>>> 
>>> +
>>> 
>>> +You can commit as normal using `git commit` and `git push` or `svn commit`,
>>> and
>>> 
>>> +read the history for a single project (`git log libcxx` for example).
>>> 
>>> +
>>> 
>>> +There are a few options to avoid checking out all the sources.
>>> 
>>> +
>>> 
>>> +First, you could hide the other directories using a Git sparse checkout::
>>> 
>>> +
>>> 
>>> +  git config core.sparseCheckout true
>>> 
>>> +  echo /compiler-rt > .git/info/sparse-checkout
>>> 
>>> +  git read-tree -mu HEAD
>>> 
>>> +
>>> 
>>> +The data for all sub-projects is still in your `.git` directory, but in
>>> your
>>> 
>>> +checkout, you only see `compiler-rt`.
>>> 
>>> +Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
>>> 
>>> +usual.
>>> 
>>> +
>>> 
>>> +Note that when you fetch you'll likely pull in changes to sub-projects you
>>> don't
>>> 
>>> +care about. If you are using sparse checkout, the files from other projects
>>> 
>>> +won't appear on your disk. The only effect is that your commit hash
>>> changes.
>>> 
>>> +
>>> 
>>> +You can check whether the changes in the last fetch are relevant to your
>>> commit
>>> 
>>> +by running::
>>> 
>>> +
>>> 
>>> +  git log origin/master@{1}..origin/master -- libcxx
>>> 
>>> +
>>> 
>>> +This command can be hidden in a script so that `git llvmpush` would perform
>>> all
>>> 
>>> +these steps, fail only if such a dependent change exists, and show
>>> immediately
>>> 
>>> +the change that prevented the push. An immediate repeat of the command
>>> would
>>> 
>>> +(almost) certainly result in a successful push.
>>> 
>>> +Note that today with SVN or git-svn, this step is not possible since the
>>> 
>>> +"rebase" implicitly happens while committing (unless a conflict occurs).
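
For what it's worth, the `git llvmpush` wrapper described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `origin` remote, and the `master` branch are mine, not part of the proposal. It also records the old tip before fetching instead of relying on `origin/master@{1}`, so that a fetch which brings in nothing does not re-report older commits.

```shell
#!/bin/sh
# Hypothetical sketch of the "git llvmpush" idea from the proposal text.
# Assumes a remote named "origin" and an upstream branch named "master".

# List commits in the range old_tip..origin/master that touch one sub-project.
relevant_changes() {
    old_tip=$1
    subproject=$2
    git log --oneline "$old_tip..origin/master" -- "$subproject"
}

# Fetch, refuse to push if the fetch brought in commits touching the
# sub-project, otherwise rebase onto the new tip and push.
llvmpush() {
    subproject=$1
    old_tip=$(git rev-parse origin/master) || return 1
    git fetch origin || return 1
    changes=$(relevant_changes "$old_tip" "$subproject")
    if [ -n "$changes" ]; then
        echo "New upstream commits touch $subproject:" >&2
        echo "$changes" >&2
        return 1
    fi
    git rebase origin/master && git push origin HEAD:master
}
```

Run as `llvmpush libcxx` from inside a monorepo checkout, this stops and prints the offending commits when upstream touched `libcxx`; running it a second time would then go through unless yet another relevant commit landed in between.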
>>> 
>>> +
>>> 
>>> +A second option is to use svn via the GitHub svn native bridge::
>>> 
>>> +
>>> 
>>> +  svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt
>>> compiler-rt  --username=...
>>> 
>>> +
>>> 
>>> +This checks out only compiler-rt and provides commit access using "svn
>>> commit",
>>> 
>>> +in the same way as it would do today.
>>> 
>>> +
>>> 
>>> +Finally, you could use *git-svn* and one of the sub-project mirrors::
>>> 
>>> +
>>> 
>>> +  # Clone from the single read-only Git repo
>>> 
>>> +  git clone http://llvm.org/git/llvm.git
>>> 
>>> +  cd llvm
>>> 
>>> +  # Configure the SVN remote and initialize the svn metadata
>>> 
>>> +  git svn init https://github.com/joker-eph/llvm-project/trunk/llvm
>>> --username=...
>>> 
>>> +  git config svn-remote.svn.fetch :refs/remotes/origin/master
>>> 
>>> +  git svn rebase -l
>>> 
>>> +
>>> 
>>> +In this case the repository contains only a single sub-project, and commits
>>> can
>>> 
>>> +be made using `git svn dcommit`, again exactly as we do today.
>>> 
>>> +
>>> 
>>> +Checkout/Clone Multiple Projects, with Commit Access
>>> 
>>> +----------------------------------------------------
>>> 
>>> +
>>> 
>>> +Let's look at how to assemble llvm+clang+libcxx at a given revision.
>>> 
>>> +
>>> 
>>> +**Currently**
>>> 
>>> +::
>>> 
>>> +
>>> 
>>> +  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
>>> 
>>> +  cd llvm/tools
>>> 
>>> +  svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
>>> 
>>> +  cd ../projects
>>> 
>>> +  svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
>>> 
>>> +
>>> 
>>> +Or using git-svn::
>>> 
>>> +
>>> 
>>> +  git clone http://llvm.org/git/llvm.git
>>> 
>>> +  cd llvm/
>>> 
>>> +  git svn init https://llvm.org/svn/llvm-project/llvm/trunk
>>> --username=<username>
>>> 
>>> +  git config svn-remote.svn.fetch :refs/remotes/origin/master
>>> 
>>> +  git svn rebase -l
>>> 
>>> +  git checkout `git svn find-rev -B r258109`
>>> 
>>> +  cd tools
>>> 
>>> +  git clone http://llvm.org/git/clang.git
>>> 
>>> +  cd clang/
>>> 
>>> +  git svn init https://llvm.org/svn/llvm-project/clang/trunk
>>> --username=<username>
>>> 
>>> +  git config svn-remote.svn.fetch :refs/remotes/origin/master
>>> 
>>> +  git svn rebase -l
>>> 
>>> +  git checkout `git svn find-rev -B r258109`
>>> 
>>> +  cd ../../projects/
>>> 
>>> +  git clone http://llvm.org/git/libcxx.git
>>> 
>>> +  cd libcxx
>>> 
>>> +  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk
>>> --username=<username>
>>> 
>>> +  git config svn-remote.svn.fetch :refs/remotes/origin/master
>>> 
>>> +  git svn rebase -l
>>> 
>>> +  git checkout `git svn find-rev -B r258109`
>>> 
>>> +
>>> 
>>> +Note that the list would be longer with more sub-projects.
>>> 
>>> +
>>> 
>>> +**Multirepo Proposal**
>>> 
>>> +
>>> 
>>> +With the multirepo proposal, the umbrella repository will be used. This is
>>> 
>>> +where the mapping from a single revision number to the individual
>>> repositories
>>> 
>>> +revisions is stored::
>>> 
>>> +
>>> 
>>> +  git clone https://github.com/llvm-beanz/llvm-submodules
>>> 
>>> +  cd llvm-submodules
>>> 
>>> +  git checkout $REVISION
>>> 
>>> +  git submodule init
>>> 
>>> +  git submodule update clang llvm libcxx
>>> 
>>> +  # the list of subprojects is optional, `git submodule update` would get
>>> them all.
>>> 
>>> +
>>> 
>>> +At this point the clang, llvm, and libcxx individual repositories are
>>> cloned
>>> 
>>> +and stored alongside each other. There are CMake flags to describe the
>>> directory
>>> 
>>> +structure; alternatively, you can just symlink `clang` to
>>> `llvm/tools/clang`,
>>> 
>>> +etc.
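
The symlink alternative mentioned above might look something like this. It is a sketch only: `link_in_tree` is a made-up helper, and the `tools/` and `projects/` destinations are taken from the SVN layout shown in the "Currently" example, not from anything the umbrella repository prescribes.

```shell
#!/bin/sh
# Hypothetical helper: expose sibling checkouts inside the llvm tree via
# relative symlinks (clang -> llvm/tools/clang, libcxx -> llvm/projects/libcxx).
link_in_tree() {
    root=$1
    for pair in clang:tools libcxx:projects; do
        name=${pair%%:*}    # sub-project directory next to llvm/
        dest=${pair##*:}    # directory inside llvm/ that should contain it
        target="$root/llvm/$dest/$name"
        # Only link when the sibling exists and nothing is in the way.
        if [ -d "$root/$name" ] && [ ! -e "$target" ] && [ ! -L "$target" ]; then
            mkdir -p "$root/llvm/$dest"
            ln -s "../../$name" "$target"
        fi
    done
}
```

Calling `link_in_tree .` from the umbrella checkout would leave the submodules where they are and only add the two links, so it is safe to run again after `git submodule update`.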
>>> 
>>> +
>>> 
>>> +Another option is to checkout repositories based on the commit timestamp::
>>> 
>>> +
>>> 
>>> +  git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
>>> 
>>> +
>>> 
>>> +**Monorepo Proposal**
>>> 
>>> +
>>> 
>>> +The repository contains natively the source for every sub-project at the
>>> right
>>> 
>>> +revision, which makes this straightforward::
>>> 
>>> +
>>> 
>>> +  git clone https://github.com/llvm/llvm-projects.git llvm-projects
>>> 
>>> +  cd llvm-projects
>>> 
>>> +  git checkout $REVISION
>>> 
>>> +
>>> 
>>> +As before, at this point clang, llvm, and libcxx are stored in directories
>>> 
>>> +alongside each other.
>>> 
>>> +
>>> 
>>> +Commit an API Change in LLVM and Update the Sub-projects
>>> 
>>> +--------------------------------------------------------
>>> 
>>> +
>>> 
>>> +Today this is possible, even though not common (at least not documented)
>>> for
>>> 
>>> +subversion users and for git-svn users. Few Git users try to e.g. update
>>> LLD or
>>> 
>>> +Clang in the same commit as they change an LLVM API.
>>> 
>>> +
>>> 
>>> +The multirepo proposal does not address this: one would have to commit and
>>> push
>>> 
>>> +separately in every individual repository. It would be possible to
>>> establish a
>>> 
>>> +protocol whereby users add a special token to their commit messages that
>>> causes
>>> 
>>> +the umbrella repo's updater bot to group all of them into a single
>>> revision.
>>> 
>>> +
>>> 
>>> +The monorepo proposal handles this natively.
>>> 
>>> +
>>> 
>>> +Branching/Stashing/Updating for Local Development or Experiments
>>> 
>>> +----------------------------------------------------------------
>>> 
>>> +
>>> 
>>> +**Currently**
>>> 
>>> +
>>> 
>>> +SVN does not allow this use case, but developers that are currently using
>>> 
>>> +git-svn can do it. Let's look at what this means in practice when dealing with
>>> 
>>> +multiple sub-projects.
>>> 
>>> +
>>> 
>>> +To update the repository to tip of trunk::
>>> 
>>> +
>>> 
>>> +  git pull
>>> 
>>> +  cd tools/clang
>>> 
>>> +  git pull
>>> 
>>> +  cd ../../projects/libcxx
>>> 
>>> +  git pull
>>> 
>>> +
>>> 
>>> +To create a new branch::
>>> 
>>> +
>>> 
>>> +  git checkout -b MyBranch
>>> 
>>> +  cd tools/clang
>>> 
>>> +  git checkout -b MyBranch
>>> 
>>> +  cd ../../projects/libcxx
>>> 
>>> +  git checkout -b MyBranch
>>> 
>>> +
>>> 
>>> +To switch branches::
>>> 
>>> +
>>> 
>>> +  git checkout AnotherBranch
>>> 
>>> +  cd tools/clang
>>> 
>>> +  git checkout AnotherBranch
>>> 
>>> +  cd ../../projects/libcxx
>>> 
>>> +  git checkout AnotherBranch
>>> 
>>> +
>>> 
>>> +**Multirepo Proposal**
>>> 
>>> +
>>> 
>>> +The multirepo works the same as the current Git workflow: every command
>>> needs
>>> 
>>> +to be applied to each of the individual repositories.
>>> 
>>> +However, the umbrella repository makes this easy using `git submodule
>>> foreach`
>>> 
>>> +to replicate a command on all the individual repositories (or submodules
>>> 
>>> +in this case):
>>> 
>>> +
>>> 
>>> +To create a new branch::
>>> 
>>> +
>>> 
>>> +  git submodule foreach git checkout -b MyBranch
>>> 
>>> +
>>> 
>>> +To switch branches::
>>> 
>>> +
>>> 
>>> +  git submodule foreach git checkout AnotherBranch
>>> 
>>> +
>>> 
>>> +**Monorepo Proposal**
>>> 
>>> +
>>> 
>>> +Regular Git commands are sufficient, because everything is in a single
>>> 
>>> +repository:
>>> 
>>> +
>>> 
>>> +To update the repository to tip of trunk::
>>> 
>>> +
>>> 
>>> +  git pull
>>> 
>>> +
>>> 
>>> +To create a new branch::
>>> 
>>> +
>>> 
>>> +  git checkout -b MyBranch
>>> 
>>> +
>>> 
>>> +To switch branches::
>>> 
>>> +
>>> 
>>> +  git checkout AnotherBranch
>>> 
>>> +
>>> 
>>> +Bisecting
>>> 
>>> +---------
>>> 
>>> +
>>> 
>>> +Assuming a developer is looking for a bug in clang (or lld, or lldb, ...).
>>> 
>>> +
>>> 
>>> +**Currently**
>>> 
>>> +
>>> 
>>> +SVN does not have builtin bisection support, but the single revision across
>>> 
>>> +sub-projects makes it possible to script around.
>>> 
>>> +
>>> 
>>> +Using the existing Git read-only view of the repositories, it is possible
>>> to use
>>> 
>>> +the native Git bisection script over the llvm repository, and use some
>>> scripting
>>> 
>>> +to synchronize the clang repository to match the llvm revision.
>>> 
>>> +
>>> 
>>> +**Multirepo Proposal**
>>> 
>>> +
>>> 
>>> +With the multi-repositories proposal, the cross-repository synchronization
>>> is
>>> 
>>> +achieved using the umbrella repository. This repository contains only
>>> 
>>> +submodules for the other sub-projects. The native Git bisection can be used
>>> on
>>> 
>>> +the umbrella repository directly. A subtlety is that the bisect script
>>> itself
>>> 
>>> +needs to make sure the submodules are updated accordingly.
>>> 
>>> +
>>> 
>>> +For example, to find which commit introduces a regression where clang-3.9
>>> 
>>> +crashes but clang-3.8 passes, one should be able to simply do::
>>> 
>>> +
>>> 
>>> +  git bisect start release_39 release_38
>>> 
>>> +  git bisect run ./bisect_script.sh
>>> 
>>> +
>>> 
>>> +With the `bisect_script.sh` script being::
>>> 
>>> +
>>> 
>>> +  #!/bin/sh
>>> 
>>> +  cd $UMBRELLA_DIRECTORY
>>> 
>>> +  git submodule update llvm clang libcxx #....
>>> 
>>> +  cd $BUILD_DIR
>>> 
>>> +
>>> 
>>> +  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
>>> 
>>> +                            # to "skip" the current commit
>>> 
>>> +
>>> 
>>> +  ./bin/clang some_crash_test.cpp
>>> 
>>> +
>>> 
>>> +When the `git bisect run` command returns, the umbrella repository is set
>>> to
>>> 
>>> +the state where the regression is introduced. The commit diff in the
>>> umbrella
>>> 
>>> +indicates which submodule was updated, and the last commit in this
>>> sub-project is
>>> 
>>> +the one that the bisect found.
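
One detail worth keeping in mind for a script like this: `git bisect run` treats exit code 0 as good, 1 to 127 (other than 125) as bad, and aborts the bisection on anything else. A process killed by a signal usually shows up in the shell as a status above 127, so the final test line may need a small wrapper. A minimal sketch, where `bisect_status` is a hypothetical helper name and not something the proposal defines:

```shell
#!/bin/sh
# Hypothetical helper: run a command and map its exit status onto the
# codes that "git bisect run" understands.
bisect_status() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        return 0            # good
    elif [ "$status" -eq 125 ] || [ "$status" -gt 127 ]; then
        return 1            # bad: avoid "skip" (125) and "abort" (>127)
    else
        return "$status"    # already in the 1..127 "bad" range
    fi
}
```

With this in place, the last line of `bisect_script.sh` could read `bisect_status ./bin/clang some_crash_test.cpp`; the `ninja clang || exit 125` line stays as it is, since skipping unbuildable commits is exactly what 125 is for.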
>>> 
>>> +
>>> 
>>> +**Monorepo Proposal**
>>> 
>>> +
>>> 
>>> +Bisecting on the monorepo is straightforward, and very similar to the
>>> above,
>>> 
>>> +expect that the bisection script does not need to include the
>>> 
>>> 
>>> s/expect/except/
>>> 
>>> 
>>> +`git submodule update` step.
>>> 
>>> +
>>> 
>>> +The same example, finding which commit introduces a regression where
>>> clang-3.9
>>> 
>>> +crashes but clang-3.8 passes, will look like::
>>> 
>>> +
>>> 
>>> +  git bisect start release_39 release_38
>>> 
>>> +  git bisect run ./bisect_script.sh
>>> 
>>> +
>>> 
>>> +With the `bisect_script.sh` script being::
>>> 
>>> +
>>> 
>>> +  #!/bin/sh
>>> 
>>> +  cd $BUILD_DIR
>>> 
>>> +
>>> 
>>> +  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
>>> 
>>> +                            # to "skip" the current commit
>>> 
>>> +
>>> 
>>> +  ./bin/clang some_crash_test.cpp
>>> 
>>> 
>>> Since this is almost duplicated (and an implementation detail), it would be
>>> 
>>> nice to separate to an appendix.  If you don't think there's a clear way to
>>> 
>>> do that I'm fine as is.
>>> 
>>> 
>>> +
>>> 
>>> +Also, since the monorepo handles commits that update multiple projects at
>>> once, you're
>>> 
>>> +less likely to encounter a build failure where one commit changes an API in LLVM
>>> and
>>> 
>>> +a later one "fixes" the build in clang.
>>> 
>>> +
>>> 
>>> +Living Downstream
>>> 
>>> +-----------------
>>> 
>>> 
>>> I think this should be split up between multirepo and monorepo ("inlined"
>>> into
>>> 
>>> the variants, specifically before the "Preview" subtitle).
>>> 
>>> - As written, it's still hard to see what applies to multirepo and what to
>>> 
>>> monorepo.
>>> 
>>> - There's a fairly long monorepo-only section.
>>> 
>>> - I think there should be a matching multirepo-only section.
>>> 
>>> 
>>> Here's my recommendation:
>>> 
>>> 
>>> ...
>>> 
>>> 
>>> Multirepo Variant
>>> 
>>> -----------------
>>> 
>>> ...
>>> 
>>> 
>>> Living Downstream
>>> 
>>> ^^^^^^^^^^^^^^^^^
>>> 
>>> Downstream SVN users can use the read/write SVN bridges with the following
>>> 
>>> caveats:
>>> 
>>> * Be prepared for a one-time change to the upstream revision numbers.
>>> 
>>> * The upstream sub-project revision numbers will no longer be in sync.
>>> 
>>> 
>>> Downstream Git users can continue without any major changes, with the
>>> minor
>>> 
>>> change of upstreaming using `git push` instead of `git svn dcommit`.
>>> 
>>> 
>>> Git users also have the option of adopting an umbrella repository
>>> 
>>> downstream.  The tooling for the upstream umbrella can easily be reused
>>> for
>>> 
>>> downstream needs, incorporating extra sub-projects and branching in
>>> 
>>> parallel with sub-project branches.
>>> 
>>> 
>>> Preview
>>> 
>>> ^^^^^^^
>>> 
>>> ...
>>> 
>>> 
>>> Monorepo Variant
>>> 
>>> -----------------
>>> 
>>> ...
>>> 
>>> 
>>> Living Downstream
>>> 
>>> ^^^^^^^^^^^^^^^^^
>>> 
>>> Downstream SVN users can use the read/write SVN bridge with the following
>>> 
>>> caveat:
>>> 
>>> * Be prepared for a one-time change to the upstream revision numbers.
>>> 
>>> 
>>> Downstream Git users can continue without any major changes, by using the
>>> 
>>> git-svn mirrors on top of the SVN bridge.
>>> 
>>> 
>>> Git users can also work upstream with monorepo even if their downstream
>>> 
>>> fork has split repositories.  They can apply patches in the appropriate
>>> 
>>> subdirectories of the monorepo using, e.g., `git am --directory=...`, or
>>> 
>>> plain `diff` and `patch`.
>>> 
>>> 
>>> Alternatively, Git users can migrate their own fork to the monorepo.  As a
>>> 
>>> demonstration, we've migrated the "CHERI" fork to the monorepo in two
>>> ways:
>>> 
>>> 
>>> * Using a script that rewrites history (including merges) so that it looks
>>> 
>>>   like the fork always lived in the monorepo [LebarCHERI]_.  The upside of
>>> 
>>>   this is when you check out an old revision, you get a copy of all llvm
>>> 
>>>   sub-projects at a consistent revision.  (For instance, if it's a clang
>>> 
>>>   fork, when you check out an old revision you'll get a consistent version
>>> 
>>>   of llvm proper.)  The downside is that this changes the fork's commit
>>> 
>>>   hashes.
>>> 
>>> 
>>> * Merging the fork into the monorepo [AminiCHERI]_.  This preserves the
>>> 
>>>   fork's commit hashes, but when you check out an old commit you only get
>>> 
>>>   the one sub-project.
>>> 
>>> 
>>> Preview
>>> 
>>> ^^^^^^^
>>> 
>>> ...
>>> 
>>> 
>>> +
>>> 
>>> +Depending on which of the multirepo or the monorepo proposal gets accepted,
>>> 
>>> +and depending on the integration scheme, downstream projects may be
>>> differently
>>> 
>>> +impacted and have different options.
>>> 
>>> +
>>> 
>>> +* If you were pulling from the SVN repo before the switch to Git, the
>>> monorepo
>>> 
>>> +  will allow you to continue to use SVN the same way. The main caveat is
>>> that
>>> 
>>> +  you'll need to be prepared for a one-time change to the revision numbers.
>>> 
>>> +  The multirepo proposal still offers an SVN access to each individual
>>> 
>>> +  sub-project, but the SVN revision for each sub-project won't be
>>> synchronized.
>>> 
>>> +
>>> 
>>> +* If you were pulling from one of the existing read-only Git repos, this
>>> also
>>> 
>>> +  will continue to work as before as they will continue to exist in both of
>>> the
>>> 
>>> +  proposals.
>>> 
>>> +
>>> 
>>> +Under the monorepo proposal, you have a third option: migrating your fork
>>> to
>>> 
>>> +the monorepo. If your fork touches multiple LLVM projects, migrating your
>>> fork
>>> 
>>> +into the mono repo would enable you to make commits that touch multiple
>>> projects
>>> 
>>> +at the same time the same way LLVM contributors would be able to do so.
>>> 
>>> +
>>> 
>>> +As a demonstration, we've migrated the "CHERI" fork to the monorepo in two
>>> ways:
>>> 
>>> +
>>> 
>>> +* Using a script that rewrites history (including merges) so that it looks
>>> like
>>> 
>>> +  the fork always lived in the monorepo [LebarCHERI]_.  The upside of this
>>> is
>>> 
>>> +  when you check out an old revision, you get a copy of all llvm
>>> sub-projects at
>>> 
>>> +  a consistent revision.  (For instance, if it's a clang fork, when you
>>> check
>>> 
>>> +  out an old revision you'll get a consistent version of llvm proper.)  The
>>> 
>>> +  downside is that this changes the fork's commit hashes.
>>> 
>>> +
>>> 
>>> +* Merging the fork into the monorepo [AminiCHERI]_.  This preserves the
>>> fork's
>>> 
>>> +  commit hashes, but when you check out an old commit you only get the one
>>> 
>>> +  sub-project.
>>> 
>>> +
>>> 
>>> +If you keep a split-repository solution downstream, upstreaming patches to
>>> 
>>> +the monorepo is always possible (the split-repository case is obvious): you can apply
>>> the
>>> 
>>> +patches in the appropriate subdirectory of the monorepo (using either
>>> 
>>> +`git am --directory=...` or plain `diff` and `patch`).
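
To make the `git am --directory=...` route concrete, here is a rough sketch of carrying commits from a split downstream checkout into a monorepo checkout. `port_patches` and its argument order are hypothetical, and it assumes the sub-project lives in a top-level directory of the monorepo named by the last argument.

```shell
#!/bin/sh
# Hypothetical helper: replay commits base..HEAD from a split repository
# onto the current branch of a monorepo checkout, under one subdirectory.
port_patches() {
    split_repo=$1   # path to the single-project checkout
    base=$2         # commit the downstream patches start from
    monorepo=$3     # path to the monorepo checkout
    subdir=$4       # sub-project directory inside the monorepo, e.g. clang
    # format-patch writes the commits as an mbox stream on stdout; am reads
    # it from stdin and prefixes every path with the given directory.
    git -C "$split_repo" format-patch --stdout "$base..HEAD" |
        git -C "$monorepo" am --directory="$subdir"
}
```

For example, `port_patches ~/src/clang origin/master ~/src/llvm-project clang` would replay local clang commits under `clang/` in the monorepo, keeping authorship and commit messages but, as the text says for the rewriting approach, not the original hashes.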
>>> 
>>> +
>>> 
>>> +References
>>> 
>>> +==========
>>> 
>>> +
>>> 
>>> +.. [LattnerRevNum] Chris Lattner,
>>> http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
>>> 
>>> +.. [TrickRevNum] Andrew Trick,
>>> http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
>>> 
>>> +.. [JSonnRevNum] Joerg Sonnenberger,
>>> http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
>>> 
>>> +.. [TorvaldRevNum] Linus Torvalds,
>>> http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
>>> 
>>> +.. [MatthewsRevNum] Chris Matthews,
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
>>> 
>>> +.. [submodules] Git submodules,
>>> https://git-scm.com/book/en/v2/Git-Tools-Submodules
>>> 
>>> +.. [statuschecks] GitHub status-checks,
>>> https://help.github.com/articles/about-required-status-checks/
>>> 
>>> +.. [LebarCHERI] Port *CHERI* to a single repository rewriting history,
>>> http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
>>> 
>>> +.. [AminiCHERI] Port *CHERI* to a single repository preserving history,
>>> http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html
>>> 
>>> 
>>> 
>>> 
> 


