[llvm] r284077 - Moving to GitHub - Unified Proposal

Mehdi Amini via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 12 16:02:02 PDT 2016


Author: mehdi_amini
Date: Wed Oct 12 18:02:02 2016
New Revision: 284077

URL: http://llvm.org/viewvc/llvm-project?rev=284077&view=rev
Log:
Moving to GitHub - Unified Proposal

This document describes the proposal to move to GitHub, and
compares the two proposals through various workflow examples,
presenting the current set of commands followed by the ones
involved in each of the two proposals.

It is intended to supersede the previous "submodule proposal"
document entirely, and drive the discussion at the BoF during
the next Dev Meeting.

Differential Revision: https://reviews.llvm.org/D24167

Added:
    llvm/trunk/docs/Proposals/GitHubMove.rst
Removed:
    llvm/trunk/docs/Proposals/GitHubSubMod.rst
Modified:
    llvm/trunk/docs/index.rst

Added: llvm/trunk/docs/Proposals/GitHubMove.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Proposals/GitHubMove.rst?rev=284077&view=auto
==============================================================================
--- llvm/trunk/docs/Proposals/GitHubMove.rst (added)
+++ llvm/trunk/docs/Proposals/GitHubMove.rst Wed Oct 12 18:02:02 2016
@@ -0,0 +1,868 @@
+==============================
+Moving LLVM Projects to GitHub
+==============================
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+Introduction
+============
+
+This is a proposal to move our current revision control system from our own
+hosted Subversion to GitHub. Below are the financial and technical arguments as
+to why we are proposing such a move and how people (and validation
+infrastructure) will continue to work with a Git-based LLVM.
+
+There will be a survey pointing at this document which we'll use to gauge the
+community's reaction and, if we collectively decide to move, the time-frame. Be
+sure to make your view count.
+
+Additionally, we will discuss this during a BoF at the next US LLVM Developer
+meeting (http://llvm.org/devmtg/2016-11/).
+
+What This Proposal is *Not* About
+=================================
+
+Changing the development policy.
+
+This proposal relates only to moving the hosting of our source-code repository
+from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
+using GitHub's issue tracker, pull-requests, or code-review.
+
+Contributors will continue to earn commit access on demand under the Developer
+Policy, except that a GitHub account will be required instead of an SVN
+username/password-hash.
+
+Why Git, and Why GitHub?
+========================
+
+Why Move At All?
+----------------
+
+This discussion began because we currently host our own Subversion server
+and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
+provides limited support, but there is only so much it can do.
+
+Volunteers are not sysadmins themselves, but compiler engineers that happen
+to know a thing or two about hosting servers. We also don't have 24/7 support,
+and we sometimes wake up to see that continuous integration is broken because
+the SVN server is either down or unresponsive.
+
+We should take advantage of one of the services out there (GitHub, GitLab,
+and BitBucket, among others) that offer better service (24/7 stability, disk
+space, Git server, code browsing, forking facilities, etc) for free.
+
+Why Git?
+--------
+
+Many new coders nowadays start with Git, and a lot of people have never used
+SVN, CVS, or anything else. Websites like GitHub have changed the landscape
+of open source contributions, reducing the cost of first contribution and
+fostering collaboration.
+
+Git is also the version control many LLVM developers use. Despite the
+sources being stored in a SVN server, these developers are already using Git
+through the Git-SVN integration.
+
+Git allows you to:
+
+* Commit, squash, merge, and fork locally without touching the remote server.
+* Maintain local branches, enabling multiple threads of development.
+* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
+* Inspect the repository history (blame, log, bisect) without Internet access.
+* Maintain remote forks and branches on Git hosting services and
+  integrate back to the main repository.
+
+In addition, because Git seems to be replacing many OSS projects' version
+control systems, there are many tools that are built over Git.
+Future tooling may support Git first (if not only).
+
+Why GitHub?
+-----------
+
+GitHub, like GitLab and BitBucket, provides free code hosting for open source
+projects. Any of these could replace the code-hosting infrastructure that we
+have today.
+
+These services also have a dedicated team to monitor, migrate, improve and
+distribute the contents of the repositories depending on region and load.
+
+GitHub has one important advantage over GitLab and
+BitBucket: it offers read-write **SVN** access to the repository
+(https://github.com/blog/626-announcing-svn-support).
+This would enable people to continue working post-migration as though our code
+were still canonically in an SVN repository.
+
+In addition, there are already multiple LLVM mirrors on GitHub, indicating that
+part of our community has already settled there.
+
+On Managing Revision Numbers with Git
+-------------------------------------
+
+The current SVN repository hosts all the LLVM sub-projects alongside each other.
+A single revision number (e.g. r123456) thus identifies a consistent version of
+all LLVM sub-projects.
+
+Git does not use sequential integer revision numbers but instead uses a hash to
+identify each commit. (Linus mentioned that the lack of such a revision number
+is "the only real design mistake" in Git [TorvaldRevNum]_.)
+
+The loss of a sequential integer revision number has been a sticking point in
+past discussions about Git:
+
+- "The 'branch' I most care about is mainline, and losing the ability to say
+  'fixed in r1234' (with some sort of monotonically increasing number) would
+  be a tragic loss." [LattnerRevNum]_
+- "I like those results sorted by time and the chronology should be obvious, but
+  timestamps are incredibly cumbersome and make it difficult to verify that a
+  given checkout matches a given set of results." [TrickRevNum]_
+- "There is still the major regression with unreadable version numbers.
+  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
+  non-trivial issue." [JSonnRevNum]_
+- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_
+
+However, Git can emulate this increasing revision number:
+`git rev-list --count <commit-hash>`. This identifier is unique only within a
+single branch, but this means the tuple `(num, branch-name)` uniquely identifies
+a commit.
+
+We can thus use this revision number to ensure that e.g. `clang -v` reports a
+user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
+the objections raised above with respect to this aspect of Git.
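+
+As a minimal sketch of this emulation, the following script builds a throwaway
+repository with three commits and derives a label from the branch name and
+`git rev-list --count`. The repository, the commit identity, and the
+`branch-count` label format are illustrative assumptions, not a decided scheme.
+

```shell
#!/bin/sh
# Sketch: derive a "<branch>-<count>" label in a throwaway repository.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical identity so the sketch does not depend on global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q demo
cd demo
# Force the branch name so the result does not depend on init defaults.
git symbolic-ref HEAD refs/heads/master

for i in 1 2 3; do
  echo "change $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

branch=$(git rev-parse --abbrev-ref HEAD)
# Number of commits reachable from HEAD: the emulated revision number.
count=$(git rev-list --count HEAD)
label="$branch-$count"
echo "$label"
```

+
+Run as written, this prints `master-3`.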
+
+What About Branches and Merges?
+-------------------------------
+
+In contrast to SVN, Git makes branching easy. Git's commit history is
+represented as a DAG, a departure from SVN's linear history. However, we propose
+to forbid merge commits in our canonical Git repository.
+
+Unfortunately, GitHub does not support server-side hooks to enforce such a
+policy.  We must rely on the community to avoid pushing merge commits.
+
+GitHub offers a feature called `Status Checks`: a branch protected by
+`status checks` requires commits to be whitelisted before the push can happen.
+We could supply a pre-push hook on the client side that would run and check the
+history, before whitelisting the commit being pushed [statuschecks]_.
+However, this solution would be somewhat fragile (how do you update a script
+installed on every developer machine?) and prevents SVN access to the
+repository.
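+
+A client-side history check of this kind could be built on
+`git rev-list --merges`, which lists only the merge commits in a range. The
+sketch below is an assumption about how such a check might look, not part of
+the proposal: it builds a throwaway repository and counts merge commits in two
+ranges. In a real `pre-push` hook the range would come from the ref lines Git
+passes on standard input, and a non-zero exit status would abort the push.
+

```shell
#!/bin/sh
# Sketch: count merge commits in a commit range of a throwaway repository.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical identity so the sketch does not depend on global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

commit_file() {  # commit_file <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt "base"
base=$(git rev-parse HEAD)

git checkout -q -b topic
commit_file topic.txt "topic work"

git checkout -q master
commit_file main.txt "mainline work"
linear_tip=$(git rev-parse HEAD)

# --no-ff guarantees a real merge commit with two parents.
git merge -q --no-ff -m "merge topic" topic
merged_tip=$(git rev-parse HEAD)

count_merges() {  # count_merges <old> <new>
  git rev-list --merges --count "$1..$2"
}

linear_merges=$(count_merges "$base" "$linear_tip")
merged_merges=$(count_merges "$base" "$merged_tip")
echo "linear range: $linear_merges merge commit(s)"
echo "merged range: $merged_merges merge commit(s)"
```

+
+A hook built this way would reject the push whenever the count for the pushed
+range is greater than zero.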
+
+What About Commit Emails?
+-------------------------
+
+We will need a new bot to send emails for each commit. This proposal leaves the
+email format unchanged besides the commit URL.
+
+Straw Man Migration Plan
+========================
+
+Step #1 : Before The Move
+-------------------------
+
+1. Update docs to mention the move, so people are aware of what is going on.
+2. Set up a read-only version of the GitHub project, mirroring our current SVN
+   repository.
+3. Add the required bots to implement the commit emails, as well as the
+   umbrella repository update (if the multirepo is selected) or the read-only
+   Git views for the sub-projects (if the monorepo is selected).
+
+Step #2 : Git Move
+------------------
+
+4. Update the buildbots to pick up updates and commits from the GitHub
+   repository. Not all bots have to migrate at this point, but it'll help
+   provide infrastructure testing.
+5. Update Phabricator to pick up commits from the GitHub repository.
+6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
+   increasing integer across branches [MatthewsRevNum]_.
+7. Instruct downstream integrators to pick up commits from the GitHub
+   repository.
+8. Review and prepare an update for the LLVM documentation.
+
+Until this point nothing has changed for developers; it will just
+boil down to a lot of work for buildbot and other infrastructure
+owners.
+
+The migration will pause here until all dependencies have cleared, and all
+problems have been solved.
+
+Step #3: Write Access Move
+--------------------------
+
+9. Collect developers' GitHub account information, and add them to the project.
+10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
+11. Update the documentation.
+12. Mirror Git to SVN.
+
+Step #4 : Post Move
+-------------------
+
+13. Archive the SVN repository.
+14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
+    point to GitHub instead.
+
+One or Multiple Repositories?
+=============================
+
+There are two major variants for how to structure our Git repository: The
+"multirepo" and the "monorepo".
+
+Multirepo Variant
+-----------------
+
+This variant recommends moving each LLVM sub-project to a separate Git
+repository. This mimics the existing official read-only Git repositories
+(e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
+repositories for each sub-project.
+
+This will allow the individual sub-projects to remain distinct: a
+developer interested only in compiler-rt can check out only this repository,
+build it, and work in isolation from the other sub-projects.
+
+A key need is to be able to check out multiple projects (e.g. lldb+clang+llvm or
+clang+llvm+libcxx) at a specific revision.
+
+A tuple of revisions (one entry per repository) accurately describes the state
+across the sub-projects.
+For example, a given version of clang would be
+*<LLVM-12345, clang-5432, libcxx-123, etc.>*.
+
+Umbrella Repository
+^^^^^^^^^^^^^^^^^^^
+
+To make this more convenient, a separate *umbrella* repository will be
+provided. This repository will be used for the sole purpose of understanding
+the sequence in which commits were pushed to the different repositories and to
+provide a single revision number.
+
+This umbrella repository will be read-only and continuously updated
+to record the above tuple. The proposed form to record this is to use Git
+[submodules]_, possibly along with a set of scripts to help check out a
+specific revision of the LLVM distribution.
+
+A regular LLVM developer does not need to interact with the umbrella repository
+-- the individual repositories can be checked out independently -- but you would
+need to use the umbrella repository to bisect multiple sub-projects at the same
+time, or to check-out old revisions of LLVM with another sub-project at a
+consistent state.
+
+This umbrella repository will be updated automatically by a bot (running on
+notice from a webhook on every push, and periodically) on a per commit basis: a
+single commit in the umbrella repository would match a single commit in a
+sub-project.
+
+Living Downstream
+^^^^^^^^^^^^^^^^^
+
+Downstream SVN users can use the read/write SVN bridges with the following
+caveats:
+
+ * Be prepared for a one-time change to the upstream revision numbers.
+ * The upstream sub-project revision numbers will no longer be in sync.
+
+Downstream Git users can continue without any major changes, with the minor
+change of upstreaming using `git push` instead of `git svn dcommit`.
+
+Git users also have the option of adopting an umbrella repository downstream.
+The tooling for the upstream umbrella can easily be reused for downstream needs,
+incorporating extra sub-projects and branching in parallel with sub-project
+branches.
+
+Multirepo Preview
+^^^^^^^^^^^^^^^^^
+
+As a preview (disclaimer: this is a rough prototype, not polished and not
+representative of the final solution), you can look at the following:
+
+  * Repository: https://github.com/llvm-beanz/llvm-submodules
+  * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/
+
+Concerns
+^^^^^^^^
+
+ * Because GitHub does not allow server-side hooks, and because there is no
+   "push timestamp" in Git, the umbrella repository sequence isn't totally
+   exact: commits from different repositories pushed around the same time can
+   appear in different orders. However, we don't expect it to be the common case
+   or to cause serious issues in practice.
+ * You can't have a single cross-projects commit that would update both LLVM and
+   other sub-projects (something that can be achieved now). It would be possible
+   to establish a protocol whereby users add a special token to their commit
+   messages that causes the umbrella repo's updater bot to group all of them
+   into a single revision.
+ * Another option is to group commits that were pushed closely enough together
+   in the umbrella repository. This has the advantage of allowing cross-project
+   commits, and is less sensitive to mis-ordering commits. However, this has the
+   potential to group unrelated commits together, especially if the bot goes
+   down and needs to catch up.
+ * This variant relies on heavier tooling. But the current prototype shows that
+   it is not out-of-reach.
+ * Submodules don't have a good reputation and complicate the command line.
+   However, in the proposed setup, a regular developer will seldom interact with
+   submodules directly, and certainly never update them.
+ * Refactoring across projects is not friendly: taking some functions from clang
+   to make them part of a utility in libSupport wouldn't carry the history of the
+   code in the llvm repo, preventing recursively applying `git blame` for
+   instance. However, this is not very different from how most people are
+   interacting with the repository today, by splitting such changes into multiple
+   commits.
+
+Workflows
+^^^^^^^^^
+
+ * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
+ * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-multicheckout-nocommit>`.
+ * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-multicheckout-multicommit>`.
+ * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
+ * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-multi-branching>`.
+ * :ref:`Bisecting <workflow-multi-bisecting>`.
+
+Monorepo Variant
+----------------
+
+This variant recommends moving all LLVM sub-projects to a single Git repository,
+similar to https://github.com/llvm-project/llvm-project.
+This would mimic an export of the current SVN repository, with each sub-project
+having its own top-level directory.
+Not all sub-projects are used for building toolchains. In practice, www/
+and test-suite/ will probably stay out of the monorepo.
+
+Putting all sub-projects in a single checkout makes cross-project refactoring
+naturally simple:
+
+ * New sub-projects can be trivially split out for better reuse and/or layering
+   (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
+   dependency on LLVM).
+ * Changing an API in LLVM and upgrading the sub-projects will always be done in
+   a single commit, designing away a common source of temporary build breakage.
+ * Moving code across sub-projects (during refactoring for instance) in a single
+   commit enables accurate `git blame` when tracking code change history.
+ * Tooling based on `git grep` works natively across sub-projects, making it
+   easier to find refactoring opportunities across projects (for example reusing
+   a data structure initially in LLDB by moving it into libSupport).
+ * Having all the sources present encourages maintaining the other sub-projects
+   when changing an API.
+
+Finally, the monorepo maintains the property of the existing SVN repository that
+the sub-projects move synchronously, and a single revision number (or commit
+hash) identifies the state of the development across all projects.
+
+.. _build_single_project:
+
+Building a single sub-project
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Nobody will be forced to build unnecessary projects.  The exact structure
+is TBD, but making it trivial to configure builds for a single sub-project
+(or a subset of sub-projects) is a hard requirement.
+
+As an example, it could look like the following::
+
+  mkdir build && cd build
+  # Configure only LLVM (default)
+  cmake path/to/monorepo
+  # Configure LLVM and lld
+  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
+  # Configure LLVM and clang
+  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang
+
+.. _git-svn-mirror:
+
+Read/write sub-project mirrors
+------------------------------
+
+With the Monorepo, the existing single-subproject mirrors (e.g.
+http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
+continue to be maintained: developers would continue to be able to use the
+existing single-subproject git repositories as they do today, with *no changes
+to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to
+work identically to how it works today. The monorepo can be set up such that the
+SVN revision number matches the SVN revision in the GitHub SVN-bridge.
+
+Living Downstream
+^^^^^^^^^^^^^^^^^
+
+Downstream SVN users can use the read/write SVN bridge. The SVN revision
+number can be preserved in the monorepo, minimizing the impact.
+
+Downstream Git users can continue without any major changes, by using the
+git-svn mirrors on top of the SVN bridge.
+
+Git users can also work upstream with monorepo even if their downstream
+fork has split repositories.  They can apply patches in the appropriate
+subdirectories of the monorepo using, e.g., `git am --directory=...`, or
+plain `diff` and `patch`.
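+
+To illustrate the `git am --directory=...` route, the sketch below creates a
+hypothetical split `clang-fork` repository and a hypothetical `monorepo` that
+holds the same sources under `clang/`, exports one downstream commit with
+`git format-patch`, and applies it inside the monorepo. All repository names,
+files, and identities are made up for the example.
+

```shell
#!/bin/sh
# Sketch: replay a commit from a split repository into a monorepo subdirectory.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical identity, needed by both `git commit` and `git am`.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Downstream split repository: sources live at the repository root.
git init -q clang-fork
cd clang-fork
echo "int base;" > base.c
git add base.c
git commit -q -m "initial import"
echo "int downstream_fix;" > fix.c
git add fix.c
git commit -q -m "downstream fix"
mkdir "$tmp/patches"
# Export only the most recent commit as a mailbox-style patch.
git format-patch -q -1 -o "$tmp/patches" HEAD
cd ..

# Monorepo: the same sources live under clang/.
git init -q monorepo
cd monorepo
mkdir clang
echo "int base;" > clang/base.c
git add clang/base.c
git commit -q -m "monorepo import"

# --directory prepends clang/ to every path named in the patch.
git am -q --directory=clang "$tmp"/patches/*.patch

applied=$(git diff --name-only HEAD~1 HEAD)
echo "$applied"
```

+
+The resulting monorepo commit touches `clang/fix.c`, while keeping the
+original author and commit message from the split repository.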
+
+Alternatively, Git users can migrate their own fork to the monorepo.  As a
+demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:
+
+ * Using a script that rewrites history (including merges) so that it looks
+   like the fork always lived in the monorepo [LebarCHERI]_.  The upside of
+   this is when you check out an old revision, you get a copy of all llvm
+   sub-projects at a consistent revision.  (For instance, if it's a clang
+   fork, when you check out an old revision you'll get a consistent version
+   of llvm proper.)  The downside is that this changes the fork's commit
+   hashes.
+
+ * Merging the fork into the monorepo [AminiCHERI]_.  This preserves the
+   fork's commit hashes, but when you check out an old commit you only get
+   the one sub-project.
+
+Monorepo Preview
+^^^^^^^^^^^^^^^^^
+
+As a preview (disclaimer: this is a rough prototype, not polished and not
+representative of the final solution), you can look at the following:
+
+  * Full Repository: https://github.com/joker-eph/llvm-project
+  * Single sub-project view with *SVN write access* to the full repo:
+    https://github.com/joker-eph/compiler-rt
+
+Concerns
+^^^^^^^^
+
+ * Using the monolithic repository may add overhead for those contributing to a
+   standalone sub-project, particularly on runtimes like libcxx and compiler-rt
+   that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
+   1GB for the monorepo), and the commit rate of LLVM may cause more frequent
+   `git push` collisions when upstreaming. Affected contributors can continue to
+   use the SVN bridge or the single-subproject Git mirrors with git-svn for
+   read-write.
+ * Using the monolithic repository may add overhead for those *integrating* a
+   standalone sub-project, even if they aren't contributing to it, due to the
+   same disk space concern as the point above. The availability of the
+   sub-project Git mirror addresses this, even without SVN access.
+ * Preservation of the existing read/write SVN-based workflows relies on the
+   GitHub SVN bridge, which is an extra dependency.  Maintaining this locks us
+   into GitHub and could restrict future workflow changes.
+
+Workflows
+^^^^^^^^^
+
+ * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
+ * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-monocheckout-nocommit>`.
+ * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
+ * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
+ * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
+ * :ref:`Bisecting <workflow-mono-bisecting>`.
+
+Multi/Mono Hybrid Variant
+-------------------------
+
+This variant recommends moving only the LLVM sub-projects that are *rev-locked*
+to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
+proposal for the rest.  While neither variant recommends combining sub-projects
+like www/ and test-suite/ (which are completely standalone), this goes further
+and keeps sub-projects like libcxx and compiler-rt in their own distinct
+repositories.
+
+Concerns
+^^^^^^^^
+
+ * This has most disadvantages of multirepo and monorepo, without bringing many
+   of the advantages.
+ * Downstream users have to upgrade to the monorepo structure, but only partially. So
+   they will keep the infrastructure to integrate the other separate
+   sub-projects.
+ * All projects that use LIT for testing are effectively rev-locked to LLVM.
+   Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
+   It's not clear where to draw the lines.
+
+
+Workflow Before/After
+=====================
+
+This section goes through a few examples of workflows, intended to illustrate
+how end-users or developers would interact with the repository for
+various use-cases.
+
+.. _workflow-checkout-commit:
+
+Checkout/Clone a Single Project, without Commit Access
+------------------------------------------------------
+
+Except for the URL, nothing changes. The possibilities today are::
+
+  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
+  # or with Git
+  git clone http://llvm.org/git/llvm.git
+
+After the move to GitHub, you would do either::
+
+  git clone https://github.com/llvm-project/llvm.git
+  # or using the GitHub svn native bridge
+  svn co https://github.com/llvm-project/llvm/trunk
+
+The above works for both the monorepo and the multirepo, as we'll maintain the
+existing read-only views of the individual sub-projects.
+
+Checkout/Clone a Single Project, with Commit Access
+---------------------------------------------------
+
+Currently
+^^^^^^^^^
+
+::
+
+  # direct SVN checkout
+  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
+  # or using the read-only Git view, with git-svn
+  git clone http://llvm.org/git/llvm.git
+  cd llvm
+  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
+  git config svn-remote.svn.fetch :refs/remotes/origin/master
+  git svn rebase -l  # -l avoids fetching ahead of the git mirror.
+
+Commits are performed using `svn commit` or with the sequence `git commit` and
+`git svn dcommit`.
+
+.. _workflow-multicheckout-nocommit:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+With the multirepo variant, nothing changes but the URL, and commits can be
+performed using `svn commit` or `git commit` and `git push`::
+
+  git clone https://github.com/llvm/llvm.git llvm
+  # or using the GitHub svn native bridge
+  svn co https://github.com/llvm/llvm/trunk/ llvm
+
+.. _workflow-monocheckout-nocommit:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+With the monorepo variant, there are a few options, depending on your
+constraints. First, you could just clone the full repository::
+
+  git clone https://github.com/llvm/llvm-projects.git llvm
+  # or using the GitHub svn native bridge
+  svn co https://github.com/llvm/llvm-projects/trunk/ llvm
+
+At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
+:ref:`doesn't imply you have to build all of them <build_single_project>`. You
+can still build only compiler-rt for instance. In this way it's not different
+from someone who would check out all the projects with SVN today.
+
+You can commit as normal using `git commit` and `git push` or `svn commit`, and
+read the history for a single project (`git log libcxx` for example).
+
+Secondly, there are a few options to avoid checking out all the sources.
+
+**Using the GitHub SVN bridge**
+
+The GitHub SVN native bridge allows checking out a subdirectory directly::
+
+  svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...
+
+This checks out only compiler-rt and provides commit access using "svn commit",
+in the same way as it would do today.
+
+**Using a Subproject Git Mirror**
+
+You can use *git-svn* and one of the sub-project mirrors::
+
+  # Clone from the single read-only Git repo
+  git clone http://llvm.org/git/llvm.git
+  cd llvm
+  # Configure the SVN remote and initialize the svn metadata
+  git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
+  git config svn-remote.svn.fetch :refs/remotes/origin/master
+  git svn rebase -l
+
+In this case the repository contains only a single sub-project, and commits can
+be made using `git svn dcommit`, again exactly as we do today.
+
+**Using a Sparse Checkout**
+
+You can hide the other directories using a Git sparse checkout::
+
+  git config core.sparseCheckout true
+  echo /compiler-rt > .git/info/sparse-checkout
+  git read-tree -mu HEAD
+
+The data for all sub-projects is still in your `.git` directory, but in your
+checkout, you only see `compiler-rt`.
+Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
+usual.
+
+Note that when you fetch you'll likely pull in changes to sub-projects you don't
+care about. If you are using sparse checkout, the files from other projects
+won't appear on your disk. The only effect is that your commit hash changes.
+
+You can check whether the changes in the last fetch are relevant to your commit
+by running::
+
+  git log origin/master@{1}..origin/master -- libcxx
+
+This command can be hidden in a script so that `git llvmpush` would perform all
+these steps, fail only if such a dependent change exists, and show immediately
+the change that prevented the push. An immediate repeat of the command would
+(almost) certainly result in a successful push.
+Note that today with SVN or git-svn, this step is not possible since the
+"rebase" implicitly happens while committing (unless a conflict occurs).
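+
+One possible building block for such a script is a path-limited history query.
+The sketch below is hypothetical (no `git llvmpush` exists today): it builds a
+throwaway monorepo-like repository and counts the commits in a range that touch
+a given sub-project directory. The `touches` helper and the directory layout
+are assumptions made for the example.
+

```shell
#!/bin/sh
# Sketch: decide whether newly fetched commits touch a given sub-project.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical identity so the sketch does not depend on global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q monorepo
cd monorepo
mkdir llvm libcxx compiler-rt

commit_in() {  # commit_in <dir> <message>
  echo "$2" >> "$1/notes.txt"
  git add "$1/notes.txt"
  git commit -q -m "$2"
}

commit_in llvm "initial llvm"
before=$(git rev-parse HEAD)   # stands in for origin/master@{1}
commit_in llvm "llvm change"
commit_in libcxx "libcxx change"
after=$(git rev-parse HEAD)    # stands in for origin/master

# touches <old> <new> <path>: commits in old..new that modify <path>.
touches() {
  git rev-list --count "$1..$2" -- "$3"
}

libcxx_hits=$(touches "$before" "$after" libcxx)
rt_hits=$(touches "$before" "$after" compiler-rt)
echo "libcxx: $libcxx_hits relevant commit(s)"
echo "compiler-rt: $rt_hits relevant commit(s)"
```

+
+A wrapper could rebase and push without further review when the count for the
+contributor's sub-project is zero, and stop to show the relevant commits
+otherwise.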
+
+Checkout/Clone Multiple Projects, with Commit Access
+----------------------------------------------------
+
+Let's look at how to assemble llvm+clang+libcxx at a given revision.
+
+Currently
+^^^^^^^^^
+
+::
+
+  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
+  cd llvm/tools
+  svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
+  cd ../projects
+  svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
+
+Or using git-svn::
+
+  git clone http://llvm.org/git/llvm.git
+  cd llvm/
+  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
+  git config svn-remote.svn.fetch :refs/remotes/origin/master
+  git svn rebase -l
+  git checkout `git svn find-rev -B r258109`
+  cd tools
+  git clone http://llvm.org/git/clang.git
+  cd clang/
+  git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
+  git config svn-remote.svn.fetch :refs/remotes/origin/master
+  git svn rebase -l
+  git checkout `git svn find-rev -B r258109`
+  cd ../../projects/
+  git clone http://llvm.org/git/libcxx.git
+  cd libcxx
+  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
+  git config svn-remote.svn.fetch :refs/remotes/origin/master
+  git svn rebase -l
+  git checkout `git svn find-rev -B r258109`
+
+Note that the list would be longer with more sub-projects.
+
+.. _workflow-multicheckout-multicommit:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+With the multirepo variant, the umbrella repository will be used. This is
+where the mapping from a single revision number to the individual repositories'
+revisions is stored::
+
+  git clone https://github.com/llvm-beanz/llvm-submodules
+  cd llvm-submodules
+  git checkout $REVISION
+  git submodule init
+  git submodule update clang llvm libcxx
+  # the list of sub-projects is optional, `git submodule update` would get them all.
+
+At this point the clang, llvm, and libcxx individual repositories are cloned
+and stored alongside each other. There are CMake flags to describe the directory
+structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
+etc.
+
+Another option is to check out repositories based on the commit timestamp::
+
+  git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
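+
+The timestamp lookup can be tried in isolation. The following sketch uses a
+throwaway repository with two commits at fixed, made-up dates and picks the
+newest commit that is older than the cutoff. Note that `--before` compares
+against the commit date, and that a cutoff without a time zone is interpreted
+in the local time zone.
+

```shell
#!/bin/sh
# Sketch: select the last commit made before a given timestamp.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical identity so the sketch does not depend on global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

commit_at() {  # commit_at <date> <message>
  # Pin both author and committer dates so the history is reproducible.
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
    git commit -q --allow-empty -m "$2"
}

commit_at "2009-07-01 12:00:00 +0000" "early"
early=$(git rev-parse HEAD)
commit_at "2009-08-01 12:00:00 +0000" "late"
late=$(git rev-parse HEAD)

# Newest commit on master whose commit date is before the cutoff.
picked=$(git rev-list -n 1 --before="2009-07-27 13:37" master)
git checkout -q "$picked"
git log -1 --format=%s
```

+
+Here the checkout lands on the commit labelled `early`, because the `late`
+commit is dated after the cutoff.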
+
+.. _workflow-monocheckout-multicommit:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+The repository natively contains the source for every sub-project at the right
+revision, which makes this straightforward::
+
+  git clone https://github.com/llvm/llvm-projects.git llvm-projects
+  cd llvm-projects
+  git checkout $REVISION
+
+As before, at this point clang, llvm, and libcxx are stored in directories
+alongside each other.
+
+.. _workflow-cross-repo-commit:
+
+Commit an API Change in LLVM and Update the Sub-projects
+--------------------------------------------------------
+
+Today this is possible, even though not common (at least not documented) for
+Subversion users and for git-svn users. For example, few Git users try to update
+LLD or Clang in the same commit as they change an LLVM API.
+
+The multirepo variant does not address this: one would have to commit and push
+separately in every individual repository. It would be possible to establish a
+protocol whereby users add a special token to their commit messages that causes
+the umbrella repo's updater bot to group all of them into a single revision.
+
+The monorepo variant handles this natively.
+
+Branching/Stashing/Updating for Local Development or Experiments
+----------------------------------------------------------------
+
+Currently
+^^^^^^^^^
+
+SVN does not allow this use case, but developers that are currently using
+git-svn can do it. Let's look at what it means in practice when dealing with
+multiple sub-projects.
+
+To update the repository to tip of trunk::
+
+  git pull
+  cd tools/clang
+  git pull
+  cd ../../projects/libcxx
+  git pull
+
+To create a new branch::
+
+  git checkout -b MyBranch
+  cd tools/clang
+  git checkout -b MyBranch
+  cd ../../projects/libcxx
+  git checkout -b MyBranch
+
+To switch branches::
+
+  git checkout AnotherBranch
+  cd tools/clang
+  git checkout AnotherBranch
+  cd ../../projects/libcxx
+  git checkout AnotherBranch
+
+.. _workflow-multi-branching:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+The multirepo variant works the same as the current Git workflow: every command
+needs to be applied to each of the individual repositories.
+However, the umbrella repository makes this easy by using `git submodule foreach`
+to replicate a command on all the individual repositories (or submodules
+in this case):
+
+To create a new branch::
+
+  git submodule foreach git checkout -b MyBranch
+
+To switch branches::
+
+  git submodule foreach git checkout AnotherBranch
+
+.. _workflow-mono-branching:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+Regular Git commands are sufficient, because everything is in a single
+repository:
+
+To update the repository to tip of trunk::
+
+  git pull
+
+To create a new branch::
+
+  git checkout -b MyBranch
+
+To switch branches::
+
+  git checkout AnotherBranch
+
+Bisecting
+---------
+
+Assume a developer is looking for a bug in clang (or lld, or lldb, ...).
+
+Currently
+^^^^^^^^^
+
+SVN does not have built-in bisection support, but the single revision number
+shared across sub-projects makes it possible to script one.
+
+Using the existing Git read-only view of the repositories, it is possible to use
+the native Git bisection script over the llvm repository, and use some scripting
+to synchronize the clang repository to match the llvm revision.
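+
+One way to approximate that synchronization is to pick the clang commit by
+date. The sketch below is not an established script: `sync_clang` is a
+hypothetical name, and it assumes that it runs from the llvm checkout with clang
+in `tools/clang`, that the clang branch is called `master`, and that commit
+dates in both mirrors reflect the SVN commit times::
+
+  # Sketch: check out the newest clang commit not newer than the llvm HEAD.
+  sync_clang() {
+    # committer date of the llvm commit under test, as a Unix timestamp
+    llvm_date=$(git log -1 --format=%ct) || return 1
+    clang_rev=$(git -C tools/clang rev-list -1 --before="$llvm_date" master) \
+      || return 1
+    git -C tools/clang checkout -q "$clang_rev"
+  }
+
+A bisect script for the llvm repository would call `sync_clang` before building.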
+
+.. _workflow-multi-bisecting:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+With the multi-repositories variant, the cross-repository synchronization is
+achieved using the umbrella repository. This repository contains only
+submodules for the other sub-projects. The native Git bisection can be used on
+the umbrella repository directly. A subtlety is that the bisect script itself
+needs to make sure the submodules are updated accordingly.
+
+For example, to find which commit introduces a regression where clang-3.9
+crashes but clang-3.8 passes, one should be able to simply do::
+
+  git bisect start release_39 release_38
+  git bisect run ./bisect_script.sh
+
+With the `bisect_script.sh` script being::
+
+  #!/bin/sh
+  cd "$UMBRELLA_DIRECTORY"
+  # check out the submodule commits recorded by the umbrella commit under test
+  git submodule update llvm clang libcxx #....
+  cd "$BUILD_DIR"
+
+  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
+                            # to "skip" the current commit
+
+  ./bin/clang some_crash_test.cpp
+
+When the `git bisect run` command returns, the umbrella repository is set to
+the state where the regression is introduced. The commit diff in the umbrella
+indicates which submodule was updated, and the last commit in this sub-project
+is the one that the bisect found.
+
+.. _workflow-mono-bisecting:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+Bisecting on the monorepo is straightforward, and very similar to the above,
+except that the bisection script does not need to include the
+`git submodule update` step.
+
+The same example, finding which commit introduces a regression where clang-3.9
+crashes but clang-3.8 passes, will look like::
+
+  git bisect start release_39 release_38
+  git bisect run ./bisect_script.sh
+
+With the `bisect_script.sh` script being::
+
+  #!/bin/sh
+  cd "$BUILD_DIR"
+
+  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
+                            # to "skip" the current commit
+
+  ./bin/clang some_crash_test.cpp
+
+Also, since the monorepo handles commits that update multiple projects at once,
+you're less likely to encounter a build failure where one commit changes an API
+in LLVM and a later one "fixes" the build in clang.
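+
+One detail applies to the bisect scripts in both variants: `git bisect run`
+treats an exit code of 0 as good, 125 as skip, any other code from 1 to 127 as
+bad, and aborts the bisection on anything else. If the tested command can fail
+with a code above 127 (for instance when a process is killed by a signal), a
+small wrapper can normalize it. This is a sketch; `run_test` is a hypothetical
+name::
+
+  # Hypothetical wrapper: collapse every failure of the command into exit code 1.
+  run_test() {
+    "$@" && return 0
+    return 1
+  }
+
+The last line of the bisect scripts above would then become
+`run_test ./bin/clang some_crash_test.cpp`.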
+
+
+References
+==========
+
+.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
+.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
+.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
+.. [TorvaldRevNum] Linus Torvalds, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
+.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
+.. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules
+.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
+.. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
+.. [AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html

Removed: llvm/trunk/docs/Proposals/GitHubSubMod.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Proposals/GitHubSubMod.rst?rev=284076&view=auto
==============================================================================
--- llvm/trunk/docs/Proposals/GitHubSubMod.rst (original)
+++ llvm/trunk/docs/Proposals/GitHubSubMod.rst (removed)
@@ -1,273 +0,0 @@
-===============================================
-Moving LLVM Projects to GitHub with Sub-Modules
-===============================================
-
-Introduction
-============
-
-This is a proposal to move our current revision control system from our own
-hosted Subversion to GitHub. Below are the financial and technical arguments as
-to why we need such a move and how will people (and validation infrastructure)
-continue to work with a Git-based LLVM.
-
-There will be a survey pointing at this document when we'll know the community's
-reaction and, if we collectively decide to move, the time-frames. Be sure to make
-your views count.
-
-Essentially, the proposal is divided in the following parts:
-
-* Outline of the reasons to move to Git and GitHub
-* Description on what the work flow will look like (compared to SVN)
-* Remaining issues and potential problems
-* The proposed migration plan
-
-Why Git, and Why GitHub?
-========================
-
-Why move at all?
-----------------
-
-The strongest reason for the move, and why this discussion started in the first
-place, is that we currently host our own Subversion server and Git mirror in a
-voluntary basis. The LLVM Foundation sponsors the server and provides limited
-support, but there is only so much it can do.
-
-The volunteers are not Sysadmins themselves, but compiler engineers that happen
-to know a thing or two about hosting servers. We also don't have 24/7 support,
-and we sometimes wake up to see that continuous integration is broken because
-the SVN server is either down or unresponsive.
-
-With time and money, the foundation and volunteers could improve our services,
-implement more functionality and provide around the clock support, so that we
-can have a first class infrastructure with which to work. But the cost is not
-small, both in money and time invested.
-
-On the other hand, there are multiple services out there (GitHub, GitLab,
-BitBucket among others) that offer that same service (24/7 stability, disk space,
-Git server, code browsing, forking facilities, etc) for the very affordable price
-of *free*.
-
-Why Git?
---------
-
-Most new coders nowadays start with Git. A lot of them have never used SVN, CVS
-or anything else. Websites like GitHub have changed the landscape of open source
-contributions, reducing the cost of first contribution and fostering
-collaboration.
-
-Git is also the version control most LLVM developers use. Despite the sources
-being stored in an SVN server, most people develop using the Git-SVN integration,
-and that shows that Git is not only more powerful than SVN, but people have
-resorted to using a bridge because its features are now indispensable to their
-internal and external workflows.
-
-In essence, Git allows you to:
-
-* Commit, squash, merge, fork locally without any penalty to the server
-* Add as many branches as necessary to allow for multiple threads of development
-* Collaborate with peers directly, even without access to the Internet
-* Have multiple trees without multiplying disk space.
-
-In addition, because Git seems to be replacing every project's version control
-system, there are many more tools that can use Git's enhanced feature set, so
-new tooling is much more likely to support Git first (if not only), than any
-other version control system.
-
-Why GitHub?
------------
-
-GitHub, like GitLab and BitBucket, provide free code hosting for open source
-projects. Essentially, they will completely replace *all* the infrastructure that
-we have today that serves code repository, mirroring, user control, etc.
-
-They also have a dedicated team to monitor, migrate, improve and distribute the
-contents of the repositories depending on region and load. A level of quality
-that we'd never have without spending money that would be better spent elsewhere,
-for example development meetings, sponsoring disadvantaged people to work on
-compilers and foster diversity and equality in our community.
-
-GitHub has the added benefit that we already have a presence there. Many
-developers use it already, and the mirror from our current repository is already
-set up.
-
-Furthermore, GitHub has an *SVN view* (https://github.com/blog/626-announcing-svn-support)
-where people that still have/want to use SVN infrastructure and tooling can
-slowly migrate or even stay working as if it was an SVN repository (including
-read-write access).
-
-So, any of the three solutions solve the cost and maintenance problem, but GitHub
-has two additional features that would be beneficial to the migration plan as
-well as the community already settled there.
-
-
-What will the new workflow look like
-====================================
-
-In order to move version control, we need to make sure that we get all the
-benefits with the least amount of problems. That's why the migration plan will
-be slow, one step at a time, and we'll try to make it look as close as possible
-to the current style without impacting the new features we want.
-
-Each LLVM project will continue to be hosted as separate GitHub repository
-under a single GitHub organisation. Users can continue to choose to use either
-SVN or Git to access the repositories to suit their current workflow.
-
-In addition, we'll create a repository that will mimic our current *linear
-history* repository. The most accepted proposal, then, was to have an umbrella
-project that will contain *sub-modules* (https://git-scm.com/book/en/v2/Git-Tools-Submodules)
-of all the LLVM projects and nothing else.
-
-This repository can be checked out on its own, in order to have *all* LLVM
-projects in a single check-out, as many people have suggested, but it can also
-only hold the references to the other projects, and be used for the sole purpose
-of understanding the *sequence* in which commits were added by using the
-``git rev-list --count hash`` or ``git describe hash`` commands.
-
-One example of such a repository is Takumi's llvm-project-submodule
-(https://github.com/chapuni/llvm-project-submodule), which when checked out,
-will have the references to all sub-modules but not check them out, so one will
-need to *init* the module manually. This will allow the *exact* same behaviour
-as checking out individual SVN repositories, as it will keep the correct linear
-history.
-
-There is no need for additional tags, flags and properties, or external
-services controlling the history, since both SVN and *git rev-list* can already
-do that on their own.
-
-We will need additional server hooks to avoid non-fast-forwards commits (ex.
-merges, forced pushes, etc) in order to keep the linearity of the history.
-
-The three types of hooks to be implemented are:
-
-* Status Checks: By placing status checks on a protected branch, we can guarantee
-  that the history is kept linear and sane at all times, on all repositories.
-  See: https://help.github.com/articles/about-required-status-checks/
-* Umbrella updates: By using GitHub web hooks, we can update a small web-service
-  inside LLVM's own infrastructure to update the umbrella project remotely. The
-  maintenance of this service will be lower than the current SVN maintenance and
-  the scope of its failures will be less severe.
-  See: https://developer.github.com/webhooks/
-* Commits email update: By adding an email web hook, we can make every push show
-  in the lists, allowing us to retain history and do post-commit reviews.
-  See: https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/
-
-Access will be transferred one-to-one to GitHub accounts for everyone that already
-has commit access to our current repository. Those who don't have accounts will
-have to create one in order to continue contributing to the project. In the
-future, people only need to provide their GitHub accounts to be granted access.
-
-In a nutshell:
-
-* The projects' repositories will remain identical, with a new address (GitHub).
-* They'll continue to have SVN access (Read-Write), but will also gain Git RW access.
-* The linear history can still be accessed in the (RO) submodule meta project.
-* Individual projects' history will be local (ie. not interlaced with the other
-  projects, as the current SVN repos are), and we need the umbrella project
-  (using submodules) to have the same view as we had in SVN.
-
-Additionally, each repository will have the following server hooks:
-
-* Pre-commit hooks to stop people from applying non-fast-forward merges
-* Webhook to update the umbrella project (via buildbot or web services)
-* Email hook to each commits list (llvm-commit, cfe-commit, etc)
-
-Essentially, we're adding Git RW access in addition to the already existing
-structure, with all the additional benefits of it being in GitHub.
-
-Example of a working version:
-
-* Repository: https://github.com/llvm-beanz/llvm-submodules
-* Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/
-
-What will *not* be changed
---------------------------
-
-This is a change of version control system, not the whole infrastructure. There
-are plans to replace our current tools (review, bugs, documents), but they're
-all orthogonal to this proposal.
-
-We'll also be keeping the buildbots (and migrating them to use Git) as well as
-LNT, and any other system that currently provides value upstream.
-
-Any discussion regarding those tools is out of scope in this proposal.
-
-Remaining questions and problems
-================================
-
-1. How much the SVN view emulates and how much it'll break tools/CI?
-
-For this one, we'll need people that will have problems in that area to tell
-us what's wrong and how to help them fix it.
-
-We also recommend people and companies to migrate to Git, for its many other
-additional benefits.
-
-2. Which tools will need changing?
-
-LNT may break, since it relies on SVN's history. We can continue to
-use LNT with the SVN-View, but it would be best to move it to Git once and for
-all.
-
-The LLVMLab bisect tool will also be affected and will need adjusting. As with
-LNT, it should be fine to use GitHub's SVN view, but changing it to work on Git
-will be required in the long term.
-
-Phabricator will also need to change its configuration to point at the GitHub
-repositories, but since it already works with Git, this will be a trivial change.
-
-Migration Plan
-==============
-
-If we decide to move, we'll have to set a date for the process to begin.
-
-As usual, we should be announcing big changes in one release to happen in the
-next one. But since this won't impact external users (if they rely on our source
-release tarballs), we don't necessarily have to.
-
-We will have to make sure all the *problems* reported are solved before the
-final push. But we can start all non-binding processes (like mirroring to GitHub
-and testing the SVN interface in it) before any hard decision.
-
-Here's a proposed plan:
-
-STEP #1 : Pre Move
-
-0. Update docs to mention the move, so people are aware that it's going on.
-1. Register an official GitHub project with the LLVM foundation.
-2. Setup another (read-only) mirror of llvm.org/git at this GitHub project,
-   adding all necessary hooks to avoid broken history (merge, dates, pushes), as
-   well as a webhook to update the umbrella project (see below).
-3. Make sure we have an llvm-project (with submodules) setup in the official
-   account, with all necessary hooks (history, update, merges).
-4. Make sure bisecting with llvm-project works.
-5. Make sure no one has any other blocker.
-
-STEP #2 : Git Move
-
-6. Update the buildbots to pick up updates and commits from the official git
-   repository.
-7. Update Phabricator to pick up commits from the official git repository.
-8. Tell people living downstream to pick up commits from the official git
-   repository.
-9. Give things time to settle. We could play some games like disabling the SVN
-   repository for a few hours on purpose so that people can test that their
-   infrastructure has really become independent of the SVN repository.
-
-Until this point nothing has changed for developers, it will just
-boil down to a lot of work for buildbot and other infrastructure
-owners.
-
-Once all dependencies are cleared, and all problems have been solved:
-
-STEP #3: Write Access Move
-
-10. Collect people's GitHub account information, adding them to the project.
-11. Switch SVN repository to read-only and allow pushes to the GitHub repository.
-12. Mirror Git to SVN.
-
-STEP #4 : Post Move
-
-13. Archive the SVN repository, if GitHub's SVN is good enough.
-14. Review and update *all* LLVM documentation.
-15. Review website links pointing to viewvc/klaus/phab etc. to point to GitHub
-    instead.

Modified: llvm/trunk/docs/index.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/index.rst?rev=284077&r1=284076&r2=284077&view=diff
==============================================================================
--- llvm/trunk/docs/index.rst (original)
+++ llvm/trunk/docs/index.rst Wed Oct 12 18:02:02 2016
@@ -510,13 +510,13 @@ can be better.
    :hidden:
 
    CodeOfConduct
-   Proposals/GitHubSubMod
+   Proposals/GitHubMove
 
 :doc:`CodeOfConduct`
    Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,
    IRC, etc).
 
-:doc:`Proposals/GitHubSubMod`
+:doc:`Proposals/GitHubMove`
    Proposal to move from SVN/Git to GitHub.
 
 



