[llvm] r353713 - Add recipes for migrating downstream branches of git mirrors

David Greene via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 11 07:40:02 PST 2019


Author: greened
Date: Mon Feb 11 07:40:02 2019
New Revision: 353713

URL: http://llvm.org/viewvc/llvm-project?rev=353713&view=rev
Log:
Add recipes for migrating downstream branches of git mirrors

Add some common recipes for downstream users developing on top of the
existing git mirrors. These instructions show how to migrate local
branches to the monorepo.

Differential Revision: https://reviews.llvm.org/D56550


Modified:
    llvm/trunk/docs/Proposals/GitHubMove.rst

Modified: llvm/trunk/docs/Proposals/GitHubMove.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Proposals/GitHubMove.rst?rev=353713&r1=353712&r2=353713&view=diff
==============================================================================
--- llvm/trunk/docs/Proposals/GitHubMove.rst (original)
+++ llvm/trunk/docs/Proposals/GitHubMove.rst Mon Feb 11 07:40:02 2019
@@ -852,6 +852,540 @@ Also, since the monorepo handles commits
 less like to encounter a build failure where a commit change an API in LLVM and
 another later one "fixes" the build in clang.
 
+Moving Local Branches to the Monorepo
+=====================================
+
+Suppose you have been developing against the existing LLVM git
+mirrors.  You have one or more git branches that you want to migrate
+to the "final monorepo".
+
+The simplest way to migrate such branches is with the
+``migrate-downstream-fork.py`` tool at
+https://github.com/jyknight/llvm-git-migration.
+
+Basic migration
+---------------
+
+Basic instructions for ``migrate-downstream-fork.py`` are in the
+Python script and are expanded on below to a more general recipe::
+
+  # Make a repository which will become your final local mirror of the
+  # monorepo.
+  mkdir my-monorepo
+  git -C my-monorepo init
+
+  # Add a remote to the monorepo.
+  git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git
+
+  # Add remotes for each git mirror you use, from upstream as well as
+  # your local mirror.  All projects are listed here but you need only
+  # import those for which you have local branches.
+  my_projects=( clang
+                clang-tools-extra
+                compiler-rt
+                debuginfo-tests
+                libcxx
+                libcxxabi
+                libunwind
+                lld
+                lldb
+                llvm
+                openmp
+                polly )
+  for p in ${my_projects[@]}; do
+    git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
+    git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
+  done
+
+  # Pull in all the commits.
+  git -C my-monorepo fetch --all
+
+  # Run migrate-downstream-fork to rewrite local branches on top of
+  # the upstream monorepo.
+  (
+     cd my-monorepo
+     migrate-downstream-fork.py \
+       refs/remotes/local \
+       refs/tags \
+       --new-repo-prefix=refs/remotes/upstream/monorepo \
+       --old-repo-prefix=refs/remotes/upstream/split \
+       --source-kind=split \
+       --revmap-out=monorepo-map.txt
+  )
+
+  # Octopus-merge the resulting local split histories to unify them.
+
+  # Assumes local work on local split mirrors is on master (and
+  # upstream is presumably represented by some other branch like
+  # upstream/master).
+  my_local_branch="master"
+
+  git -C my-monorepo branch --no-track local/octopus/master \
+    $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \
+                                    refs/remotes/local/split/llvm/${my_local_branch})
+  git -C my-monorepo checkout local/octopus/${my_local_branch}
+
+  subproject_branches=()
+  for p in ${my_projects[@]}; do
+    subproject_branch=${p}/local/monorepo/${my_local_branch}
+    git -C my-monorepo branch ${subproject_branch} \
+      refs/remotes/local/split/${p}/${my_local_branch}
+    if [[ "${p}" != "llvm" ]]; then
+      subproject_branches+=( ${subproject_branch} )
+    fi
+  done
+
+  git -C my-monorepo merge ${subproject_branches[@]}
+
+  for p in ${my_projects[@]}; do
+    subproject_branch=${p}/local/monorepo/${my_local_branch}
+    git -C my-monorepo branch -d ${subproject_branch}
+  done
+
+  # Create local branches for upstream monorepo branches.
+  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
+                   refs/remotes/upstream/monorepo); do
+    upstream_branch=${ref#refs/remotes/upstream/monorepo/}
+    git -C my-monorepo branch upstream/${upstream_branch} ${ref}
+  done
+
+The above gets you to a state like the following::
+
+  U1 - U2 - U3 <- upstream/master
+    \   \    \
+     \   \    - Llld1 - Llld2 -
+      \   \                    \
+       \   - Lclang1 - Lclang2-- Lmerge <- local/octopus/master
+        \                      /
+         - Lllvm1 - Lllvm2-----
+
+Each branched component has its branch rewritten on top of the
+monorepo and all components are unified by a giant octopus merge.
+
+If additional active local branches need to be preserved, the above
+operations following the assignment to ``my_local_branch`` should be
+done for each branch.  Ref paths will need to be updated to map the
+local branch to the corresponding upstream branch.  If local branches
+have no corresponding upstream branch, then the creation of
+``local/octopus/<local branch>`` need not use ``git-merge-base`` to
+pinpont its root commit; it may simply be branched from the
+appropriate component branch (say, ``llvm/local_release_X``).
+
+Zipping local history
+---------------------
+
+The octopus merge is suboptimal for many cases, because walking back
+through the history of one component leaves the other components fixed
+at a history that likely makes things unbuildable.
+
+Some downstream users track the order commits were made to subprojects
+with some kind of "umbrella" project that imports the project git
+mirrors as submodules, similar to the multirepo umbrella proposed
+above.  Such an umbrella repository looks something like this::
+
+   UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master
+   |        |             |        |        |        |        |
+  Lllvm1   Llld1         Lclang1  Lclang2  Lllvm2   Llld2     Lmyproj1
+
+The vertical bars represent submodule updates to a particular local
+commit in the project mirror.  ``UM3`` in this case is a commit of
+some local umbrella repository state that is not a submodule update,
+perhaps a ``README`` or project build script update.  Commit ``UM8``
+updates a submodule of local project ``myproj``.
+
+The tool ``zip-downstream-fork.py`` at
+https://github.com/greened/llvm-git-migration/tree/zip can be used to
+convert the umbrella history into a monorepo-based history with
+commits in the order implied by submodule updates::
+
+  U1 - U2 - U3 <- upstream/master
+   \    \    \
+    \    -----\---------------                                    local/zip--.
+     \         \              \                                               |
+    - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'
+
+
+The ``U*`` commits represent upstream commits to the monorepo master
+branch.  Each submodule update in the local ``UM*`` commits brought in
+a subproject tree at some local commit.  The trees in the ``L*1``
+commits represent merges from upstream.  These result in edges from
+the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
+The ``L*2`` commits did not do any merges from upstream.
+
+Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
+if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
+commit appearing after the ``Llld1`` commit would actually represent a
+clang tree *earlier* in the upstream clang history.  We want the
+``local/zip`` branch to accurately represent the state of our umbrella
+history and so the edge ``U2 -> Lclang1`` is a visual reminder of what
+clang's tree actually looks like in ``Lclang1``.
+
+Even so, the edge ``U3 -> Llld1`` could be problematic for future
+merges from upstream.  git will think that we've already merged from
+``U3``, and we have, except for the state of the clang tree.  One
+possible migitation strategy is to manually diff clang between ``U2``
+and ``U3`` and apply those updates to ``local/zip``.  Another,
+possibly simpler strategy is to freeze local work on downstream
+branches and merge all submodules from the latest upstream before
+running ``zip-downstream-fork.py``.  If downstream merged each project
+from upstream in lockstep without any intervening local commits, then
+things should be fine without any special action.  We anticipate this
+to be the common case.
+
+The tree for ``Lclang1`` outside of clang will represent the state of
+things at ``U3`` since all of the upstream projects not participating
+in the umbrella history should be in a state respecting the commit
+``U3``.  The trees for llvm and lld should correctly represent commits
+``Lllvm1`` and ``Llld1``, respectively.
+
+Commit ``UM3`` changed files not related to submodules and we need
+somewhere to put them.  It is not safe in general to put them in the
+monorepo root directory because they may conflict with files in the
+monorepo.  Let's assume we want them in a directory ``local`` in the
+monorepo.
+
+**Example 1: Umbrella looks like the monorepo**
+
+For this example, we'll assume that each subproject appears in its own
+top-level directory in the umbrella, just as they do in the monorepo .
+Let's also assume that we want the files in directory ``myproj`` to
+appear in ``local/myproj``.
+
+Given the above run of ``migrate-downstream-fork.py``, a recipe to
+create the zipped history is below::
+
+  # Import any non-LLVM repositories the umbrella references.
+  git -C my-monorepo remote add localrepo \
+                                https://my.local.mirror.org/localrepo.git
+  git fetch localrepo
+
+  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
+                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
+                parallel-libs polly pstl )
+
+  # Import histories for upstream split projects (this was probably
+  # already done for the ``migrate-downstream-fork.py`` run).
+  for project in ${subprojects[@]}; do
+    git remote add upstream/split/${project} \
+                   https://github.com/llvm-mirror/${subproject}.git
+    git fetch umbrella/split/${project}
+  done
+
+  # Import histories for downstream split projects (this was probably
+  # already done for the ``migrate-downstream-fork.py`` run).
+  for project in ${subprojects[@]}; do
+    git remote add local/split/${project} \
+                   https://my.local.mirror.org/${subproject}.git
+    git fetch local/split/${project}
+  done
+
+  # Import umbrella history.
+  git -C my-monorepo remote add umbrella \
+                                https://my.local.mirror.org/umbrella.git
+  git fetch umbrella
+
+  # Put myproj in local/myproj
+  echo "myproj local/myproj" > my-monorepo/submodule-map.txt
+
+  # Rewrite history
+  (
+    cd my-monorepo
+    zip-downstream-fork.py \
+      refs/remotes/umbrella \
+      --new-repo-prefix=refs/remotes/upstream/monorepo \
+      --old-repo-prefix=refs/remotes/upstream/split \
+      --revmap-in=monorepo-map.txt \
+      --revmap-out=zip-map.txt \
+      --subdir=local \
+      --submodule-map=submodule-map.txt \
+      --update-tags
+   )
+
+   # Create the zip branch (assuming umbrella master is wanted).
+   git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
+
+Note that if the umbrella has submodules to non-LLVM repositories,
+``zip-downstream-fork.py`` needs to know about them to be able to
+rewrite commits.  That is why the first step above is to fetch commits
+from such repositories.
+
+With ``--update-tags`` the tool will migrate annotated tags pointing
+to submodule commits that were inlined into the zipped history.  If
+the umbrella pulled in an upstream commit that happened to have a tag
+pointing to it, that tag will be migrated, which is almost certainly
+not what is wanted.  The tag can always be moved back to its original
+commit after rewriting, or the ``--update-tags`` option may be
+discarded and any local tags would then be migrated manually.
+
+**Example 2: Nested sources layout**
+
+The tool handles nested submodules (e.g. llvm is a submodule in
+umbrella and clang is a submodule in llvm).  The file
+``submodule-map.txt`` is a list of pairs, one per line.  The first
+pair item describes the path to a submodule in the umbrella
+repository.  The second pair item secribes the path where trees for
+that submodule should be written in the zipped history.  
+
+Let's say your umbrella repository is actually the llvm repository and
+it has submodules in the "nested sources" layout (clang in
+tools/clang, etc.).  Let's also say ``projects/myproj`` is a submodule
+pointing to some downstream repository.  The submodule map file should
+look like this (we still want myproj mapped the same way as
+previously)::
+
+  tools/clang clang
+  tools/clang/tools/extra clang-tools-extra
+  projects/compiler-rt compiler-rt
+  projects/debuginfo-tests debuginfo-tests
+  projects/libclc libclc
+  projects/libcxx libcxx
+  projects/libcxxabi libcxxabi
+  projects/libunwind libunwind
+  tools/lld lld
+  tools/lldb lldb
+  projects/openmp openmp
+  tools/polly polly
+  projects/myproj local/myproj
+
+If a submodule path does not appear in the map, the tools assumes it
+should be placed in the same place in the monorepo.  That means if you
+use the "nested sources" layout in your umrella, you *must* provide
+map entries for all of the projects in your umbrella (except llvm).
+Otherwise trees from submodule updates will appear underneath llvm in
+the zippped history.
+
+Because llvm is itself the umbrella, we use --subdir to write its
+content into ``llvm`` in the zippped history::
+
+  # Import any non-LLVM repositories the umbrella references.
+  git -C my-monorepo remote add localrepo \
+                                https://my.local.mirror.org/localrepo.git
+  git fetch localrepo
+
+  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
+                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
+                parallel-libs polly pstl )
+
+  # Import histories for upstream split projects (this was probably
+  # already done for the ``migrate-downstream-fork.py`` run).
+  for project in ${subprojects[@]}; do
+    git remote add upstream/split/${project} \
+                   https://github.com/llvm-mirror/${subproject}.git
+    git fetch umbrella/split/${project}
+  done
+
+  # Import histories for downstream split projects (this was probably
+  # already done for the ``migrate-downstream-fork.py`` run).
+  for project in ${subprojects[@]}; do
+    git remote add local/split/${project} \
+                   https://my.local.mirror.org/${subproject}.git
+    git fetch local/split/${project}
+  done
+
+  # Import umbrella history.  We want this under a different refspec
+  # so zip-downstream-fork.py knows what it is.
+  git -C my-monorepo remote add umbrella \
+                                 https://my.local.mirror.org/llvm.git
+  git fetch umbrella
+
+  # Create the submodule map.
+  echo "tools/clang clang" > my-monorepo/submodule-map.txt
+  echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
+  echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
+  echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
+  echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
+  echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
+  echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
+  echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
+  echo "tools/lld lld" >> my-monorepo/submodule-map.txt
+  echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
+  echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
+  echo "tools/polly polly" >> my-monorepo/submodule-map.txt
+  echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt
+
+  # Rewrite history
+  (
+    cd my-monorepo
+    zip-downstream-fork.py \
+      refs/remotes/umbrella \
+      --new-repo-prefix=refs/remotes/upstream/monorepo \
+      --old-repo-prefix=refs/remotes/upstream/split \
+      --revmap-in=monorepo-map.txt \
+      --revmap-out=zip-map.txt \
+      --subdir=llvm \
+      --submodule-map=submodule-map.txt \
+      --update-tags
+   )
+
+   # Create the zip branch (assuming umbrella master is wanted).
+   git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
+
+
+Comments at the top of ``zip-downstream-fork.py`` describe in more
+detail how the tool works and various implications of its operation.
+
+Importing local repositories
+----------------------------
+
+You may have additional repositories that integrate with the LLVM
+ecosystem, essentially extending it with new tools.  If such
+repositories are tightly coupled with LLVM, it may make sense to
+import them into your local mirror of the monorepo.
+
+If such repositores participated in the umbrella repository used
+during the zipping process above, they will automatically be added to
+the monorepo.  For downstream repositories that don't participate in
+an umbrella setup, the ``import-downstream-repo.py`` tool at
+https://github.com/greened/llvm-git-migration/tree/import can help with
+getting them into the monorepo.  A recipe follows::
+
+  # Import downstream repo history into the monorepo.
+  git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
+  git fetch myrepo
+
+  my_local_tags=( refs/tags/release
+                  refs/tags/hotfix )
+
+  (
+    cd my-monorepo
+    import-downstream-repo.py \
+      refs/remotes/myrepo \
+      ${my_local_tags[@]} \
+      --new-repo-prefix=refs/remotes/upstream/monorepo \
+      --subdir=myrepo \
+      --tag-prefix="myrepo-"
+   )
+
+   # Preserve release braches.
+   for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
+                  refs/remotes/myrepo/release); do
+     branch=${ref#refs/remotes/myrepo/}
+     git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
+   done
+
+   # Preserve master.
+   git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master
+
+   # Merge master.
+   git -C my-monorepo checkout local/zip/master  # Or local/octopus/master
+   git -C my-monorepo merge myrepo/master
+
+You may want to merge other corresponding branches, for example
+``myrepo`` release branches if they were in lockstep with LLVM project
+releases.
+
+``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
+annotated tags with the given prefix.  Due to limitations with
+``fast_filter_branch.py``, unannotated tags cannot be renamed
+(``fast_filter_branch.py`` considers them branches, not tags).  Since
+the upstream monorepo had its tags rewritten with an "llvmorg-"
+prefix, name conflicts should not be an issue.  ``--tag-prefix`` can
+be used to more clearly indicate which tags correspond to various
+imported repositories.
+
+Given this repository history::
+
+  R1 - R2 - R3 <- master
+       ^
+       |
+    release/1
+
+The above recipe results in a history like this::
+
+  U1 - U2 - U3 <- upstream/master
+   \    \    \
+    \    -----\---------------                                         local/zip--.
+     \         \              \                                                    |
+    - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
+                                                                             /
+                                                                 R1 - R2 - R3  <-.
+                                                                      ^           |
+                                                                      |           |
+                                                               myrepo-release/1   |
+                                                                                  |
+                                                                   myrepo/master--'
+
+Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
+from ``myrepo``.  If you require commits from ``myrepo`` to be
+interleaved with commits on local project branches (for example,
+interleaved with ``llvm1``, ``llvm2``, etc. above) and myrepo doesn't
+appear in an umbrella repository, a new tool will need to be
+developed.  Creating such a tool would involve:
+
+1. Modifying ``fast_filter_branch.py`` to optionally take a
+   revlist directly rather than generating it itself
+
+2. Creating a tool to generate an interleaved ordering of local
+   commits based on some criteria (``zip-downstream-fork.py`` uses the
+   umbrella history as its criterion)
+
+3. Generating such an ordering and feeding it to
+   ``fast_filter_branch.py`` as a revlist
+
+Some care will also likely need to be taken to handle merge commits,
+to ensure the parents of such commits migrate correctly.
+
+Scrubbing the Local Monorepo
+----------------------------
+
+Once all of the migrating, zipping and importing is done, it's time to
+clean up.  The python tools use ``git-fast-import`` which leaves a lot
+of cruft around and we want to shrink our new monorepo mirror as much
+as possible.  Here is one way to do it::
+
+  git -C my-monorepo checkout master
+
+  # Delete branches we no longer need.  Do this for any other branches
+  # you merged above.
+  git -C my-monorepo branch -D local/zip/master || true
+  git -C my-monorepo branch -D local/octopus/master || true
+
+  # Remove remotes.
+  git -C my-monorepo remote remove upstream/monorepo
+
+  for p in ${my_projects[@]}; do
+    git -C my-monorepo remote remove upstream/split/${p}
+    git -C my-monorepo remote remove local/split/${p}
+  done
+
+  git -C my-monorepo remote remove localrepo
+  git -C my-monorepo remote remove umbrella
+  git -C my-monorepo remote remove myrepo
+
+  # Add anything else here you don't need.  refs/tags/release is
+  # listed below assuming tags have been rewritten with a local prefix.
+  # If not, remove it from this list.
+  refs_to_clean=(
+    refs/original
+    refs/remotes
+    refs/tags/backups
+    refs/tags/release
+  )
+
+  git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} |
+    xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d
+
+  git -C my-monorepo reflog expire --all --expire=now
+
+  # fast_filter_branch.py might have gc running in the background.
+  while ! git -C my-monorepo \
+    -c gc.reflogExpire=0 \
+    -c gc.reflogExpireUnreachable=0 \
+    -c gc.rerereresolved=0 \
+    -c gc.rerereunresolved=0 \
+    -c gc.pruneExpire=now \
+    gc --prune=now; do
+    continue
+  done
+
+  # Takes a LOOOONG time!
+  git -C my-monorepo repack -A -d -f --depth=250 --window=250
+
+  git -C my-monorepo prune-packed
+  git -C my-monorepo prune
+
+You should now have a trim monorepo.  Upload it to your git server and
+happy hacking!
 
 References
 ==========




More information about the llvm-commits mailing list