[PATCH] D24167: Moving to GitHub - Unified Proposal

Chris Bieneman via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 30 14:30:22 PDT 2016


beanz added a comment.

Lots of comments inline, and one meta-comment.

Looking at the details of the mono-repo proposal "use the GitHub SVN" interface is the answer to a lot of workflows. How would the Git-SVN workflows be impacted by moving to a PR-based workflow? I assume it works fine, you just create SVN branches and commit to them then make the PR via the web UI. Is that correct?

I know we're not actually considering a PR-based workflow, but it is something to consider.



> rengolin wrote in GitHubMove.rst:249
> Can you add a link of the monorepo as well? I think you had one, right?

Remove "(with some granularity)". The multi-repo proposal can have the same 1:1 mapping of commits in per-project repos to umbrella commits that the mono-repo would have.

When the update job runs with a list of more than one commit we can sort them by committer timestamp (which is updated after rebase). It will provide a roughly linear timeline for the commits to be sorted across the repositories. It won't be perfect, but it should be good enough for sorting commits in close proximity because the pushed commits will either be rebased (which updates the committer timestamp) or they will be merge commits which will have a committer timestamp generated when the merge commit was generated.

> GitHubMove.rst:207
> +Finally, it also encourages to update all the sub-projects when
> +changing API or refactoring code.
> +

How does the mono-repo do this? It might make it easier, but since it is likely that even with a mono-repo most people won't build all projects I don't think it actually encourages updates across all sub-projects.

> GitHubMove.rst:214
> +more complicated than a simple `git mv` command, and we would end up losing
> +history.
> +

You still haven't addressed the feedback here. Saying the multi-repo would lose history is still inaccurate.

For starters, you're not actually deleting the history from the repository you're moving code from. Also with a multi-repo you can easily preserve the file history by using git filter-branch. Using filter-branch will not follow history across renames that are outside the filter, but will follow them within the filter.

For example if you were to use filter branch on lib/Support to break it out into its own repository, filter branch would preserve history of files under lib/Support that are renamed as long as they remain under libSupport. It would not preserve the history of a file being renamed and moved under libSupport. Even with that the history before that point *is* traceable because the history would still exist in the old repository, so you are not losing history, you just aren't moving it with the file.

> GitHubMove.rst:222
> +for more details).
> +
> +Finally, nobody will be forced to compile projects they don't want to build.

What about the concerns about active community members having this burden?

> GitHubMove.rst:253
> +
> +This umbrella repository will be read-only and periodically updated
> +to record the above tuple. The proposed form to record this is to use Git

I would say 'continuously' rather than periodically here. You describe in more detail below how the notifications would be configured and 'periodically' isn't a full picture.

> GitHubMove.rst:268
> +
> +Later sections illustrates how end-users interacts with this repository for
> +various use-cases, and the `Previews`_ section links to a usable repository

s/interacts/would interact/

> GitHubMove.rst:334
> +compiler-rt for instance. In this way it's not different from someone who would
> +check out all the projects with SVN today.
> +

You've lost me here. Checking out all the projects in SVN today involves multiple svn co commands. Unless there is some magic in SVN I'm unaware of. If there is such magic we should document it somewhere on LLVM.org (maybe on the getting started page?) and link to it here.

> GitHubMove.rst:350
> +checkout, you only see `compiler-rt`.  Git compresses its history, so
> +a clone of everything is only about 2x as much data as a clone of llvm only (and
> +in any case this is dwarfed by the size of e.g. a llvm build dir).

Can you please add actual size numbers for each project and the mono-repo?

Just saying '2x' isn't super meaningful without knowing the size of 1x.

> GitHubMove.rst:387
> +In this case the repository contains only a single sub-project, and commits can
> +be made using `git svn dcommit`, again **exactly as we do today**.
> +

the emphasis on 'exactly as we do today' is unnecessary.

> GitHubMove.rst:445
> +`llvm/tools/clang`, etc.
> +
> +**Monorepo Proposal**

Alternatively since our intention is to enforce a linear history in the repositories doing a checkout by timestamp using the format below should also work in the majority of cases.

  git checkout 'master@{...}'

> GitHubMove.rst:461
> +
> +Today this is easy for subversion users, and possible but not straighfoward for
> +git-svn users. Few Git users try to e.g. update LLD or Clang in the same commit

Again, I don't follow how this is easy. There is no documentation on LLVM.org explaining how to do this and my limited knowledge of SVN leaves me with no idea how to do it.

> GitHubMove.rst:466
> +The multirepo proposal does not address this: one would have to commit and push
> +separately in every individual repository. It might be possible to establish a
> +protocol whereby users add a special token to their commit messages that causes

I would phrase as "It would be possible...", because it most certainly is possible.

> GitHubMove.rst:471
> +The monorepo proposal handles this natively and makes this use case
> +trivial.
> +

Please remove "and makes this use case ...", it is a value judgement.

> GitHubMove.rst:584
> +possible that one commit in the umbrella repository includes multiple commits
> +in the sub-projects though it should be occasional in practice).
> +

If we go with the multi-repo approach we can ensure that each umbrella repo commit will be only one submodule update. This is relatively straight forward tooling to add. The only situation where we could potentially allow multiple updates in a single umbrella commit would be if we wanted to do cross-repository correlating of revlocked changes.

> GitHubMove.rst:589
> +Bisecting on the monorepo is straightforward and almost identical to the
> +multirepo situation explained above. The granularity is finer since each
> +individual commits in every sub-projects participate in the bisection. The

The granularity is not finer.

> GitHubMove.rst:601
> +* If you were pulling from the SVN repo before the switch to Git. The monorepo
> +  will allow you to continue to use SVN the same way. The main caveat is that
> +  you'll need to be prepared for a one-time change to the revision numbers.

Better to say both proposals allow you to continue using SVN the same way, but that each solution will have minor impacts. In the monorepo there will be a one-time change in revision numbers, and in the multi-repo each project will have its own revision numbers out of sync from each other.

> GitHubMove.rst:607
> +* If you were pulling from one of the existing read-only Git repos, this also
> +  will continue to work as before as they will continue to exist in any of the
> +  proposal.

s/any of the proposal/both of the proposals/

> GitHubMove.rst:611
> +Under the monorepo proposal, you have a third option: migrating your fork to
> +the monorepo.  This can be particularly beneficial if your fork touches
> +multiple sub-projects (e.g. llvm and clang), because now you can commingle

Reword from the second sentence on. You're making a value assessment. A better phrasing might be:

> If your fork touches multiple LLVM projects, migrating your fork into the mono repo would enable you to make commits that touch multiple projects at the same time the same way LLVM contributors would be able to do so.

> GitHubMove.rst:622
> +  out an old revision you'll get a consistent version of llvm proper.)  The
> +  downside is that this changes the fork's commit hashes.
> +

I would phrase the downside as "rewriting the fork's history and changing its commit hashes", because that is what happens.

> GitHubMove.rst:630
> +the monorepo is always possible (the splitrepo is obvious): you can apply the
> +patches in the appropriate subdirectory of the monorepo.
> +

This is a little unclear to me. Do you mean applying the patches via "git apply" from a patch file? Might be worth clarification about how that would work.

> GitHubMove.rst:641
> +Note that even within the monorepo, developers who hack only on one of these
> +sub-projects can continue to use the single sub-project Git mirrors, so their
> +workflow is unchanged.  (That is, they aren't forced to download or check out

This makes it sound like the git mirrors are read-write. Might be worth adding a "via Git-SVN" comment to clarify.

https://reviews.llvm.org/D24167





More information about the llvm-commits mailing list