[PATCH] D24167: Moving to GitHub - Unified Proposal

Mehdi AMINI via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 30 15:51:32 PDT 2016


mehdi_amini added inline comments.


> beanz wrote in GitHubMove.rst:249
> Remove "(with some granularity)". The multi-repo proposal can have the same 1:1 mapping of commits in per-project repos to umbrella commits that the mono-repo would have.
> 
> When the update job runs with a list of more than one commit we can sort them by committer timestamp (which is updated after rebase). It will provide a roughly linear timeline for the commits to be sorted across the repositories. It won't be perfect, but it should be good enough for sorting commits in close proximity because the pushed commits will either be rebased (which updates the committer timestamp) or they will be merge commits which will have a committer timestamp generated when the merge commit was generated.

It seems to me that at the beginning the idea was that the submodules would be updated every few minutes, *so that* rev-locked commits pushed to multiple projects at roughly the same time would appear as a single umbrella update (with some heuristic like "update the submodules when there hasn't been a push for 2 min").

Apparently your idea is rather that we should update the umbrella one commit at a time, but then what's the story for rev-locked commits?
How would the tooling avoid a race condition? Example:

1. I commit to LLVM
2. I commit to Clang
3. the script runs, pulls LLVM, sees no change
4. I push to LLVM
5. I push to Clang
6. the script pulls Clang, sees my commit
7. the script finishes pulling and updates the submodule with the Clang change, *before* the LLVM change, even though the commit dates are in the reverse order.

I don't see a principled way to implement the umbrella without server-side (i.e. native git hook) support. Sure, you can //craft// something that works fine most of the time, but that does not make it bulletproof.
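
To make the race concrete, a naive updater along these lines (purely a hypothetical sketch; the umbrella layout and project list are assumptions, not part of either proposal) has exactly the window described in the scenario above:

  # hypothetical umbrella-update job, run periodically
  cd umbrella-repo
  for proj in llvm clang; do
    git -C $proj fetch origin             # step 3 can miss the LLVM push...
    git -C $proj checkout origin/master   # ...while step 6 picks up the Clang push
  done
  git add llvm clang
  git commit -m "Update submodules"       # Clang lands before its LLVM prerequisite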

> beanz wrote in GitHubMove.rst:570
> This is inaccurate. Even though my rough prototype of the git umbrella repo doesn't have each submodule update being a single commit that was the stated plan for how the umbrella would be updated. That means each umbrella repo commit would represent a single commit to a single subproject, so your bisection granularity is comparable.

(I'm waiting for the story to support this above)

> beanz wrote in GitHubMove.rst:207
> How does the mono-repo do this? It might make it easier, but since it is likely that even with a mono-repo most people won't build all projects I don't think it actually encourages updates across all sub-projects.

I was thinking about the fact that if I change an API like `createTargetMachineFromTriple()` and `git grep` for its uses, all the uses in the sub-projects show up as well.
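
For instance, in a mono-repo checkout, something like the following (paths illustrative, and `createTargetMachineFromTriple()` is just a made-up API name) lists every use across sub-projects in one pass:

  git grep -n 'createTargetMachineFromTriple' -- llvm/ clang/ lldb/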

> beanz wrote in GitHubMove.rst:214
> You still haven't addressed the feedback here. Saying the multi-repo would lose history is still inaccurate.
> 
> For starters, you're not actually deleting the history from the repository you're moving code from. Also with a multi-repo you can easily preserve the file history by using git filter-branch. Using filter-branch will not follow history across renames that are outside the filter, but will follow them within the filter.
> 
> For example if you were to use filter branch on lib/Support to break it out into its own repository, filter branch would preserve history of files under lib/Support that are renamed as long as they remain under libSupport. It would not preserve the history of a file being renamed and moved under libSupport. Even with that the history before that point *is* traceable because the history would still exist in the old repository, so you are not losing history, you just aren't moving it with the file.

Fair enough: replaced "losing history" with "the history of the refactored code won't be available from the new place".
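
(For reference, a rough sketch of the filter-branch extraction being described, untested and with illustrative paths; run it on a scratch clone since it rewrites history:)

  # in a throwaway clone of the llvm repository: make lib/Support the new root,
  # keeping only the history of files that lived under lib/Support
  git filter-branch --prune-empty --subdirectory-filter lib/Support -- --all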

> beanz wrote in GitHubMove.rst:222
> What about the concerns about active community members having this burden?

Can you clarify what exactly you're referring to? (I believe there's no regression compared to now.)

> beanz wrote in GitHubMove.rst:334
> You've lost me here. Checking out all the projects in SVN today involves multiple svn co commands. Unless there is some magic in SVN I'm unaware of. If there is such magic we should document it somewhere on LLVM.org (maybe on the getting started page?) and link to it here.

I was referring to:

  svn co http://llvm.org/svn/llvm-project/ --depth=immediates
  cd llvm-project/
  svn up llvm/trunk clang/trunk libcxx/trunk

You can then configure a build with only LLVM like this:

  mkdir ../build-llvm && cd ../build-llvm
  cmake ../llvm-project/llvm/trunk

And a build directory with LLVM+Clang:

  mkdir ../build-clang && cd ../build-clang
  cmake ../llvm-project/llvm/trunk -DLLVM_EXTERNAL_CLANG_SOURCE_DIR=../llvm-project/clang/trunk/

So a single `svn up $projects` in the source directory updates all the sources, and you can still build a subset of the projects from these sources.

This is also how I'd synchronize if I were integrating downstream from SVN.

> beanz wrote in GitHubMove.rst:445
> Alternatively since our intention is to enforce a linear history in the repositories doing a checkout by timestamp using the format below should also work in the majority of cases.
> 
>   git checkout 'master@{...}'

This applies to both proposals, right? Where do you want me to add it?
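
(A sketch for reference, assuming the linear history mentioned above; note that the `master@{...}` syntax resolves against the local reflog, so for an arbitrary date against upstream history something like rev-list is the usual alternative:)

  # check out the last commit on origin/master with a committer date before the given time
  git checkout $(git rev-list -n 1 --before="2016-09-30 12:00" origin/master)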

> beanz wrote in GitHubMove.rst:461
> Again, I don't follow how this is easy. There is no documentation on LLVM.org explaining how to do this and my limited knowledge of SVN leaves me with no idea how to do it.

I copy/pasted the commands above (I'm not sure I really want to document this on llvm.org right now).

> beanz wrote in GitHubMove.rst:471
> Please remove "and makes this use case ...", it is a value judgement.

I don't believe so, but if you insist...

> beanz wrote in GitHubMove.rst:584
> If we go with the multi-repo approach we can ensure that each umbrella repo commit will be only one submodule update. This is relatively straight forward tooling to add. The only situation where we could potentially allow multiple updates in a single umbrella commit would be if we wanted to do cross-repository correlating of revlocked changes.

(I'm waiting for the story to support this above)

> beanz wrote in GitHubMove.rst:589
> The granularity is not finer.

(I'm waiting for the story to support this above)

> beanz wrote in GitHubMove.rst:601
> Better to say both proposals allow you to continue using SVN the same way, but that each solution will have minor impacts. In the monorepo there will be a one-time change in revision numbers, and in the multi-repo each project will have its own revision numbers out of sync from each other.

"The same way" implies "a single SVN revision number to me". One could even say "a single SVN checkout" (cf the command I copy/pasted above).
I don't see how it'd work with the multi-repo? 
How would someone downstream integrating from SVN be able to correlate revision across repositories?

> beanz wrote in GitHubMove.rst:622
> I would phrase the downside as "rewriting the fork's history and changing its commit hashes", because that is what happens.

The paragraph *starts* with "Using a script that rewrites history" and *ends* with "changes the fork's commit hashes"; it seems to me that this already makes explicit that the downside of rewriting history is that the hashes change.
(I'm not sure how "rewriting history" would be a downside by itself otherwise.)

https://reviews.llvm.org/D24167
