[PATCH] D24167: Moving to GitHub - Unified Proposal

Fri Sep 30 16:57:44 PDT 2016

beanz added inline comments.

> mehdi_amini wrote in GitHubMove.rst:249
> It seems to me that at the beginning the idea was that the submodules would be updated every few minutes, *so that* we'd be able to have rev-locked commits pushed to multiple projects at the same time and have them appear a single umbrella update (with somehow a heuristic like "update the submodules when there hasn't been a push for 2 min").
> 
> Apparently your idea is rather than we should update it with single commits, but what's the story for rev-locked?
> How would the tooling not have a race condition? Example:
> 
> 1. I commit to LLVM
> 2. I commit to Clang
> 3. the script runs, pull LLVM, no change
> 4. I push to LLVM
> 5. I push to Clang
> 6. the script pulls Clang, see my commit
> 7. the script is done with pulling and update the submodule with the clang change, *before* the LLVM change, even though the commit date would be reversed.
> 
> I don't see a principled solution to implement the umbrella without server-side (i.e. native git hook) support. Sure you can //craft// it, and it'll work fine most of the time, but that does not make it bulletproof.

The automation will run. It will collect a list of commits that have been pushed to each repository since the last time the script ran. It will then sort them by committer timestamp order, and commit one at a time to the umbrella repo as submodule updates.

We can setup the automation to run based on GitHub WebHooks, and periodically in case a WebHook gets dropped.

There is no race condition that I see.

If we need to support revlocked changes, (and I'm not convinced this is the case since they are by far a minority of commits) we can support them via annotations on the commit messages. We can teach the automation to look for markers in the commit message denoting that it is revlocked to other changes, and we can have it group revlocked changes together.

There is no need for server-side hooks, and this solution would work as well as any mirroring system. I don't believe there is any need for this solution to be bulletproof, but I see no reason why it cannot be as robust as the single-project mirrors that the mono-repo proposal includes.

> mehdi_amini wrote in GitHubMove.rst:570
> (I'm waiting for the story to support this above)

See above.

> mehdi_amini wrote in GitHubMove.rst:207
> I was thinking about the fact that if I change the API `createTargetMachineFromTriple()`, and `git grep` to find the uses, then all the uses in sub-projects will show up.

That is 'making easier' not 'encouraging'. Personally I fall to 'grep' way before I fall to 'git grep' for things like this, and I don't think the monorepo has any enforcement of this.

> mehdi_amini wrote in GitHubMove.rst:214
> Fair enough: replaced "losing history" with "the history of the refactored code won't be available from the new place".

In your example of moving clang-tools-extra there would be *no* need for loss of history at all. There is no need for filter-branch. You can literally reformat clang-tools-extra to be under tools/extra/ and merge the whole tree into the clang master branch.

The only point where you would lose any history at all is if you were trimming one part of a repository into another repository, and even in that situation you can minimize the losses pretty well using filter-branch and index scripts. It is complicated but possible.

> mehdi_amini wrote in GitHubMove.rst:222
> Can you clarify what you're referring to exactly? (No regression compared to now I believe)

Ah. I misread. I see what you are saying. This is fine.

> mehdi_amini wrote in GitHubMove.rst:334
> I was referring to:
> 
>   svn co http://llvm.org/svn/llvm-project/ --depth=immediates
>   cd llvm-project/
>   svn up llvm/trunk clang/trunk libcxx/trunk
> 
> You can then have a build with only LLVM configured like:
> 
>   mkdir ../build-llvm && cd ../build-llvm
>   cmake ../llvm-project/llvm/trunk
> 
> And a build dir with llvm+clang:
> 
>   mkdir ../build-clang && cd ../build-clang
>   cmake ../llvm-project/llvm/trunk -DLLVM_EXTERNAL_CLANG_DIR=../llvm-project/clang/trunk/
> 
> So that a single `svn up $projects` in the source directory update all the sources and you can still build a subset of the projects from these sources.
> 
> This is also how I'd synchronize if I was integrating downstream from SVN.

I can't imagine that is a common workflow. It certainly isn't the documented recommended workflow on llvm.org, so I'm not sure there is value in bringing it into the discussion.

> beanz wrote in GitHubMove.rst:350
> Can you please add actual size numbers for each project and the mono-repo?
> 
> Just saying '2x' isn't super meaningful without knowing the size of 1x.

Can you add per-project sizes?

> mehdi_amini wrote in GitHubMove.rst:445
> This applies to both proposals right? Where do you want me to add this?

I think it is worth noting under the multi-repo proposal something along the lines of:

> Because we will be maintaining a linear history you can perform a timestamp based checkout of each project repository with the following command:
> 
>   git checkout 'master@{...}'
> 
> 
> Additionally you can use the umbrella repository...

If you want to also add the timestamp checkout to the mono-repo proposal, that makes sense too. I just think it is worth noting under the multi-repo proposal that timestamp based checkouts are expected to work due to the linear history requirement, which means you don't need the submodule repo.

> mehdi_amini wrote in GitHubMove.rst:461
> Copy/pasted above (I'm not sure I really want to document it on llvm.org now).

Fine if you don't want to document it, but I certainly would not describe that as "easy". Especially because if you ever mix up and type "svn up" in the root it starts updating *everything*. I think this is an incredibly fragile workflow, which is probably why it is also incredibly uncommon.

> mehdi_amini wrote in GitHubMove.rst:584
> (I'm waiting for the story to support this above)

Again, above.

> mehdi_amini wrote in GitHubMove.rst:601
> "The same way" implies "a single SVN revision number to me". One could even say "a single SVN checkout" (cf the command I copy/pasted above).
> I don't see how it'd work with the multi-repo? 
> How would someone downstream integrating from SVN be able to correlate revision across repositories?

Maybe rather than "the same way" "with similar workflows to today"?

> mehdi_amini wrote in GitHubMove.rst:622
> The paragraph *starts* with " Using a script that rewrites history" and end with "changes the fork's commit hashes", it seems to me that this makes explicit that the downside of rewriting history is that the hashes change.
> (I'm not sure how "rewriting history" is a downside by itself otherwise)

Fine.

https://reviews.llvm.org/D24167