[PATCH] D22463: [RFC] Moving to GitHub Proposal: NOT DECISION!

Tue Jul 19 17:37:52 PDT 2016

jlebar added a subscriber: jlebar.
jlebar added a comment.

I'm sure you all have thought about this more than I have, and I apologize if this has been brought up before because I haven't been following the thread closely.  But I am not convinced by this document that using subrepositories beats using a single git repo.

I see two reasons here for using subrepos as opposed to one big repository.

1. Subrepos mirror our current scheme.
2. Subrepos let people check out only the bits of llvm that they want.

I don't find either of these particularly compelling, compared to the advantages of one-big-repo (discussed below).  Taking them in turn:

1. Although subrepos would mirror our current scheme, it's going to be different *enough* that existing tools are going to have to change either way.  In particular, the svn view of the master repository is not going to be useful for anything.  I tried `svn checkout https://github.com/chapuni/llvm-project-submodule`, and the result was essentially an empty repository.

2. It's true that subrepos let people check out only the bits that they want.  But disk space and bandwidth are very cheap today, and LLVM is not as large as one might think.  My copy of https://github.com/llvm-project/llvm-project, which includes *everything*, is 2.5G, whereas my copy of just llvm is 626M.

  Given that a release build of llvm and clang is ~3.5G, a 2.5G source checkout doesn't seem at all unreasonable to me.

  If it's really problematic, you can do a shallow checkout, which would take the contains-everything repo from 2.5G to 1.3G.  Moreover, if it's *really* a problem, you can mirror just the subdir of llvm that you care about.  Maybe the LLVM project could maintain such mirrors for some of the small subrepos that are often used independently.
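
For anyone who hasn't used one, here is a minimal sketch of what a shallow checkout does.  It runs against a throwaway local repository; the directory names and the three dummy commits are made up for illustration, and nothing here touches the real LLVM repos.

```shell
#!/bin/sh
set -eu

# Identity for the dummy commits (hypothetical values, demo only).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

work=$(mktemp -d)

# Build a tiny "full" repository with three commits of history.
git init -q "$work/full"
cd "$work/full"
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Shallow clone: --depth 1 fetches only the most recent commit.
# The file:// URL matters; git ignores --depth for plain local paths.
git clone -q --depth 1 "file://$work/full" "$work/shallow"

full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history:    $full_count commits"     # 3
echo "shallow history: $shallow_count commits"  # 1
```

Against the real repository the equivalent would presumably be `git clone --depth 1 https://github.com/llvm-project/llvm-project`.  The 2.5G and 1.3G figures above are from my own checkout, not from this sketch.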

So what's the advantage of using one big repository?  The simple answer is: Have you ever *tried* using git submodules?  :)

Submodules make everything more complicated.  Here's an example that I hope proves the point.  Suppose you want to commit your current work and switch to a new clean branch off head.  You make some changes there, then come back to your current work.  And let's assume that all of your changes are to clang only.

  # Commit current work, switch to a clean branch off head, then switch back.

  # One big repo: 
  $ git commit  # on old-branch
  $ git fetch
  $ git checkout -b new-branch origin/master
  # Hack hack hack...
  $ git commit
  $ git checkout old-branch

  # Submodules, attempt 1:
  $ cd clang
  $ git commit  # on old-branch
  $ git fetch
  $ git checkout -b new-branch origin/master
  # Also have to update llvm...
  $ cd ../llvm
  $ git fetch
  $ git checkout origin/master
  $ cd ../clang
  # Hack hack hack
  $ git commit

  # Now we're ready to switch back to old-branch, but...it's not going to work.
  # When we committed our old branch, we didn't save the state of our llvm
  # checkout.  So in particular we don't know which revision to roll it back to.

  # Let's try again.
  # Submodules, attempt 2:
  $ cd clang
  $ git commit  # on old-branch
  $ cd ..
  $ git checkout -b old-branch # in master repo
  $ git commit

  # Now we have two branches called "old-branch": one in the master repo, and one
  # in the clang submodule.  Now let's fetch head.

  $ git fetch  # in master repo
  $ git checkout -b new-branch origin/master
  $ git submodule update
  $ cd clang
  $ git checkout -b new-branch
  # Hack hack hack
  $ git commit  # in submodule
  $ cd ..
  $ git commit  # in master repo

  # Now we're ready to switch back.

  $ git checkout old-branch  # in master repo
  $ git submodule update

For those keeping track at home, this is 5 git commands with the big repo, and 15 commands (11 git commands) in the submodules world.
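
If you'd rather not count by hand, the tally can be reproduced with a few lines of shell.  This is only a sketch: the two lists below are the commands from the "One big repo" and "Submodules, attempt 2" listings above with the comments stripped, and the `count` helper is a made-up name.

```shell
#!/bin/sh
set -eu

# Commands from the "One big repo" listing.
big_repo=$(cat <<'EOF'
git commit
git fetch
git checkout -b new-branch origin/master
git commit
git checkout old-branch
EOF
)

# Commands from the "Submodules, attempt 2" listing.
submodules=$(cat <<'EOF'
cd clang
git commit
cd ..
git checkout -b old-branch
git commit
git fetch
git checkout -b new-branch origin/master
git submodule update
cd clang
git checkout -b new-branch
git commit
cd ..
git commit
git checkout old-branch
git submodule update
EOF
)

# count LIST PATTERN: number of lines in LIST matching PATTERN.
count() { printf '%s\n' "$1" | grep -c "$2"; }

big_total=$(count "$big_repo" .)
sub_total=$(count "$submodules" .)
sub_git=$(count "$submodules" '^git ')

echo "one big repo: $big_total commands"                     # 5
echo "submodules:   $sub_total commands ($sub_git are git)"  # 15 (11 are git)
```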

Above we assumed that all of our changes were only to clang.  If we're making changes to both llvm and clang (say), the one-big-repo workflow remains identical, but the submodules workflow becomes even more complicated.

I'm sure people who are better at git than I can golf the above commands, but I'll suggest that I'm an above-average git user, so this is probably a lower-than-average estimate for the number of git commands (particularly `git help` :).  git is hard enough as-is; using submodules like this is asking a lot.

Similarly, I'm sure much of this can be scripted, but...seriously?  :)

Sorry for the wall of text.  tl;dr: One big repo doesn't actually cost that much, and that cost is dwarfed by the cost to humans of using submodules as proposed.

https://reviews.llvm.org/D22463