[llvm-dev] [RFC] One or many git repositories?

Wed Jul 20 18:28:08 PDT 2016

> On Jul 20, 2016, at 5:46 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On Wed, Jul 20, 2016 at 5:36 PM Justin Bogner <mail at justinbogner.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
> > On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> >> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> >> > I would like to (re-)open a discussion on the following specific
> >> question:
> >> >
> >> >   Assuming we are moving the llvm project to git, should we
> >> >   a) use multiple git repositories, linked together as subrepositories
> >> > of an umbrella repo, or
> >> >   b) use a single git repository for most llvm subprojects.
> >> >
> >> > The current proposal assembled by Renato follows option (a), but I
> >> > think option (b) will be significantly simpler and more effective.
> >> > Moreover, I think the issues raised with option (b) are either
> >> > incorrect or can be reasonably addressed.
> >> >
> >> > Specifically, my proposal is that all LLVM subprojects that are
> >> > "version-locked" (and/or use the common CMake build system) live in a
> >> > single git repository.  That probably means all of the main llvm
> >> > subprojects other than the test-suite and maybe libc++.  From looking
> >> > at the repository today that would be: llvm, clang, clang-tools-extra,
> >> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
> >>
> >> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> >> repos are any worse than the problems with a single repo, which makes
> >> this more or less just change for the sake of change, IMO.
> >>
> >
> > It would be useful to know what problems you see with a single repo that
> > are more significant. In particular, either why you think the problems
> > jlebar already mentioned are worse than he sees them, or what other
> > problems are that he hasn't addressed.
> 
> Running the same 'git checkout' commands on multiple repos has always
> been sufficient to manage the multiple repos so far - as long as you
> create the same branches and tags in each repo, it's easy[1] to manage
> the set of repos with a script that cd's to each one and runs whatever
> git command.
> 
> A notable difference is the ability to do API updates across them or the ability to bisect across them.
> 
> Also, if the infrastructure that keeps the umbrella repo in sync falls over or has a serious problem, reconstructing version-locked state in order to bisect across those regions of time seems quite challenging. So IMO, it isn't a minor inconvenience, even if it is something we could overcome.
>  
> So it's a pretty minor inconvenience today to have the multiple repos in
> the case where you want to check out all of them.
> 
> OTOH, if all of the repos are combined into one, you have to do work
> when you only want some of them. In my experience, this is basically
> always - between my various machines and projects I have several
> checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
> checkouts of just llvm. I've only checked out the other repos when I was
> changing APIs and needed to update them.
> 
> I haven't tried the options jlebar has described to deal with these -
> sparse checkouts and whatnot, but they seem like an equivalent amount of
> work/learning curve as writing a script that cd's to several directories
> and runs the same git command in each.
> 
> I actually would like to see an example of how you would check out a common subset with the sparse checkout feature. jlebar, could you give us demo commands for this?

Since there is already a unified repo for testing at https://github.com/llvm-project/llvm-project, here is what it would look like for someone interested in checking out only LLVM and Clang:

# Prepare the git repo
mkdir llvm
cd llvm
git init
git remote add origin git@github.com:llvm-project/llvm-project.git

# Set up the sparse checkout, asking for clang and llvm only
git config core.sparseCheckout true
mkdir -p .git/info  # git init usually creates this directory already
echo /llvm >> .git/info/sparse-checkout
echo /clang >> .git/info/sparse-checkout

# Actually fetch the data and check out just clang and llvm.
git pull origin master

# At this point the checkout contains the directories for clang and llvm only.
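The steps above can be tried offline against a small throwaway repository. This is only an illustration under stated assumptions: a local stand-in repository with hypothetical `llvm/`, `clang/` and `lld/` directories replaces the real llvm-project, and a `file://` URL replaces the GitHub remote. The sparse-checkout commands themselves are the ones listed above.

```shell
#!/bin/sh
# Offline sketch of the sparse-checkout steps above.
# Assumption: a tiny local repository stands in for llvm-project.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

# Identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a stand-in "monorepo" with three top-level subprojects on master.
git init -q "$WORK/upstream"
git -C "$WORK/upstream" symbolic-ref HEAD refs/heads/master
for d in llvm clang lld; do
  mkdir "$WORK/upstream/$d"
  echo "$d" > "$WORK/upstream/$d/README.txt"
done
git -C "$WORK/upstream" add .
git -C "$WORK/upstream" commit -q -m "initial import"

# Same steps as above, pointed at the local stand-in instead of GitHub.
mkdir "$WORK/llvm"
cd "$WORK/llvm"
git init -q
git remote add origin "file://$WORK/upstream"
git config core.sparseCheckout true
mkdir -p .git/info
echo /llvm >> .git/info/sparse-checkout
echo /clang >> .git/info/sparse-checkout
git pull -q origin master

ls    # only clang and llvm are present in the working tree
```

The `lld/` directory is still part of the fetched history; it is simply never written to disk.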

Obviously this will download the 2.5GB repository (all branches for all projects), but that should happen *once* on a developer machine (additional checkouts can then be created from it with `git worktree` instead of cloning again).
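The `git worktree` route can be sketched as follows. Assumptions: a local repository stands in for an existing llvm-project clone, and the worktree path `llvm-2` and branch name `second-tree` are hypothetical. One detail matters here: `info/sparse-checkout` is kept per working tree, so the new worktree needs its own pattern file. The `git worktree` documentation mentions `--no-checkout` as a way to configure sparse checkout before any files are written.

```shell
#!/bin/sh
# Sketch: a second sparse working tree from an existing clone via
# `git worktree`, so the history is downloaded only once.
# Assumptions: a tiny local repository stands in for the llvm-project
# clone; the path llvm-2 and branch second-tree are hypothetical.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the existing clone: llvm/, clang/ and lld/ on master.
git init -q "$WORK/llvm"
cd "$WORK/llvm"
git symbolic-ref HEAD refs/heads/master
for d in llvm clang lld; do
  mkdir "$d" && echo "$d" > "$d/README.txt"
done
git add . && git commit -q -m "initial import"
git config core.sparseCheckout true

# Add a second working tree on a new branch without populating it yet.
git worktree add --no-checkout -b second-tree "$WORK/llvm-2" master

# Write the sparse patterns into the new worktree's private git
# directory, then populate the index and working tree from HEAD.
cd "$WORK/llvm-2"
GITDIR=$(git rev-parse --git-dir)
mkdir -p "$GITDIR/info"
printf '/llvm\n/clang\n' > "$GITDIR/info/sparse-checkout"
git read-tree -mu HEAD

ls    # only clang and llvm
```

Both working trees share the same object store, so the second one costs no additional download.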
For bots, shallow clones are efficient, and the setup only needs a few changes:

# Prepare the git repo
mkdir llvm
cd llvm
git init
git remote add -t master origin git@github.com:llvm-project/llvm-project.git

# Set up the sparse checkout, asking for clang and llvm only
git config core.sparseCheckout true
mkdir -p .git/info  # git init usually creates this directory already
echo /llvm >> .git/info/sparse-checkout
echo /clang >> .git/info/sparse-checkout

# Actually fetch the data and check out just clang and llvm.
git pull origin master --depth=1
# alternatively: git fetch --depth=1 && git reset --hard origin/master

(That’s an 81.58MB download, independent of how many subprojects are actually checked out.)
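The bot variant (single tracked branch, depth 1, sparse patterns) can be checked offline in the same way. This sketch assumes a local stand-in repository with two commits on `master` and an extra hypothetical branch `release-x`, and it uses the `git fetch` plus `git reset --hard` alternative mentioned in the script comment.

```shell
#!/bin/sh
# Offline sketch of the bot setup: one branch, depth 1, sparse checkout.
# Assumptions: a local repository with two commits on master and an
# extra branch "release-x" (hypothetical) stands in for llvm-project.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream: two commits of history plus a second branch.
UP="$WORK/upstream"
git init -q "$UP"
git -C "$UP" symbolic-ref HEAD refs/heads/master
for d in llvm clang lld; do
  mkdir "$UP/$d" && echo "$d v1" > "$UP/$d/README.txt"
done
git -C "$UP" add . && git -C "$UP" commit -q -m "v1"
echo "llvm v2" > "$UP/llvm/README.txt"
git -C "$UP" commit -q -a -m "v2"
git -C "$UP" branch release-x

# Bot checkout: track only master, fetch only its tip, check out two dirs.
mkdir "$WORK/bot" && cd "$WORK/bot"
git init -q
git remote add -t master origin "file://$UP"
git config core.sparseCheckout true
mkdir -p .git/info
printf '/llvm\n/clang\n' > .git/info/sparse-checkout
git fetch -q --depth=1 origin
git reset -q --hard origin/master

git rev-list --count HEAD   # 1: only the tip commit was fetched
git branch -r               # lists origin/master; release-x was not fetched
```

`--depth=1` limits the download to the tip commit, while `-t master` restricts the fetch refspec to that one branch; the sparse patterns only affect what is written to disk, not what is downloaded.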

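For comparison, the multi-repository workflow described in the quoted message (a script that cd's into each repository and runs the same git command) might look like the minimal sketch below. The helper name `for_each_repo`, the repository list and the directory layout are all hypothetical, and the repositories are created locally so the example runs on its own.

```shell
#!/bin/sh
# Sketch of "a script that cd's to each repo and runs whatever git command".
# Assumptions: hypothetical helper for_each_repo and a parent directory
# holding separate llvm, clang and compiler-rt clones (created locally here).
set -eu

ROOT=$(mktemp -d)
trap 'rm -rf "$ROOT"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
REPOS="llvm clang compiler-rt"

# Run one git command in every repository, stopping at the first failure.
for_each_repo() {
  for repo in $REPOS; do
    echo "== $repo"
    (cd "$ROOT/$repo" && git "$@") || return 1
  done
}

# Create stand-in repositories, each with one commit.
for repo in $REPOS; do
  git init -q "$ROOT/$repo"
  echo "$repo" > "$ROOT/$repo/README.txt"
  git -C "$ROOT/$repo" add README.txt
  git -C "$ROOT/$repo" commit -q -m "initial import"
done

# Create the same branch and tag in each repository.
for_each_repo checkout -q -b api-update
for_each_repo tag demo-tag
for_each_repo rev-parse --abbrev-ref HEAD
```

Each `git` call here is independent: nothing ties the three branch tips together the way a single commit in one repository would, which is the version-lock and bisection concern raised earlier in the thread.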
-- 
Mehdi
