[llvm-dev] New LLVM git repository conversion prototype
Duncan P. N. Exon Smith via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 18 00:22:43 PDT 2018
I did a sanity check of tree hashes and found something suspicious.
Background: I cloned the prototype as remote "github/llvm-git-prototype" and then added the existing Git mirror as "llvm.org/llvm":
```
$ git clone -o github/llvm-git-prototype https://github.com/llvm-git-prototype/llvm.git
Cloning into 'llvm'...
remote: Enumerating objects: 122, done.
remote: Counting objects: 100% (122/122), done.
remote: Compressing objects: 100% (94/94), done.
remote: Total 3243700 (delta 46), reused 53 (delta 28), pack-reused 3243578
Receiving objects: 100% (3243700/3243700), 529.31 MiB | 15.13 MiB/s, done.
Resolving deltas: 100% (2653514/2653514), done.
Checking out files: 100% (78392/78392), done.
$ cd llvm
$ du -hs .git/objects
616M .git/objects
$ git remote add llvm.org/llvm https://git.llvm.org/git/llvm.git
$ git fetch llvm.org/llvm master
warning: no common commits
remote: Counting objects: 1580199, done.
remote: Compressing objects: 100% (269578/269578), done.
remote: Total 1580199 (delta 1315195), reused 1569271 (delta 1305156)
Receiving objects: 100% (1580199/1580199), 302.18 MiB | 30.73 MiB/s, done.
Resolving deltas: 100% (1315195/1315195), done.
From https://git.llvm.org/git/llvm
* branch master -> FETCH_HEAD
* [new branch] master -> llvm.org/llvm/master
$ du -hs .git/objects
960M .git/objects
$ git rev-list --count llvm.org/llvm/master
170696
```
(Side note: .git/objects grew from 616M to 960M, which seemed unexpectedly high to me. The fetch transferred ~300M, but only the commit objects should have been new, since the tree objects should be shared. Fortunately, repacking gives better results:
```
$ git repack -ad
Counting objects: 3503266, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (699515/699515), done.
Writing objects: 100% (3503266/3503266), done.
Total 3503266 (delta 2744501), reused 3501658 (delta 2743526)
$ du -hs .git/objects
678M .git/objects
```
Vendors that want to merge a downstream vendor branch with the new monorepo can just repack after the initial merges.)
I did a couple of tree object spot checks. At ToT the tree objects match, both giving the tree 8cf37e491e61:
```
$ git log github/llvm-git-prototype/master --oneline -1
182432b9a160 (HEAD -> master, github/llvm-git-prototype/master, github/llvm-git-prototype/HEAD) Add a emitUnaryFloatFnCall version that fetches the function name from TLI
$ git rev-parse github/llvm-git-prototype/master:llvm
8cf37e491e6182a35e3b2755a25ee21454596ce2
$ git log llvm.org/llvm/master --oneline -1
577c9cec20a0 (llvm.org/llvm/master) Add a emitUnaryFloatFnCall version that fetches the function name from TLI
$ git rev-parse llvm.org/llvm/master:
8cf37e491e6182a35e3b2755a25ee21454596ce2
```
But then I looked at r3210, and the tree objects don't match:
```
$ git log -1 --oneline github/llvm-git-prototype/master --grep llvm-svn=3210'$' --stat
da6a562cdd45 Split dominance calculation and post dominance calculation stuff Dominance calculation goes to VMCore library to be used by Verifier.
llvm/lib/Analysis/PostDominators.cpp | 273 ++--------------------------------------------------------------------------
llvm/lib/VMCore/Dominators.cpp | 172 ++----------------------------------------------
2 files changed, 11 insertions(+), 434 deletions(-)
$ git rev-parse da6a562cdd45:llvm
33ba626067f351462aa3aab7c0b2bf62c7d664bd
$ git log -1 --oneline llvm.org/llvm/master --grep @3210' ' --stat
4c9df7c619ba Split dominance calculation and post dominance calculation stuff Dominance calculation goes to VMCore library to be used by Verifier.
lib/Analysis/PostDominators.cpp | 273 ++-------------------------------------------------------------------------------
lib/VMCore/Dominators.cpp | 172 ++-------------------------------------------------
2 files changed, 11 insertions(+), 434 deletions(-)
$ git rev-parse 4c9df7c619ba:
4b2b713c17e2cf2c43e94379023483f13013d237
```
Looking deeper:
```
$ git ls-tree da6a562cdd45:llvm
100644 blob 6698a545eb7782a24e5031dcf09d8f148eb5f7e6 Makefile
100644 blob 74c865a67a968ded45873d97b7209052602cf8b5 Makefile.common
100644 blob 74c865a67a968ded45873d97b7209052602cf8b5 Makefile.rules
100755 blob fca274c810ccf6c8a234d67b3c4eb8cb8f5c08dd cvsupdate
040000 tree c9137d50217f7def5830ab8e1b405d6d4efbd8e9 docs
100755 blob 25673559436c0756692cc032750437c6c18f6d1e getsomesrcs.sh
100755 blob ad755ceee38d1604978f7102b6c095824db17931 getsrcs.sh
040000 tree 2a69dd853c39d673fe2708d9cafef5bd6565e252 include
040000 tree 6ea72c57a2cef22aeaeafdee33e548905fe7e331 lib
040000 tree 9d535ab8b99621402d80678551c1f75a0ca8dc75 runtime
040000 tree baef94e0fd85c3ddd89f3ce3e6f043ea5fe7a611 support
040000 tree f07fb1486d95ec3118d98106781410881af5f9fd test
040000 tree e311bdec813c957dc6d8866d07a65e349e0f42c7 tools
040000 tree 66871d3271babc519cd345f4bb6f2af7f25b3473 utils
$ git ls-tree 4c9df7c619ba:
100644 blob 6698a545eb7782a24e5031dcf09d8f148eb5f7e6 Makefile
100644 blob 74c865a67a968ded45873d97b7209052602cf8b5 Makefile.common
100644 blob 74c865a67a968ded45873d97b7209052602cf8b5 Makefile.rules
100755 blob fca274c810ccf6c8a234d67b3c4eb8cb8f5c08dd cvsupdate
040000 tree c9137d50217f7def5830ab8e1b405d6d4efbd8e9 docs
100755 blob 25673559436c0756692cc032750437c6c18f6d1e getsomesrcs.sh
100755 blob ad755ceee38d1604978f7102b6c095824db17931 getsrcs.sh
040000 tree 224128734138320d0f965626955e9a8619add42b include
040000 tree e5493610205f9671b0ec2d8ffbfa1d6a655c60e4 lib
040000 tree 9d535ab8b99621402d80678551c1f75a0ca8dc75 runtime
040000 tree baef94e0fd85c3ddd89f3ce3e6f043ea5fe7a611 support
040000 tree c00aaaf47c818d8234bfd7e2ce0301572368c62b test
040000 tree ee10bd4094010150010ed35632a686443692b762 tools
040000 tree 66871d3271babc519cd345f4bb6f2af7f25b3473 utils
```
Most of the subtree objects match, but 'include', 'lib', 'test', and 'tools' do not.
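A minimal sketch for picking out the mismatching entries mechanically rather than by eye. It assumes the two `git ls-tree` listings have been saved to files; the function name `differing_entries` and the file names are made up for this example, and entry names containing whitespace are not handled.

```shell
# Hypothetical helper: print the names whose object hash differs between
# two saved `git ls-tree` listings (fields: mode, type, hash, name).
#
# Usage:
#   git ls-tree da6a562cdd45:llvm > prototype.txt
#   git ls-tree 4c9df7c619ba:     > mirror.txt
#   differing_entries prototype.txt mirror.txt
differing_entries() {
  awk '
    # First file: remember the hash recorded for each entry name.
    NR == FNR { hash[$4] = $3; next }
    # Second file: report names present in both listings with different hashes.
    ($4 in hash) && hash[$4] != $3 { print $4 }
  ' "$1" "$2"
}
```

Run against the two listings above, this should print `include`, `lib`, `test`, and `tools`. To see which files differ inside those subtrees, `git diff --stat da6a562cdd45:llvm 4c9df7c619ba:` compares the two trees directly.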
Picking another two arbitrary revisions: the tree objects for r43210 match, but not those for r3333.
Do you know what would cause the trees to diverge? Could there be a correctness issue here?
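To repeat the r3210 check for other revisions, the lookup can be wrapped in a small script. This is only a sketch: the function names (`svn_grep_pattern`, `tree_for_rev`, `compare_rev`) are invented here, and it assumes the remotes are named as in the session above and that every commit carries its SVN revision in the form the two `--grep` patterns expect.

```shell
# Print the --grep pattern that selects SVN revision $2 in history $1.
# Both patterns are the ones used in the session above.
svn_grep_pattern() {
  case "$1" in
    # Prototype: "llvm-svn=N", anchored at end of line so 3210 does not match 32100.
    prototype) printf 'llvm-svn=%s$\n' "$2" ;;
    # Existing mirror: "@N " with a trailing space, for the same reason.
    mirror)    printf '@%s \n' "$2" ;;
    *) echo "unknown history: $1" >&2; return 1 ;;
  esac
}

# Print the tree hash for an SVN revision.
# Arguments: ref, pattern style, revision, subpath ("" for the repository root).
tree_for_rev() {
  local ref kind rev path pattern commit
  ref=$1; kind=$2; rev=$3; path=$4
  pattern=$(svn_grep_pattern "$kind" "$rev") || return 1
  commit=$(git log -1 --format=%H --grep="$pattern" "$ref") || return 1
  [ -n "$commit" ] || { echo "r$rev not found on $ref" >&2; return 1; }
  git rev-parse "$commit:$path"
}

# Compare the prototype's llvm/ subtree with the mirror's root tree.
# Returns 0 on match, 1 on mismatch, 2 if a lookup failed.
compare_rev() {
  local a b
  a=$(tree_for_rev github/llvm-git-prototype/master prototype "$1" llvm) || return 2
  b=$(tree_for_rev llvm.org/llvm/master mirror "$1" "") || return 2
  if [ "$a" = "$b" ]; then
    echo "r$1 match $a"
  else
    echo "r$1 MISMATCH $a $b"
    return 1
  fi
}
```

Given the results above, `compare_rev 3210` should report a mismatch and `compare_rev 43210` a match. A loop such as `for r in 3209 3210 3211; do compare_rev "$r"; done` might help narrow down where the histories first diverge.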
> On Oct 11, 2018, at 15:27, James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> TLDR: https://github.com/llvm-git-prototype/ exists as a read-only mirror of SVN, and is being updated continuously with a script running on an llvm-project AWS VM.
>
> Let me know what you think.
>
> I had meant to get this prototype finalized 6 months ago, and I must apologize for the delay. I hope this is close to final for what we want our git repository to look like, and that we can move forward with the remainder of the work to convert to git.
>
> At this point, there's no guarantee that the repository won't be rebuilt from scratch with new hashes, if some problem is discovered which requires changing something way back in history. But I hope we're now close to being able to declare a conversion final -- and let people start depending on the hashes being stable.
>
> This conversion uses the "flat monorepo" layout, like the previous existing git monorepo, and as discussed previously. The process generating it is different, which allows a more faithful conversion, including branches. I've also converted a bunch of the auxiliary repositories.
>
> I would request that other people help take charge of the remainder of the work. Most importantly -- making a plan for implementing the *rest* of the migration. We have https://llvm.org/docs/Proposals/GitHubMove.html, but I think it'll need significant fleshing out and updating. I'm happy to assist with the rest of the migration, but I'd like to _not_ be primarily responsible for other parts beyond svn->git repository conversion.
>
> Some things that could be discussed in such a plan:
> * Verifying that this conversion is good, what we want, and declaring it final (at which point the hashes can be relied upon not to change).
> * Any particular steps wanted here?
> * Converting buildbots to use git.
> * Phabricator changes?
> * How do email notifications get sent for commits?
> * Gathering github accounts for all committers, adding them to a github team.
> * Deciding upon and announcing a timeline for switching over.
> * Proposing, implementing, and testing new workflows for direct git usage:
> * Github pull requests instead of (or in addition to?) phabricator?
> * Github Protected Branch configuration options?
> * E.g. -- direct pushing to git without any restriction, or, require that pull requests be created first?
> * Automated Pre-commit testing? Do we set up CI (e.g. travis-ci.org) to do some testing on pull requests, to reduce avoidable tree breakages?
> * Any other github configuration options that need to be decided upon?
> * ....other things I forgot about at the moment...
> * Timeline for switchover.
>
>
>
> Anyways, what's been done _so far_ is a full SVN->Git repository conversion. This conversion:
> * Places the SVN revision number into the commit message, as "llvm-svn=1234"
>
> * Automatically preserves all branches from the SVN repository (it merges the branches named /$project/branches/$name into a single "$name" branch, attempting, as much as possible, to make the branch-creation commits not look insane).
>
> * Attempts to convert the svn branches in the "tags" subdir into annotated git tags pointing to the proper commit on the parent branch, where feasible. Sometimes this is impossible, since the "tags" have had modifications after their creation. (They're just branches in SVN, so you can do that, although you shouldn't). If so, they're preserved as a branch named "svntag/$name", instead.
>
> * Preserves the svn id -> email mapping that was in use at the time of each SVN commit, as far as is known.
>
> * Fixes a bunch of -- but not all -- the CVS->SVN conversion errors (due, e.g., to files being renamed directly in the CVS repository).
>
>
>
> Most of the SVN directories are migrated into sub-directories inside the main "llvm" mono-repository:
> * cfe (renamed to clang in the conversion)
> * clang-tools-extra
> * compiler-rt
> * debuginfo-tests
> * dragonegg (also "gcc-plugin", the original name)
> * libclc
> * libcxx
> * libcxxabi
> * libunwind
> * lld
> * lldb
> * llgo
> * llvm
> * openmp
> * parallel-libs
> * polly
> * pstl
> * stacker (deleted after r40406)
> (Additionally, files added to the "monorepo-root/trunk" directory in SVN end up at the root of this repository).
>
> Some SVN projects are still active, but not part of the LLVM codebase. These get migrated to their own separate git repositories:
> * lnt
> * test-suite
> * www
> * www-pubs
> * www-releases ## TODO. Not done yet as it requires the use of git-lfs, due to large files.
> * zorg
>
> A couple inactive projects which are somewhat related to the LLVM codebase, migrated to separate repos:
> * poolalloc
> * safecode
>
> Legacy projects that are not particularly interesting, migrated to a single separate git repository named "archive":
> * clang-tests # Copy of GCC 4.2 testsuite, modified to work with clang
> * clang-tests-external # Copy of GDB testsuite
> * llvm-gcc-4.0 # GCC 4.0, modified for llvm
> * llvm-gcc-4.2 # GCC 4.2, modified for llvm
> * llvm-gcc-4-2 # (merge with above)
> * java
> * vmkit
> * nightly-test-server
> * llbrowse # An LLVM bitcode GUI browser
> * television # A different LLVM GUI browser; shows effects of transforms, etc
> * website # 2007-era snapshot of website, not actually maintained here.
> * core, llvm-top, sample, support, hlvm # from the "HLVM" refactoring attempt.
>
> Projects _not_ migrated from SVN in this conversion, since they're elsewhere already:
> * giri # Never actually developed here; actually https://github.com/liuml07/giri
> * klee # Already migrated to github with history; https://github.com/klee/klee