[llvm-dev] New LLVM git repository conversion prototype

Bruce Hoult via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 15 21:30:50 PDT 2018


Summary
=======

I'd love it if there were a way in the new repo to mark commits as known
to be bad. I don't know whether this should be a separate database, a text
file in the repo with a list of hashes, tags with names fitting some
pattern, or something else.

I have an existing database of bad commits from 3.6.0 up until the end of
October last year.

If there is interest, I can make this database available, and I can also
run tests on commits for the last year, and on an ongoing basis (although
I'd really prefer that we don't commit things that are seriously broken).

Discussion
==========

There are, alas, a lot of bad commits in the llvm project history. Most of
them get backed out fairly quickly, but sometimes not.

This can cause problems with activities such as bisecting to find when a
bug or regression was introduced, or incrementally rebasing a long-lived
local branch onto upstream llvm.
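
As a rough illustration of how a list of known-bad hashes could help with
bisecting, here is a minimal Python sketch. It assumes a hypothetical text
file with one commit hash per line (blank lines and '#' comments ignored)
and feeds those hashes to the existing `git bisect skip` subcommand. The
file format and the function names are my own invention for illustration,
not something that exists in the repo.

```python
import subprocess


def parse_bad_commits(text):
    """Return the commit hashes listed in a (hypothetical) bad-commits file.

    Assumed format: one hash per line, optional trailing '# comment',
    blank lines ignored.
    """
    hashes = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hashes.append(line.split()[0])
    return hashes


def bisect_skip_command(hashes):
    """Build the `git bisect skip` command line for the given hashes."""
    return ["git", "bisect", "skip"] + list(hashes)


def skip_known_bad(path, repo="."):
    """Tell an already started bisection to skip every listed commit."""
    with open(path) as f:
        hashes = parse_bad_commits(f.read())
    if hashes:
        subprocess.run(bisect_skip_command(hashes), cwd=repo, check=True)
```

One would call skip_known_bad() right after `git bisect start`, so that
bisect never stops on a commit that is already known not to build or run.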

I was given the task about a year ago to update a private back end for a
proprietary CPU that was based on llvm 3.6. As a preliminary step I wrote a
script to attempt to build clang for every llvm project revision since
3.6.0, use that clang to build a native HelloWorld for x86_64 Linux, and
execute the resulting HelloWorld program and check the output.
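
The per-commit check can be sketched roughly as below. This is not my
actual script, just a minimal Python outline of the same idea: run a list
of steps in order (build, compile, run), give each a timeout, and compare
the final output with the expected text. The step names, the commands in
the usage note, and the timeout values are all assumptions.

```python
import subprocess


def run_step(cmd, timeout):
    """Run one command; return (outcome, stdout).

    outcome is 'ok', 'error' (non-zero exit or not runnable) or 'timeout'.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except OSError:
        return "error", ""
    return ("ok" if proc.returncode == 0 else "error"), proc.stdout


def check_commit(steps, expected, run=run_step):
    """Run the steps in order and classify the commit.

    steps is a list of (name, cmd, timeout); the last step is expected to
    run HelloWorld, and its stdout is compared with `expected`.
    """
    stdout = ""
    for name, cmd, timeout in steps:
        outcome, stdout = run(cmd, timeout)
        if outcome != "ok":
            return f"{name}: {outcome}"
    return "good" if stdout == expected else "wrong output"
```

A call might look like check_commit([("build", ["ninja", "-C", "build",
"clang"], 7200), ("compile", ["build/bin/clang", "hello.c", "-o", "hello"],
60), ("run", ["./hello"], 10)], "Hello, World!\n"), where the paths and
limits are placeholders. The timeout on every step is what keeps a hung
compiler from stalling the whole run.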

Working from the https://github.com/llvm-project/llvm-project-20170507
repository, I tested 84852 commits from ...

-------------
commit c8c6087cf0cc04bbe9291fadd75f6fcb8290854b
Author: Sanjay Patel <spatel at rotateright.com>
Date:   Wed Jan 14 16:03:58 2015 +0000

    fix typos
-------------


... to ...

-------------
commit d060b0c7ca67a02d6775febbd0afe47dfb8f1b58
Author: Marek Olsak <marek.olsak at amd.com>
Date:   Tue Oct 31 21:06:42 2017 +0000

    AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset
-------------

This took about a week on a shiny new Core i9-7980XE with 64 GB RAM.

I found 649 commits (0.76%) bad enough to fail my test. The failures fell
into 309 runs of consecutive bad commits, with the distribution of run
lengths as follows (number of runs, length of run):

    189   1
     52   2
     21   3
     17   4
     11   5
      7   6
      3   7
      3   8
      1   9
      2  13
      1  14
      1  16
      1  18

The long runs are, in general, caused by a bad commit that was backed out
only after a delay.
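
For anyone who wants to reproduce this kind of table from their own
pass/fail data, here is a small Python sketch. It assumes the results are
available as a list of booleans in commit order (True meaning the commit
failed), which is my choice of representation for the example. The second
half only cross-checks the totals quoted above against the table.

```python
from collections import Counter
from itertools import groupby


def run_length_histogram(results):
    """Map run length -> number of runs of consecutive failing commits.

    results: iterable of booleans in commit order, True = commit failed.
    """
    hist = Counter()
    for failed, group in groupby(results):
        if failed:
            hist[sum(1 for _ in group)] += 1
    return hist


# The distribution reported above, as run length -> number of runs.
reported = {1: 189, 2: 52, 3: 21, 4: 17, 5: 11, 6: 7, 7: 3, 8: 3,
            9: 1, 13: 2, 14: 1, 16: 1, 18: 1}

total_runs = sum(reported.values())                       # 309
total_bad = sum(n * count for n, count in reported.items())  # 649
percent_bad = round(100 * total_bad / 84852, 2)           # 0.76
```

The table is consistent with the figures in the text: 309 runs covering
649 bad commits, which is 0.76% of the 84852 commits tested.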

There may well be other commits that caused less serious problems, or that
affected things other than clang or x86.

I expected to find any of the following problems:

- llvm/clang fails to build
- clang crashes while building HelloWorld
- clang errors while building HelloWorld
- HelloWorld crashes
- HelloWorld produces incorrect output

If I recall correctly, I had instances of all of those. But I also
encountered one unexpected failure mode:

- clang goes into an infinite loop while building HelloWorld

On Thu, Oct 11, 2018 at 3:27 PM, James Y Knight via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> TLDR: https://github.com/llvm-git-prototype/ exists as a read-only mirror
> of SVN, and is being updated continuously with a script running on an
> llvm-project AWS VM.
>
> Let me know what you think.
>
> I had meant to get this prototype finalized 6 months ago, and I must
> apologize for the delay. I hope this is close to final for what we want our
> git repository to look like, and that we can move forward with the
> remainder of the work to convert to git.
>
> At this point, there's no guarantee that the repository won't be rebuilt
> from scratch with new hashes, if some problem is discovered which requires
> changing something way back in history. But I hope we're now close to being
> able to declare a conversion final -- and let people start depending on the
> hashes being stable.
>
> This conversion uses the "flat monorepo" layout, like the previous
> existing git monorepo, and as discussed previously. The process generating
> it is different, which allows a more faithful conversion, including
> branches. I've also converted a bunch of the auxiliary repositories.
>
> I would request that other people help take charge of the remainder of the
> work. Most importantly -- making a plan for implementing the *rest* of the
> migration. We have https://llvm.org/docs/Proposals/GitHubMove.html, but I
> think it'll need significant fleshing out and updating. I'm happy to assist
> with the rest of the migration, but I'd like to _not_ be primarily
> responsible for other parts beyond svn->git repository conversion.
>
> Some things that could be discussed in such a plan:
>   * Verifying that this conversion is good, what we want, and declaring it
> final (at which point the hashes can be relied upon not to change).
>     * Any particular steps wanted here?
>   * Converting buildbots to use git.
>   * Phabricator changes?
>   * How do email notifications get sent for commits?
>   * Gathering github accounts for all committers, adding them to a github
> team.
>   * Deciding upon and announcing a timeline for switching over.
>   * Proposing, implementing, and testing new workflows for direct git
> usage:
>     * Github pull requests instead of (or in addition to?) phabricator?
>     * Github Protected Branch configuration options?
>       * E.g. -- direct pushing to git without any restriction, or, require
> that pull requests be created first?
>       * Automated Pre-commit testing? Do we setup CI (e.g. travis-ci.org)
> to do some testing on pull requests, to reduce avoidable tree breakages?
>       * Any other github configuration options that need to be decided
> upon?
>   * ....other things I forgot about at the moment...
>   * Timeline for switchover.
>
>
>
> Anyways, what's been done _so far_ is a full SVN->Git repository
> conversion. This conversion:
>   * Places the SVN revision number into the commit message, as
> "llvm-svn=1234"
>
>   * Automatically preserves all branches from the SVN repository (it
> merges the branches named /$project/branches/$name into a single "$name"
> branch, attempting, as much as possible, to make the branch-creation
> commits not look insane).
>
>   * Attempts to convert the svn branches in the "tags" subdir into
> annotated git tags pointing to the proper commit on the parent branch,
> where feasible. Sometimes this is impossible, since the "tags" have had
> modifications after their creation. (They're just branches in SVN, so you
> can do that, although you shouldn't). If so, they're preserved as a branch
> named "svntag/$name", instead.
>
>   * Preserves the svn id -> email mapping that was in-use at the time of
> each SVN commit, as far as is known.
>
>   * Fixes a bunch of -- but not all -- the CVS->SVN conversion errors
> (due, e.g., to files being renamed directly in the CVS repository).
>
>
>
> Most of the SVN directories are migrated into sub-directories inside the
> main "llvm" mono-repository:
>   * cfe (renamed to clang in the conversion)
>   * clang-tools-extra
>   * compiler-rt
>   * debuginfo-tests
>   * dragonegg (also "gcc-plugin", the original name)
>   * libclc
>   * libcxx
>   * libcxxabi
>   * libunwind
>   * lld
>   * lldb
>   * llgo
>   * llvm
>   * openmp
>   * parallel-libs
>   * polly
>   * pstl
>   * stacker (deleted after r40406)
> (Additionally, files added to the "monorepo-root/trunk" directory in SVN
> end up at the root of this repository).
>
> Some SVN projects are still active, but not part of the LLVM codebase.
> These get migrated to their own separate git repositories:
>   * lnt
>   * test-suite
>   * www
>   * www-pubs
>   * www-releases ## TODO. Not done yet as it requires the use of git-lfs,
> due to large files.
>   * zorg
>
> A couple inactive projects which are somewhat related to the LLVM
> codebase, migrated to separate repos:
>   * poolalloc
>   * safecode
>
> Legacy projects that are not particularly interesting, migrated to a
> single separate git repository named "archive":
>   * clang-tests # Copy of GCC 4.2 testsuite, modified to work with clang
>   * clang-tests-external # Copy of GDB testsuite
>   * llvm-gcc-4.0 # GCC 4.0, modified for llvm
>   * llvm-gcc-4.2 # GCC 4.2, modified for llvm
>   * llvm-gcc-4-2 # (merge with above)
>   * java
>   * vmkit
>   * nightly-test-server
>   * llbrowse # An LLVM bitcode GUI browser
>   * television # A different LLVM GUI browser; shows effects of
> transforms, etc
>   * website # 2007-era snapshot of website, not actually maintained here.
>   * core, llvm-top, sample, support, hlvm # from the "HLVM" refactoring
> attempt.
>
> Projects _not_ migrated from SVN in this conversion, since they're
> elsewhere already:
>   * giri # Never actually developed here; actually
> https://github.com/liuml07/giri
>   * klee # Already migrated to github with history;
> https://github.com/klee/klee
>
>