[llvm-dev] New LLVM git repository conversion prototype

NAKAMURA Takumi via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 17 14:45:07 PDT 2018


I know I am bikeshedding.

On Thu, Oct 18, 2018 at 4:29 AM James Y Knight <jyknight at google.com> wrote:

> On Mon, Oct 15, 2018 at 4:00 PM NAKAMURA Takumi <geek4civic at gmail.com>
> wrote:
>
>> James,
>>
>> Thank you for disclosing your great work!
>>
>> I had been waiting for your work because handling the inconsistent
>> branches in my repo was difficult.
>> When I asked people back then, I could not hear anything about your work.
>> So I began to reconstruct the repo, rewriting it from scratch.
>>
>> I am happy that I can now see your progress.
>>
>> Interestingly, it seems I was making tweaks similar to yours, esp. in
>> revising the history.
>> They satisfy me. :)
>>
>> Let me suggest a few features that I implemented to honor the "original"
>> authors.
>> Excuse me if this is bikeshedding.
>>
>
> Thanks for the suggestions -- I'm really happy to have some feedback on
> the conversion!
>
>> 1) Apply "Signed-off-by:"
>> Like git-svn's "--use-log-author". We have never required such a line,
>> but some authors have been using it.
>>
>
> I didn't find this style used consistently enough to be worthwhile by
> itself (there are only 652 out of 344528 revisions). In many of those the
> named person was already the committer, and at a glance, I'm not sure all
> the other instances necessarily name the author.
>

> But, I've been convinced that it's not feasible to do this in a
> reliable-enough way to make it be actually useful for much, and therefore,
> dropped the idea.
>

I know it is of little use for recovering authors from the existing history.

Still, I think we could recommend "Signed-off-by:" to incoming authors (and
committers).
For example, could we add a feature to Phabricator (phab) that appends the
line automatically?
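
To make the request concrete, here is a minimal sketch of what "add the line automatically" could mean. It is not Phabricator code; the function name and the rule for detecting an existing trailer paragraph are my own assumptions.

```python
import re

# Assumption: a trailer line looks like "Key: value" (e.g. "Reviewed-by: ...").
TRAILER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*: \S")


def add_signed_off_by(message, name, email):
    """Return the message with a Signed-off-by trailer, added at most once."""
    trailer = f"Signed-off-by: {name} <{email}>"
    body = message.rstrip("\n")
    lines = body.split("\n")
    if trailer in lines:
        return body + "\n"  # already present, nothing to add

    # The last paragraph is everything after the last blank line.
    if "" in lines:
        last_blank = len(lines) - 1 - lines[::-1].index("")
        last_para = lines[last_blank + 1:]
    else:
        last_para = []

    # Join an existing trailer paragraph, otherwise start a new paragraph.
    in_trailer_block = bool(last_para) and all(
        TRAILER_RE.match(line) for line in last_para
    )
    separator = "\n" if in_trailer_block else "\n\n"
    return body + separator + trailer + "\n"
```

Calling it twice with the same name and email leaves the message unchanged the second time, so a hook could run it unconditionally.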


> I initially had thought it would be valuable to try to determine original
> authors via applying a bunch of different heuristics, and represent that in
> the git metadata. For example, by looking for "Patch by" lines, and
> variations thereof -- and then interpreting people's realnames back to
> author identity, since email addresses aren't generally used. Or, searching
> the mailing list archives, to find the original submission of the patch
> that eventually got committed.
>

I think:
  - You could try applying many heuristics to the existing history, as far
as they are reliable and stable.
  - We should apply a few heuristics to incoming commits on trunk.
  - We may apply a few extra heuristics on branches. I assume (2) may be
applicable here.

I would like to honor the original authors. :)
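
As an illustration of one such heuristic, here is a rough sketch that looks for a "Patch by" line in a commit message. The regular expression and the function name are my assumptions, not what either conversion script does, and it does not attempt the hard part James mentioned (mapping a real name back to an author identity).

```python
import re

# Assumption: the attribution sits on one line, e.g.
#   "Patch by Jane Doe!" or "Patch by Jane Doe <jane@example.com>."
PATCH_BY_RE = re.compile(
    r"\bpatch[ \t]+(?:contributed[ \t]+)?by[ \t]+"
    r"(?P<name>[^<\n]+?)[ \t]*"
    r"(?:<(?P<email>[^>\s]+)>)?[ \t]*[.!]?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_patch_author(message):
    """Return (name, email_or_None) for the first "Patch by" line, or None."""
    match = PATCH_BY_RE.search(message)
    if match is None:
        return None
    return match.group("name").strip(), match.group("email")
```

A result of None only means this particular pattern did not match; it says nothing about who wrote the patch.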

>> 2) Discover merged commits and cherry-pick them
>>
>> In branches, we can see many "Merging rXXXXXX:" commits.
>> I tried picking them up.
>> For example, see:
>> https://github.com/llvm-project/llvm-project-ng/commits/release_70
>> I am afraid that this feature would make building the repository slower.
>> In my case, I am using git-fast-import. To do it:
>>   - Flush blobs with "checkpoint" to consolidate HEAD's commit and tree.
>>   - If a simple comparison (with git-merge-tree) fails, I have to use the
>> *slow* git index,
>>      with git-read-tree, git-apply, and git-write-tree,
>>      to confirm that the cherry-picked commit is identical to the original
>> commit.
>>
>
> As cherry-picks don't get represented in git metadata, I think recreating
> a commit with git cherry-pick won't actually change anything in the
> resulting repository?
>

Right. In the result, cherry-picking should not change the tree (or the
committer), only the author and the log message.
What I described was how costly it is to confirm that "cherry-picking
doesn't modify the tree".

I have an idea: it would be faster to run a supplemental pass afterwards
that confirms and substitutes the commits.
That way I could avoid calling "checkpoint" too frequently.
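
A rough sketch of the cheap part of such a pass follows. It models a tree as a plain dict from path to blob id, which is a simplification I am assuming for illustration. It works at whole-file level, so it is stricter than a real cherry-pick: a False result only means the fast check was inconclusive and the slow index-based check is still needed.

```python
import re

# Assumption: the branch commit log starts a line with "Merging r<number>:".
MERGE_RE = re.compile(r"^Merging r(\d+):", re.MULTILINE)


def merged_revisions(message):
    """Return the SVN revision numbers named in "Merging rNNN:" lines."""
    return [int(rev) for rev in MERGE_RE.findall(message)]


def tree_delta(parent, child):
    """File-level change set: {path: new blob id, or None if deleted}."""
    delta = {}
    for path in parent.keys() | child.keys():
        if parent.get(path) != child.get(path):
            delta[path] = child.get(path)
    return delta


def apply_delta(tree, delta):
    """Apply a file-level change set to a tree and return the new tree."""
    result = dict(tree)
    for path, blob in delta.items():
        if blob is None:
            result.pop(path, None)
        else:
            result[path] = blob
    return result


def is_clean_cherry_pick(trunk_parent, trunk_commit, branch_parent, branch_commit):
    """True if replaying the trunk change on the branch parent gives the
    branch commit's tree exactly."""
    delta = tree_delta(trunk_parent, trunk_commit)
    return apply_delta(branch_parent, delta) == branch_commit
```

Only when this returns True would the pass substitute the author and log of the original commit; everything else stays as converted.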


>> 3) Resurrect "Revert Revert" with cherry-picking.
>>
>> I don't like such commits. It's just my preference.
>>
>
> I don't really understand what you mean here.
>

Sorry for my poor wording. I meant:
"Replace a "Revert Revert" commit by discovering the original commit and
cherry-picking it."

Although I don't like "Revert Revert", I wonder whether such a heuristic
could be applied to trunk.
Please take (3) as "just an idea".
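
To show what I mean, here is a small sketch that unwraps nested Revert "..." subjects and looks up the original commit by its subject. The subject format and the lookup table are assumptions for illustration; real logs also use other styles that this does not cover.

```python
import re

# Assumption: subjects follow the form  Revert "<inner subject>"
REVERT_RE = re.compile(r'^Revert\s+"(?P<inner>.*)"\s*$')


def unwrap_reverts(subject):
    """Return (depth, innermost_subject) after peeling Revert "..." layers."""
    depth = 0
    current = subject.strip()
    while True:
        match = REVERT_RE.match(current)
        if match is None:
            return depth, current
        depth += 1
        current = match.group("inner").strip()


def find_reapplied_original(subject, commits_by_subject):
    """For an even, non-zero revert depth (a re-land), return the id of the
    original commit from a hypothetical subject -> commit id table."""
    depth, inner = unwrap_reverts(subject)
    if depth == 0 or depth % 2 != 0:
        return None
    return commits_by_subject.get(inner)
```

An odd depth is a plain revert and is left alone; only an even depth means the original change is being applied again.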


>> 4) Pull the author from phab
>>
>> (It's just an idea)
>>
>
> This idea would go with #1.
>

I know it's hairy. We must not expose email addresses without the consent of
each Phabricator user.
Thus, (4) is just an idea.

I think it would be enough if Phabricator emitted "Signed-off-by:" for
incoming commits.

P.S. I don't like Git's git-notes. GitHub supported it long ago, but has
dropped that support.

Takumi

>> I appreciate your great work. Thanks again!
>>
>> Ask me anything if you are interested.
>>
>> P.S. I won't attend the devmtg 2018.
>>
>> Takumi Nakamura
>>
>>
>> On Fri, Oct 12, 2018 at 7:28 AM James Y Knight via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> TLDR: https://github.com/llvm-git-prototype/ exists as a read-only
>>> mirror of SVN, and is being updated continuously with a script running on
>>> an llvm-project AWS VM.
>>>
>>> Let me know what you think.
>>>
>>> I had meant to get this prototype finalized 6 months ago, and I must
>>> apologize for the delay. I hope this is close to final for what we want our
>>> git repository to look like, and that we can move forward with the
>>> remainder of the work to convert to git.
>>>
>>> At this point, there's no guarantee that the repository won't be rebuilt
>>> from scratch with new hashes, if some problem is discovered which requires
>>> changing something way back in history. But I hope we're now close to being
>>> able to declare a conversion final -- and let people start depending on the
>>> hashes being stable.
>>>
>>> This conversion uses the "flat monorepo" layout, like the previous
>>> existing git monorepo, and as discussed previously. The process generating
>>> it is different, which allows a more faithful conversion, including
>>> branches. I've also converted a bunch of the auxiliary repositories.
>>>
>>> I would request that other people help take charge of the remainder of
>>> the work. Most importantly -- making a plan for implementing the *rest* of
>>> the migration. We have https://llvm.org/docs/Proposals/GitHubMove.html,
>>> but I think it'll need significant fleshing out and updating. I'm happy to
>>> assist with the rest of the migration, but I'd like to _not_ be primarily
>>> responsible for other parts beyond svn->git repository conversion.
>>>
>>> Some things that could be discussed in such a plan:
>>>   * Verifying that this conversion is good, what we want, and declaring
>>> it final (at which point the hashes can be relied upon not to change).
>>>     * Any particular steps wanted here?
>>>   * Converting buildbots to use git.
>>>   * Phabricator changes?
>>>   * How do email notifications get sent for commits?
>>>   * Gathering github accounts for all committers, adding them to a
>>> github team.
>>>   * Deciding upon and announcing a timeline for switching over.
>>>   * Proposing, implementing, and testing new workflows for direct git
>>> usage:
>>>     * Github pull requests instead of (or in addition to?) phabricator?
>>>     * Github Protected Branch configuration options?
>>>       * E.g. -- direct pushing to git without any restriction, or,
>>> require that pull requests be created first?
>>>       * Automated Pre-commit testing? Do we setup CI (e.g. travis-ci.org)
>>> to do some testing on pull requests, to reduce avoidable tree breakages?
>>>       * Any other github configuration options that need to be decided
>>> upon?
>>>   * ....other things I forgot about at the moment...
>>>   * Timeline for switchover.
>>>
>>>
>>>
>>> Anyways, what's been done _so far_ is a full SVN->Git repository
>>> conversion. This conversion:
>>>   * Places the SVN revision number into the commit message, as
>>> "llvm-svn=1234"
>>>
>>>   * Automatically preserves all branches from the SVN repository (it
>>> merges the branches named /$project/branches/$name into a single "$name"
>>> branch, attempting, as much as possible, to make the branch-creation
>>> commits not look insane).
>>>
>>>   * Attempts to convert the svn branches in the "tags" subdir into
>>> annotated git tags pointing to the proper commit on the parent branch,
>>> where feasible. Sometimes this is impossible, since the "tags" have had
>>> modifications after their creation. (They're just branches in SVN, so you
>>> can do that, although you shouldn't). If so, they're preserved as a branch
>>> named "svntag/$name", instead.
>>>
>>>   * Preserves the svn id -> email mapping that was in-use at the time of
>>> each SVN commit, as far as is known.
>>>
>>>   * Fixes a bunch of -- but not all -- the CVS->SVN conversion errors
>>> (due, e.g., to files being renamed directly in the CVS repository).
>>>
>>>
>>>
>>> Most of the SVN directories are migrated into sub-directories inside the
>>> main "llvm" mono-repository:
>>>   * cfe (renamed to clang in the conversion)
>>>   * clang-tools-extra
>>>   * compiler-rt
>>>   * debuginfo-tests
>>>   * dragonegg (also "gcc-plugin", the original name)
>>>   * libclc
>>>   * libcxx
>>>   * libcxxabi
>>>   * libunwind
>>>   * lld
>>>   * lldb
>>>   * llgo
>>>   * llvm
>>>   * openmp
>>>   * parallel-libs
>>>   * polly
>>>   * pstl
>>>   * stacker (deleted after r40406)
>>> (Additionally, files added to the "monorepo-root/trunk" directory in SVN
>>> end up at the root of this repository).
>>>
>>> Some SVN projects are still active, but not part of the LLVM codebase.
>>> These get migrated to their own separate git repositories:
>>>   * lnt
>>>   * test-suite
>>>   * www
>>>   * www-pubs
>>>   * www-releases ## TODO. Not done yet as it requires the use of
>>> git-lfs, due to large files.
>>>   * zorg
>>>
>>> A couple inactive projects which are somewhat related to the LLVM
>>> codebase, migrated to separate repos:
>>>   * poolalloc
>>>   * safecode
>>>
>>> Legacy projects that are not particularly interesting, migrated to a
>>> single separate git repository named "archive":
>>>   * clang-tests # Copy of GCC 4.2 testsuite, modified to work with clang
>>>   * clang-tests-external # Copy of GDB testsuite
>>>   * llvm-gcc-4.0 # GCC 4.0, modified for llvm
>>>   * llvm-gcc-4.2 # GCC 4.2, modified for llvm
>>>   * llvm-gcc-4-2 # (merge with above)
>>>   * java
>>>   * vmkit
>>>   * nightly-test-server
>>>   * llbrowse # An LLVM bitcode GUI browser
>>>   * television # A different LLVM GUI browser; shows effects of
>>> transforms, etc
>>>   * website # 2007-era snapshot of website, not actually maintained here.
>>>   * core, llvm-top, sample, support, hlvm # from the "HLVM" refactoring
>>> attempt.
>>>
>>> Projects _not_ migrated from SVN in this conversion, since they're
>>> elsewhere already:
>>>   * giri # Never actually developed here; actually
>>> https://github.com/liuml07/giri
>>>   * klee # Already migrated to github with history;
>>> https://github.com/klee/klee
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>