[flang-dev] Rewriting f18's history for inclusion in llvm monorepo

Peter Waller via flang-dev flang-dev at lists.llvm.org
Fri Dec 6 06:36:44 PST 2019


An update.

I've managed to preserve much more history.

The new scheme preserves 2,159 of 2,181 non-merge commits (my first 
attempt kept only 683).

Additionally, I've added metadata trailers to the commit messages to 
record the original commit in the f18 repo, and a link to the pull 
request, where this information is available. Right now the trailers are 
present on trivial commits but missing from the others, even those that 
only required an "easy" rebase.
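
For illustration only, here is a minimal sketch of how such trailers could 
be appended with `git interpret-trailers`. The helper name 
`add_origin_trailers` and the trailer keys `Original-commit` and 
`Pull-request` are assumptions made for this example; they are not 
necessarily what flatten.sh writes.

```shell
#!/bin/sh
# Hypothetical helper: read a commit message on stdin and write it to
# stdout with provenance trailers appended.
#   $1 = original commit sha in the f18 repo
#   $2 = pull request URL (optional; omitted when not known)
# The trailer keys below are illustrative, not taken from flatten.sh.
add_origin_trailers() {
    sha=$1
    pr_url=${2:-}
    if [ -n "$pr_url" ]; then
        git interpret-trailers \
            --trailer "Original-commit: $sha" \
            --trailer "Pull-request: $pr_url"
    else
        git interpret-trailers --trailer "Original-commit: $sha"
    fi
}
```

It could be used as `git log -1 --format=%B SHA | add_origin_trailers SHA URL` 
to produce the message for the rewritten commit.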

Further explanation below.

In git terminology the "tree sha" determines what is present in the file 
system. This can be seen with `git rev-parse SHA^{tree}` or on the 
`tree` line printed by `git cat-file -p SHA`. The key thing I seek to 
preserve is the tree sha of merge commits on the mainline branch. If 
these are the same, then the contents are the same at those commits.
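
As a rough sketch of that comparison (the helper name `same_tree` is made 
up for this example and is not part of flatten.sh), two commits can be 
checked for identical contents like this:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if commits $1 and $2 point at the
# same tree, i.e. their file system contents are identical.
# Returns 1 if the trees differ and 2 if either revision cannot be resolved.
same_tree() {
    tree_a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    tree_b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$tree_a" = "$tree_b" ]
}
```

For example, `same_tree MERGESHA rewritten-MERGESHA && echo identical` 
would confirm that a rewritten branch ends with the same contents as the 
original merge commit, even though the commit shas themselves differ.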

There are three cases I consider. In each case, we want to ensure that 
the contents of the filesystem at that commit are the same.

1. If no commits happened to mainline since the fork for a PR, then we 
can rewrite the merge as a fast-forward. The tree shas of the individual 
commits are preserved.

   * This is a trivial case, and seems obviously safe with respect to 
preserving semantics, except that it introduces commits on the mainline 
which were previously in a branch. We lose the information about when 
the commits landed on mainline.

2. If commits happened on mainline since the branch forked, we must 
rebase. Now things are trickier. Intermediate commits in a PR (those 
before it was merged) may now be semantically wrong, according to the 
usual problems of rebasing. However, if the rebase has no conflicts, and 
the tree shas are the same at the end of the rebase, then this may be 
good enough. By the final commit of the rebase, the tree is the same as 
the merge commit, so no difference is introduced there.

3. If commits happened on mainline and the rebase has conflicts, then 
things get harder. It turns out that there are only 6 such merge 
commits, with 66 commits on their branches. It might be possible to 
handle those manually, but I'm unlikely to do it myself. For now, those 
are squashed as before. Their commit hashes are written to hard.txt by 
the flatten.sh shell script. If someone cares about those enough, they 
could do a rebase for each of the "hard" MERGESHA given at the end of 
this email with `git checkout -b rewritten-MERGESHA MERGESHA^2; git 
rebase MERGESHA^1`, and fix all the conflicts carefully. If they did 
this and pushed the six rewritten-MERGESHA branches somewhere, 
flatten.sh could pull in those rebased commits at the appropriate 
moment. So long as `git diff MERGESHA rewritten-MERGESHA` is empty after 
the rebase, I don't see why this wouldn't work. Just beware that making 
each individual patch semantically correct may not be easy.
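
The three cases above can be sketched as a small classifier. This is an 
illustration under stated assumptions, not the logic of flatten.sh itself: 
the function name `classify_merge` and the labels it prints are chosen for 
this example, and it assumes a git new enough to provide `git worktree 
remove` plus a configured committer identity (the rebase needs one).

```shell
#!/bin/sh
# Hypothetical helper: print "trivial", "easy" or "hard" for merge commit $1.
#   trivial: mainline did not move since the fork, so a fast-forward works
#   easy:    the branch rebases onto mainline without conflicts and ends
#            with the same tree as the original merge commit
#   hard:    the rebase conflicts, or the resulting tree differs
classify_merge() {
    merge=$1
    mainline=$(git rev-parse --verify --quiet "$merge^1") || return 2
    branch=$(git rev-parse --verify --quiet "$merge^2") || return 2
    base=$(git merge-base "$mainline" "$branch") || return 2

    # Case 1: the fork point is the mainline parent itself.
    if [ "$base" = "$mainline" ]; then
        echo trivial
        return 0
    fi

    # Cases 2 and 3: replay the branch in a scratch worktree so the
    # caller's checkout is left untouched.
    scratch=$(mktemp -d) || return 2
    git worktree add --detach "$scratch" "$branch" >/dev/null 2>&1 || return 2
    want=$(git rev-parse "$merge^{tree}")
    if git -C "$scratch" rebase "$mainline" >/dev/null 2>&1 &&
       [ "$(git -C "$scratch" rev-parse 'HEAD^{tree}')" = "$want" ]; then
        result=easy
    else
        # Harmless if no rebase is in progress.
        git -C "$scratch" rebase --abort >/dev/null 2>&1
        result=hard
    fi
    git worktree remove --force "$scratch" >/dev/null 2>&1
    echo "$result"
}
```

Running `classify_merge MERGESHA` over each merge on the mainline would 
give a rough count of how many fall into each case. It only tests whether 
a rebase is possible; it does not produce the rewritten commits.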

Another known deficiency is that we don't currently handle merges from 
mainline back into feature branches cleanly. There are a few of those 
and they look pretty weird. I haven't yet thought about this in great 
detail. I'll see if I can fix this issue.

The current rewritten history is up at 
https://github.com/peterwaller-arm/f18/commits/rewritten-history. I have 
pushed the new script up to https://github.com/flang-compiler/f18/pull/854.

Regards,

- Peter

p.s. Here is the current output of the script:

Original history had 2181 non-merge commits.
New history has 2165 commits.

Preserved: 2159 Easy: 568 Hard merges: 6 Hard commits: 66

Merge commits which need rebasing:

b9f25364a8b201ab71f6208f1923d8ca8670595a
92a20cbdc9ec72a97ce0ea1f733b61ce1ae77de7
f11ceaa7c9df03fe5ad8cd68e5ebb9b5e1853595
d24de5513e6f746a539aaded6091759fa54998e4
2d20bc549c441c243b6085fe821d2eefd6594f39
71ae0d091585537738059637144f1985fd4b05f1

On 05/12/2019 13:27, Peter Waller wrote:
> Hi List,
>
> Following on from previous conversations about integrating f18 with 
> the llvm monorepo, we wanted to preserve as much history as we can, 
> but also to have a history without merge commits.
>
> I've just submitted a pull request containing a "flatten.sh" which 
> tries to do this. Further information is in the pull request. To help 
> with review I've pushed the rewritten history up as well.
>
> Pull request: https://github.com/flang-compiler/f18/pull/854
> Example rewritten history: 
> https://github.com/peterwaller-arm/f18/tree/new
>
> It's not perfect yet, in particular for merge commits:
>
> * The commit messages aren't great (yet).
> * We could talk about exactly what metadata we want to preserve for 
> merges.
>
> For now I've assumed that the second-parent of the merge commit 
> contains the relevant authorship information for the patch, so the 
> GIT_AUTHOR_* is taken from this, which is the last commit before a 
> pull request is merged.
>
> Once we're happy with this in flang-dev, we can present this to 
> llvm-dev and adapt the script for submission.
>
> Your input is welcomed.
>
> Regards,
>
> - Peter
>

