[flang-dev] Rewriting f18's history for inclusion in llvm monorepo
Peter Waller via flang-dev
flang-dev at lists.llvm.org
Fri Dec 6 06:36:44 PST 2019
An update.
I've managed to preserve much more history.
In the new scheme, 2,159 of the 2,181 non-merge commits are preserved
(my first attempt kept only 683).
Additionally, I've added metadata trailers to the commit messages to
record the original commit in the f18 repo, and a link to the pull
request, where this information is available. Right now the trailers are
present on commits from the trivial case but missing on the others, even
those that only required an "easy" rebase.
Further explanation below.
In git terminology the "tree sha" determines what is present in the file
system. This can be seen with `git rev-parse SHA^{tree}` or `git
cat-file -p SHA`. The key thing I seek to preserve is the tree sha of
merge commits on the mainline branch. If these are the same, then the
contents are the same at those commits.
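As a minimal sketch of that check (same_tree is a hypothetical helper
name for illustration, not something taken from flatten.sh), two
commits can be compared by their tree shas like this:

```shell
# Sketch: succeed (exit 0) if two commits point at the same tree object,
# i.e. their file contents are identical. Exit 1 if the trees differ,
# exit 2 if either revision cannot be resolved.
# same_tree is a hypothetical helper name, not part of flatten.sh.
same_tree() {
    a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_tree MERGESHA rewritten-MERGESHA && echo identical`
would confirm that a rewritten branch ends with the same contents as the
original merge commit.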
There are three cases I consider. In each case, we want to ensure that
the contents of the filesystem at that commit are the same.
1. If no commits happened to mainline since the fork for a PR, then we
can rewrite the merge as a fast-forward. The tree shas of the individual
commits are preserved.
* This is a trivial case, and seems obviously safe with respect to
preserving semantics, except that it introduces commits on the mainline
which were previously in a branch. We lose the information about when
the commits landed on mainline.
2. If commits happened on mainline since the branch forked, we must
rebase. Now things are trickier. Intermediate commits in a PR (those
before it was merged) may now be semantically wrong, according to the
usual problems of rebasing. However, if the rebase has no conflicts, and
the tree shas are the same at the end of the rebase, then this may be
good enough. By the final commit of the rebase, the tree is the same as
the merge commit, so no difference is introduced there.
3. If commits happened on the mainline, and the rebase has conflicts,
then things get harder. It turns out that there are only 6 such merge
commits, covering 66 commits on their branches. It might be possible to
handle those manually, but I'm unlikely to do it myself. For now, those
are squashed as before. Their commit hashes are written to hard.txt by
the flatten.sh shell script. If someone cares about those enough, then they
could do a rebase for each of the "hard" MERGESHA given at the end of
this email with `git checkout -b rewritten-MERGESHA MERGESHA^2; git
rebase MERGESHA^1`, and fix all the conflicts carefully. If they did
this and pushed the six rewritten-MERGESHA branches somewhere,
flatten.sh could pull in those rebased commits at the appropriate
moment. So long as `git diff MERGESHA rewritten-MERGESHA` is empty after
the rebase, I don't see why this wouldn't work. Just beware that making
each individual patch semantically correct may not be easy.
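To make the three cases above concrete, here is a rough sketch of how a
single merge commit could be classified. It is an illustration only:
classify_merge and the labels it prints are hypothetical names, it
assumes a clean worktree, and it is not the logic of flatten.sh itself.

```shell
# Sketch: classify merge commit $1 into one of the three cases.
# Prints "fast-forward" (case 1), "easy" (case 2) or "hard" (case 3).
# classify_merge is a hypothetical name; assumes a clean worktree.
classify_merge() {
    merge=$1
    git rev-parse --verify --quiet "$merge^2" >/dev/null || return 2

    # Case 1: mainline (first parent) is an ancestor of the branch tip,
    # so nothing landed on mainline since the branch forked.
    if git merge-base --is-ancestor "$merge^1" "$merge^2"; then
        echo fast-forward
        return 0
    fi

    # Cases 2 and 3: replay the branch onto mainline on a detached HEAD,
    # remembering where we were so that we can return afterwards.
    orig=$(git symbolic-ref --quiet --short HEAD || git rev-parse HEAD)
    git checkout --quiet --detach "$merge^2" || return 2
    if git rebase --quiet "$merge^1" >/dev/null 2>&1 &&
       [ "$(git rev-parse 'HEAD^{tree}')" = "$(git rev-parse "$merge^{tree}")" ]
    then
        result=easy     # clean rebase ending in the merge commit's tree
    else
        # Conflict, or the final tree differs from the merge commit.
        git rebase --abort >/dev/null 2>&1
        result=hard
    fi
    git checkout --quiet "$orig"
    echo "$result"
}
```

Running `classify_merge MERGESHA` on each of the merge commits listed at
the end of this email should print "hard", since their rebases conflict.
Note that the sketch throws the rebased commits away; it only reports
which case applies.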
Another known deficiency is that we don't currently handle merges from
mainline back into feature branches cleanly. There are a few of those
and they look pretty weird. I haven't yet thought about this in great
detail. I'll see if I can fix this issue.
The current rewritten history is up at
https://github.com/peterwaller-arm/f18/commits/rewritten-history. I have
pushed the new script up to https://github.com/flang-compiler/f18/pull/854.
Regards,
- Peter
p.s. Here is the current output of the script:
Original history had 2181 non-merge commits.
New history has 2165 commits.
Preserved: 2159 Easy: 568 Hard merges: 6 Hard commits: 66
Merge commits which need rebasing:
b9f25364a8b201ab71f6208f1923d8ca8670595a
92a20cbdc9ec72a97ce0ea1f733b61ce1ae77de7
f11ceaa7c9df03fe5ad8cd68e5ebb9b5e1853595
d24de5513e6f746a539aaded6091759fa54998e4
2d20bc549c441c243b6085fe821d2eefd6594f39
71ae0d091585537738059637144f1985fd4b05f1
On 05/12/2019 13:27, Peter Waller wrote:
> Hi List,
>
> Following on from previous conversations about integrating f18 with
> the llvm monorepo, we wanted to preserve as much history as we can,
> but also to have a history without merge commits.
>
> I've just submitted a pull request containing a "flatten.sh" which
> tries to do this. Further information is in the pull request. To help
> with review I've pushed the rewritten history up as well.
>
> Pull request: https://github.com/flang-compiler/f18/pull/854
> Example rewritten history:
> https://github.com/peterwaller-arm/f18/tree/new
>
> It's not perfect yet, in particular for merge commits:
>
> * The commit messages aren't great (yet).
> * We could talk about exactly what metadata we want to preserve for
> merges.
>
> For now I've assumed that the second parent of the merge commit
> contains the relevant authorship information for the patch, so
> GIT_AUTHOR_* is taken from it. That commit is the last one before a
> pull request is merged.
>
> Once we're happy with this in flang-dev, we can present this to
> llvm-dev and adapt the script for submission.
>
> Your input is welcomed.
>
> Regards,
>
> - Peter
>