[flang-dev] Rewriting f18's history for inclusion in llvm monorepo (third attempt, C rewrite)

Peter Waller via flang-dev flang-dev at lists.llvm.org
Tue Dec 17 12:52:23 PST 2019

Hi All,

A third attempt, following feedback and study.

The shell script had issues that led to surprising trees, and with 
generating the Original-commit trailer, which I found easier to work 
around by using the lower-level C API provided by libgit2. If you 
want to see the script please take a look at the pull request: 
https://github.com/flang-compiler/f18/pull/854 - I warn you, it's ugly! 
The old quote, "I wanted to write a shorter program, but I didn't have 
the time" comes to mind :).

Now there is a linear history, keeping the empty merge commits. The 
commits rewrite the content under the flang/ directory and take the 
current llvm-project master branch as the parent for (what was) the root 
commit. This is something that can in principle be pushed to 
llvm-project, assuming everyone (and llvm-dev) are all happy.

=== Key links:

* Tree, merged with LLVM: 

* Rewritten history: 

* Rewritten history without llvm merge: 

* Link to the program pull request: 

=== Next steps:

* I understand that the flang community would like to push this into 
upstream before the llvm-10 branch in mid-January.
* I'll email llvm-dev to solicit feedback with the intent that we would 
like to do this in the near future.
* Modulo any feedback from this email or llvm-dev, I believe it's ready 
to go. It just requires someone to follow the steps, run the script, and 
push the resulting branch onto llvm-project.
* When we're ready to pull the trigger, I think we should:
   * permanently stop accepting commits on flang-compiler/f18, and 
redirect those commits to llvm-project.
   * run the rewrite script
   * verify the rewrite (which should be fairly easy)
   * push the new history into llvm-project.


More detail follows for anyone interested.

=== Features:

* Commits are now prefixed with [flang-compiler/f18#PRNUMBER] to 
indicate the pull request, if available, in which the commit was merged.
* Issue/PR references are rewritten as flang-compiler/f18#NUMBER, 
according to github's convention for cross-repository references.
* Empty merge commits are now kept, so that the pull request commit 
message (which usually includes the pull request title) is present in 
the lineage.
* Original-commit: trailer header shows the pre-rewrite commit sha.
* Reviewed-on: trailer links to flang-compiler/f18 pull request for the 
merge commit which pulled the merge in.
* Manual rebases can be taken from branches named rebase-{12 digit merge 
sha}, if they are present.
* If the remote branch llvm-project/master is available, then it also 
rewrites the commits under the flang/ directory with the latest 
llvm-project master as the parent of the first flang commit.
* If you want to run it yourself, it takes 3 seconds to compile and 3 
seconds to run.

The program generates links and references to commit shas in 
https://github.com/flang-compiler/f18 under the assumption that it will 
continue to exist, or get renamed, and if it were renamed that github's 
rename functionality will do the right thing assuming that the f18 name 
is not reused for a different repository.
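To make the message rewriting described in the feature list concrete, here is a minimal shell sketch. It is not the program from the pull request (that one is C on top of libgit2). The function names, the regular expression used to spot bare `#NUMBER` references, and the exact formatting of the trailer values are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; the real rewrite is done by
# the C/libgit2 program in the pull request.

REPO="flang-compiler/f18"

# Qualify bare "#123" references as "flang-compiler/f18#123".
# A "#" that directly follows a letter, digit, "_", "/" or "-" (for example
# "other/repo#5") is assumed to be qualified already and is left alone.
rewrite_refs() {
    sed -E "s,(^|[^[:alnum:]_/-])#([0-9]+),\1${REPO}#\2,g"
}

# Branch name searched for a manual rebase: "rebase-" plus the first 12
# characters of the merge commit sha.
rebase_branch_name() {
    printf 'rebase-%.12s\n' "$1"
}

# Usage: rewrite_message ORIGINAL_SHA [PR_NUMBER] < message
# Reads a commit message on stdin and prints the rewritten message:
# PR prefix on the subject (if a PR number is given), qualified references,
# then the trailers. The trailer value formats are assumed, not confirmed.
rewrite_message() {
    orig_sha=$1
    pr=${2:-}
    msg=$(rewrite_refs)
    if [ -n "$pr" ]; then
        printf '[%s#%s] %s\n' "$REPO" "$pr" "$msg"
    else
        printf '%s\n' "$msg"
    fi
    printf '\nOriginal-commit: %s@%s\n' "$REPO" "$orig_sha"
    if [ -n "$pr" ]; then
        printf 'Reviewed-on: https://github.com/%s/pull/%s\n' "$REPO" "$pr"
    fi
}
```

For example, `printf 'Fix crash from #12\n' | rewrite_message abc123 99` prints a subject prefixed with `[flang-compiler/f18#99]`, followed by the two trailers.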

=== Result:

* I've pushed a sample rewritten history with the rewrite up to my 
personal github repository. At time of writing it contains 2,721 commits.
* I believe the resulting history is suitable to be pushed onto 
* I've done a best-effort sanity check that there are no significant 
differences introduced in the rewriting. There may be some minor 
differences on branch commits (and some branch commits may not compile 
anymore where they once did), but I have high confidence that the merge 
commits are equivalent.
* I've done the easy manual rebases on a best-effort basis. There are 
only 3 rebases left which weren't "easy". This results in some commits 
which don't have the same checkout (and therefore may not compile for 
example), but the script ensures that by the time of the merge commit, 
there are no differences. Many on-branch commits are "the same", if no 
other commits happened on the master branch during the feature branch.
* 110 commits have been dropped from history: 45 now-empty 
feature-branch merges, 27 squashed, and 38 discarded.
* Please note that because patches have been rebased, they aren't what 
authors originally published, especially if it required a manual rebase. 
Any mistake made during the rebase looks as though it is attributed to 
the author. Hopefully the Original-commit provides a clear reference to 
the ground-truth of what the author originally did.

=== Validation:

I've done the best I can to ensure the history is as faithful as it can 
be. Please take a look for yourself and see what it looks like. I 
believe with a reasonable amount of confidence that the checkouts are 
the same at the merge commits, which is the key promise.

* Feature branch patch deltas: There is a shell script included in the 
comment at the top of the program which enables looking at the 
diff-to-patches (yes, diffs-of-diffs) from the rebase. To use, set 
use_original_message = true. Mostly I see context changes, and a little 
bit of fall out from the rebasing which doesn't look too concerning to me.

* The merge-commit promise: If you do `git log --format="%T %s" 
--reverse --first-parent origin/master > A && git log --format="%T %s" 
rewritten-history-v2 > B` and run `git diff --word-diff --no-index A B` 
to compare the two, you can see that all merge commits have identical 
trees, which is the key promise. You can also get a feel for how often 
commits end up being the same before and after rewrite.
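
If eyeballing the word-diff is too tedious, the same check can be sketched mechanically. The helper below is hypothetical (it is not part of the rewrite program) and assumes A and B were produced by the two `git log` commands above, so that each line starts with a tree hash followed by the subject. It prints every line of A whose tree hash never occurs in B. Empty output would mean every first-parent commit of the original master has a commit with an identical tree in the rewritten history.

```shell
#!/bin/sh
# Hypothetical helper: list the lines of A whose tree hash (first field)
# does not occur anywhere in B.
# Usage: missing_trees A B
missing_trees() {
    # B is passed to awk first so that its tree hashes are collected
    # before the lines of A are examined.
    awk '
        FILENAME == ARGV[1] { seen[$1] = 1; next }  # lines from B
        !($1 in seen)       { print }               # lines from A
    ' "$2" "$1"
}
```

Run it as `missing_trees A B` in the directory where the two files were written.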

* I've verified that my name does not appear on any commits (and not as 
the committer) as a consequence of history rewriting.

=== Other hints:

If anyone wants to have a go at doing the remaining 3 rebases, run one 
of these lines, do the rebase, and verify that at the end of the rebase 
"git diff $M" is empty. Then push the branch somewhere and let me know 
about it.

   # PR #137, 6 commits, author hsuauthai
   M=d341464e7ffd; git checkout -B rebase-${M} ${M}^2; git rebase ${M}^1
   # PR #539, 13 commits, author Tim Keith
   M=a24701e31301; git checkout -B rebase-${M} ${M}^2; git rebase ${M}^1
   # PR #544, 8 commits, author jeanPerier
   M=24856b82387a; git checkout -B rebase-${M} ${M}^2; git rebase ${M}^1

The following link shows how those merge commits appear (squashed) in the 
history, if that doesn't happen: 
Rebasing is a grungy thing to do, but at least we know that the 
checkouts are the same at the merge commits. The only alternative I'm 
aware of is to squash the second-parent history of the merge commit.

If you want to reproduce the same rewritten history as I've published 
it, you'll need to add my fork as a remote, fetch my rebase branches, 
and create them with something like `for ref in $(git for-each-ref 
--format='%(refname)' 'refs/remotes/peterwaller-arm/rebase-*' | xargs 
-n1 basename); do git checkout $ref; done`. If you try to reproduce and fail 
let me know. The script should be reproducible. If the rebase branches 
make someone unhappy, it is easy enough to fall back to simply squashing.


- Peter
