[cfe-dev] [llvm-dev] [lldb-dev] [GitHub] RFC: Enforcing no merge commit policy

Thu Mar 21 08:33:00 PDT 2019

A vague and probably incorrect, but hopefully helpful answer.

For theoretical purposes, every git commit contains a full,
self-contained snapshot of the source tree, and this snapshot has,
theoretically, nothing to do with what the "previous" state of the tree
was. It's easier not to think about commits as if they are diffs - think
of them as full snapshots. The storage is, of course, more efficient
than that under the hood, but that's more or less irrelevant.

In this sense, parent commit references are just pieces of metadata
attached to that snapshot, but they don't need to be indicative of
anything. Of course, when creating a new commit, the parent commit must
already exist in your local repository, so the chain of commits does
indeed look like some sort of a history, but that's about it.
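
To make the "snapshot plus metadata" picture concrete, here is a toy
model in Python. It is only a sketch: the names (make_commit, store) are
made up for illustration, and real git hashes tree and blob objects
rather than a JSON dump.

```python
import hashlib
import json


def commit_id(tree, parents, message):
    """Derive an id from the full snapshot plus its metadata (toy version)."""
    payload = json.dumps(
        {"tree": tree, "parents": parents, "message": message}, sort_keys=True
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def make_commit(store, tree, parents, message):
    """Record a commit; every parent must already be present in the store."""
    for parent in parents:
        if parent not in store:
            raise KeyError("unknown parent commit: " + parent)
    cid = commit_id(tree, parents, message)
    store[cid] = {"tree": dict(tree), "parents": list(parents), "message": message}
    return cid


# Hypothetical two-commit history.
store = {}
first = make_commit(store, {"a.txt": "hello\n"}, [], "initial")
second = make_commit(
    store, {"a.txt": "hello\n", "b.txt": "world\n"}, [first], "add b.txt"
)
```

Note that the second commit stores the whole tree, including the
unchanged a.txt, and that its link to the first commit is nothing more
than an entry in "parents".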

In particular, there's nothing that prevents a commit from having
multiple parent commits. Such commits are called merge commits and they
can be thought of as describing a specific state into which the source
tree has transitioned after multiple people have been doing their work
independently, without synchronizing with each other. When the merge
commit is being made, it doesn't require any of those people, or anyone
else, to agree in any way on the order in which they were doing their
work. It is only necessary for the committer to define the final state
of the source tree (which is, well, "the" commit).

The contents of the merge commit, i.e. the source tree after the commit,
theoretically may or may not have anything to do with the changes that
are being merged. But git also includes best-effort tools to help
produce merge commits (i.e., "git merge" - it essentially works out what
each branch changed relative to their common ancestor and tries to
combine both sets of changes), and they work fairly well unless
there are *actual* conflicts that cannot be resolved automatically (in
which case git politely asks you to resolve the conflict and provides
fancy conflict markers, as you'd expect).

So *normally* a merge commit is a commit that has multiple parent
commits and changes the source tree into something that *would* have
appeared if its parent commits (and their parents, etc., counting
repeating parents once in case of diamond-shaped histories) were applied
in a certain order. But the merge commit doesn't contain any information
on the intermediate source tree states that emerged during the merge - it
only contains the final state.
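
A very rough sketch of that idea, at whole-file granularity: given the
common ancestor ("base") and the two branch tips, keep whatever only one
side changed and report a conflict where both sides changed the same
file differently. This is a simplification under my own assumptions; the
real "git merge" works on lines within files, and merge_trees is a
made-up name.

```python
def merge_trees(base, ours, theirs):
    """File-level three-way merge of snapshots (dicts of path -> content).

    Returns (merged_tree, conflicting_paths). A missing path means the
    file does not exist in that snapshot.
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (or neither touched the file)
        elif o == b:
            result = t  # only "theirs" changed it
        elif t == b:
            result = o  # only "ours" changed it
        else:
            conflicts.append(path)  # both changed it, in different ways
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Hypothetical snapshots: each side changes a different file.
base = {"a.txt": "1\n", "b.txt": "1\n"}
ours = {"a.txt": "2\n", "b.txt": "1\n"}
theirs = {"a.txt": "1\n", "b.txt": "1\n", "c.txt": "3\n"}
merged, conflicts = merge_trees(base, ours, theirs)

# Both sides change a.txt differently: nothing can be decided automatically.
_, clash = merge_trees(base, ours, {"a.txt": "9\n", "b.txt": "1\n"})
```

The merge commit would then simply record "merged" as its snapshot and
both tips as its parents; none of the intermediate reasoning is stored.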

Now, there's also a different technique: instead of making a merge, you
can do a sequence of "cherry-picks" from the branch you want to merge
into the branch you're merging it into. By definition, this creates a
new non-merge commit for every commit on the branch that you want to
merge, and its contents are what you get by applying the original
commit's diff to the source tree of the new parent. This way you avoid
creating merge commits and obtain a linear history. The "git rebase"
thing is just a tool for this kind of mass cherry-picking, and this
approach is described as "rebasing your branch on top of master" (or on
top of wherever you want to merge it).

The downside of rebasing is that it duplicates commits. That is,
the commit that ended up in master has nothing to do with the commit on
your branch: it has no (formal) reference back to the original commit or
anything; they're not in a parent-child relationship, they're in a
different sort of relationship. So rebasing is best used for merging
a local branch that you don't want to publish as-is anyway.
Cherry-picking in public is only used for copying commits from, say,
master to a release branch, where you want to extract a specific commit
from the middle of the branch without bringing in the whole branch.
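
Here is a toy sketch of cherry-picking and rebasing in the same
snapshot model, mostly to show where the duplication comes from. All
names are hypothetical, diffs are per file rather than per line, and
conflicts are ignored entirely, so treat it as an illustration rather
than a description of what git really does.

```python
import hashlib
import json


def add_commit(store, tree, parents, message):
    """Store a snapshot with its parent ids and return a content-derived id."""
    payload = json.dumps([tree, parents, message], sort_keys=True)
    cid = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    store[cid] = {"tree": dict(tree), "parents": list(parents), "message": message}
    return cid


def diff(old, new):
    """Per-file changes from old to new; None marks a deleted file."""
    return {p: new.get(p) for p in set(old) | set(new) if old.get(p) != new.get(p)}


def apply_diff(tree, changes):
    """Return a copy of tree with the per-file changes applied."""
    result = dict(tree)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def cherry_pick(store, cid, onto):
    """Replay the change made by commit cid on top of commit onto."""
    original = store[cid]
    old_parent_tree = store[original["parents"][0]]["tree"]
    changes = diff(old_parent_tree, original["tree"])
    new_tree = apply_diff(store[onto]["tree"], changes)
    return add_commit(store, new_tree, [onto], original["message"])


def rebase(store, branch_commits, onto):
    """Cherry-pick each branch commit in order, starting from onto."""
    tip = onto
    for cid in branch_commits:
        tip = cherry_pick(store, cid, tip)
    return tip


# Hypothetical history: master and a feature branch diverge from "base".
store = {}
base = add_commit(store, {"README": "v1\n"}, [], "base")
master = add_commit(store, {"README": "v1\n", "m.txt": "m\n"}, [base], "master work")
f1 = add_commit(store, {"README": "v1\n", "f.txt": "1\n"}, [base], "feature step 1")
f2 = add_commit(store, {"README": "v1\n", "f.txt": "2\n"}, [f1], "feature step 2")
new_tip = rebase(store, [f1, f2], master)
```

The rebased tip has a single parent and the same message as f2, but a
different id and a different snapshot (it also contains m.txt), and
nothing in it points back to f2.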

If you're doing merge commits, you might lose linear history, but you 
obtain another fancy invariant: every piece of work - i.e., every patch, 
every merge conflict resolution - appears in the repository exactly 
once, under a unique identifier, and the non-linear source control 
history becomes an accurate representation of the real history of 
development.
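
Since a merge commit is just a commit with more than one parent,
checking whether a history is linear boils down to walking the parent
links. A minimal sketch, assuming the history is given as a plain dict
from commit id to its list of parent ids (the ids and the function name
are made up):

```python
def merge_commits(history, tip):
    """Return the commits reachable from tip that have more than one parent."""
    seen = set()
    merges = []
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # shared ancestors in diamond-shaped histories count once
        seen.add(cid)
        parents = history[cid]
        if len(parents) > 1:
            merges.append(cid)
        stack.extend(parents)
    return merges


# Hypothetical histories: one linear, one with a diamond closed by a merge.
linear = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
diamond = {"c1": [], "a": ["c1"], "b": ["c1"], "m": ["a", "b"]}

linear_merges = merge_commits(linear, "c3")
diamond_merges = merge_commits(diamond, "m")
```

A history is linear in the sense discussed in this thread exactly when
the returned list is empty.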

On 3/20/19 3:25 PM, Kristina Brooks via cfe-dev wrote:
> Excuse my ignorance (I'm not great with Git) but how would it differ for workflows of people
> who use a Git repository for local work but still use `svn up + patch + svn commit <list of
> files>` to actually land post CR or for NFC patches, while resolving conflicts during a
> pull into a local (non-trunk) branch manually, after the eventual full switch to GitHub?
>
> I'm aware that SVN operates using the lock model as opposed to Git essentially making the
> history linear; Are merge commits multiple commits that are landed as part of a single
> Git "push" (ie. unsquashed), or attempts to do anything that would result in a creation
> or merging of a branch on the remote?
>
> Thank you.
>
> On 3/20/2019 6:53 PM, Tom Stellard via llvm-dev wrote:
>> On 03/20/2019 11:38 AM, Zachary Turner wrote:
>>> It sounds like we need to get someone from the Foundation (chandlerc@, lattner@, tanya@, someone else?) to reach out to them offline about this.
>>>
>> Yes, we will try to reach out to GitHub directly about this, but I still
>> think we need some kind of contingency plan in case pre-receive hooks
>> or even a new kind of branch protection won't be an option for us.
>>
>> -Tom
>>
>>> On Wed, Mar 20, 2019 at 11:23 AM Arthur O'Dwyer <arthur.j.odwyer at gmail.com> wrote:
>>>
>>>      On Wed, Mar 20, 2019 at 2:19 PM Tom Stellard via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>
>>>          On 03/20/2019 10:41 AM, Zachary Turner wrote:
>>>          >
>>>          > On Tue, Mar 19, 2019 at 12:00 PM Tom Stellard via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>>>          >
>>>          >     Hi,
>>>          >
>>>          >     I would like to follow up on the previous thread[1], where there was a consensus
>>>          >     to disallow merge commits in the llvm github repository, and start a discussion
>>>          >     about how we should enforce this policy.
>>>          >
>>>          >     Unfortunately, GitHub does not provide a convenient way to fully enforce this policy.
>>>          >
>>>          >
>>>          > Why isn't this enforceable with a server-side pre-receive hook?
>>>
>>>          GitHub[1] only supports pre-receive hooks in the 'Enterprise Server'
>>>          plan, which is for self-hosted github instances.
>>>
>>>
>>>      AIUI, the GitHub team is perfectly willing to help out the LLVM project in whatever way LLVM needs, including but not limited to turning on server-side hooks for us.
>>>      https://twitter.com/natfriedman/status/1086470665832607746
>>>
>>>      Server-side hooks are *the* answer to this problem. There is no problem. You just use a server-side hook.
>>>
>>>      (Whether or not to use GitHub PRs is an orthogonal question. You can use hooks with PRs, or hooks without PRs; PRs with hooks, or PRs without hooks.)
>>>
>>>      –Arthur
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>