[llvm-dev] Sequential ID Git hook

Thu Jun 30 08:13:29 PDT 2016

On 6/30/2016 7:43 AM, Renato Golin via llvm-dev wrote:
> Given the nature of our project's repository structure, triggers in
> each repository can't just update their own sequential ID (like
> Gerrit) because we want a sequence in order for the whole project, not
> just each component. But it's clear to me that we have to do something
> similar to Gerrit, as this has been proven to work on a larger
> infrastructure.

I'm assuming that pushes to submodules will result in a (nearly) 
immediate commit/push to the umbrella repo to update it with the new 
submodule head.  Otherwise, checking out the umbrella repo won't get you 
the latest submodule updates.

Since the umbrella project has to be updated to stay in sync with 
changes to the submodules, it seems to me that an ID that applies to 
all projects would have to be coordinated relative to the umbrella 
project.

>   Design decisions
>
> This could be a pre/post-commit trigger on each repository that
> receives an ID from somewhere (TBD) and updates the commit message.
> When the umbrella project synchronises, it'll already have the
> sequential number in. In this case, the umbrella project is not
> necessary for anything other than bisect, buildbots and releases.

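I recommend using git tags rather than updating the commit message 
itself.  Tags are more versatile: they can be attached after the commit 
exists without changing its hash, and they can be looked up by name.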
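
As a rough illustration of the tag-based approach, the sketch below resolves an ID to a commit and a commit back to its ID.  The throwaway repository, the file names, and the "push-N" tag naming are assumptions made for the example, not part of any agreed design.

```shell
#!/bin/sh
# Sketch: map a sequential ID to a commit and back, using tags only.
# The repo, file names and "push-N" naming are illustrative assumptions.
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q demo
cd demo
# Local identity so commits succeed on a machine with no git config.
git config user.name "Example"
git config user.email "example@example.com"

for i in 1 2; do
    echo "$i" > "file$i"
    git add "file$i"
    git commit -q -m "Added file$i"
    git tag -a -m "push-$i" "push-$i"
done

# ID -> commit: peel the annotated tag to the commit it points at.
COMMIT_FOR_1=$(git rev-parse "push-1^{commit}")

# Commit -> ID: list the tags that point at a given commit.
ID_FOR_HEAD=$(git tag --points-at HEAD)

# The commit message is untouched, so the commit hash never changes.
SUBJECT=$(git log -1 --format=%s "push-1")

echo "push-1 is $COMMIT_FOR_1 ($SUBJECT)"
echo "HEAD is tagged $ID_FOR_HEAD"
```

Because the ID lives in a tag, it can be used anywhere git accepts a revision (git checkout, git bisect, git log) without touching the commit itself.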

> I personally believe that having the trigger in the umbrella project
> will be harder to implement and more error prone.

Relative to a SQL database and a server, I think managing the ID from 
the umbrella repository would be much simpler and more reliable.

Managing IDs from a repo using git metadata is pretty simple.  Here's 
an example script that creates a repo and allocates a push tag for each 
commit in a sequence (for simplicity, I'm simulating pushes of 
individual commits rather than using git hooks).  I'm not a git expert, 
so there may be better ways of doing this, but I don't know of any 
problems with this approach.

#!/bin/sh

rm -rf repo

# Create a repo.
mkdir repo
cd repo || exit 1
git init

# Create a well-known object to hang the push ID note on.
PUSH_OBJ=$(echo "push ID" | git hash-object -w --stdin)
echo "PUSH_OBJ: $PUSH_OBJ"

# Initialize the push ID to 0.
git notes add -m 0 "$PUSH_OBJ"

# Simulate some commits and pushes.
for i in 1 2 3; do
   echo "$i" > "file$i"
   git add "file$i"
   git commit -m "Added file$i" "file$i"
   # Read the current ID, increment it, and store it back.
   PUSH_TAG=$(git notes show "$PUSH_OBJ")
   PUSH_TAG=$((PUSH_TAG+1))
   git notes add -f -m "$PUSH_TAG" "$PUSH_OBJ"
   # Tag the new commit with the allocated ID.
   git tag -m "push-$PUSH_TAG" "push-$PUSH_TAG"
done

# List commits with push tags.
git log --decorate=full

Running the above shows a git log with the tags:

commit a4ca4a0b54d5fb61a2dacbab5732d00cf8216029 (HEAD, tag: refs/tags/push-3, refs/heads/master)
...
     Added file3

commit e98e2669569d5cfb15bf4cd1f268507873bcd63f (tag: refs/tags/push-2)
...
     Added file2

commit 5c7f29107838b4af91fe6fa5c2fc5e3769b87bef (tag: refs/tags/push-1)
...
     Added file1

The above script is not transaction safe: reading the note, incrementing 
it, and writing it back are separate commands, so two concurrent pushes 
could allocate the same ID.  In a real deployment, git hooks would be 
used and would rely on push locks to synchronize updates.  Those hooks 
could also distribute ID updates to the submodules to keep them 
synchronized.
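
One way the locking might look is sketched below.  It is only an approximation under stated assumptions: LOCK_DIR and allocate_push_id are hypothetical names, and a directory created with mkdir stands in for whatever push lock a real server would provide (mkdir is used because directory creation is atomic).  A real hook would call something like this from the server side rather than from a loop.

```shell
#!/bin/sh
# Sketch: allocate the next push ID under a lock so the
# read-increment-write sequence cannot interleave between two pushes.
# LOCK_DIR and allocate_push_id are hypothetical names; the mkdir lock
# is a stand-in for a real server-side push lock.
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q repo
cd repo
git config user.name "Example"
git config user.email "example@example.com"

# Well-known object carrying the counter as a note, starting at 0.
PUSH_OBJ=$(echo "push ID" | git hash-object -w --stdin)
git notes add -m 0 "$PUSH_OBJ"
LOCK_DIR="$PWD/.git/push-id.lock"

allocate_push_id() {
    # Wait until we own the lock; mkdir fails if the directory exists.
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        sleep 1
    done
    id=$(git notes show "$PUSH_OBJ")
    id=$((id + 1))
    git notes add -f -m "$id" "$PUSH_OBJ" >/dev/null 2>&1
    git tag -a -m "push-$id" "push-$id" "$1"
    rmdir "$LOCK_DIR"
    echo "$id"
}

# Simulate three pushes of one commit each.
for i in 1 2 3; do
    echo "$i" > "file$i"
    git add "file$i"
    git commit -q -m "Added file$i"
    LAST_ID=$(allocate_push_id HEAD)
done
echo "last allocated ID: $LAST_ID"
```

The sketch does not release the lock if a command fails part-way through; a production hook would need a trap or a stale-lock timeout for that case.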

Tom.