[cfe-dev] [llvm-dev] Sequential ID Git hook

James Y Knight via cfe-dev cfe-dev at lists.llvm.org
Thu Jun 30 09:16:20 PDT 2016


I don't think we should do any of that. It's too complicated -- and I don't
see the reason to even do it.

There's a need for the "llvm-project" repository -- that's been discussed
plenty -- but where does the need for a separate "id" that must be pushed
into all of the sub-projects come from? This is the first I've heard of
that as a thing that needs to be done.

There was a previous discussion about putting a sequential ID in the
"llvm-project" repo commit messages (although even that, I'd say, is
unnecessary), but not anywhere else.



On Thu, Jun 30, 2016 at 7:42 AM, Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Now that we seem to be converging to an acceptable Git model, there
> was only one remaining doubt, and that's how the trigger to update a
> sequential ID will work. I've been in contact with GitHub folks, and
> this is in line with their suggestions...
>
> Given the nature of our project's repository structure, triggers in
> each repository can't just update their own sequential ID (as Gerrit
> does), because we want an ordered sequence for the whole project, not
> just for each component. But it's clear to me that we have to do
> something similar to Gerrit, as that approach has proven to work on a
> larger infrastructure.
>
> Adding an incremental "Change-ID" to the commit message should
> suffice, much as we have SVN revision numbers now, if we can
> guarantee that:
>
>  1. The ID will be unique across *all* projects
>  2. Earlier pushes will get lower IDs than later ones
>
> Other things are not important:
>
>  3. We don't need the ID space to be complete (ie, we can jump from
> 123 to 125 if some error happens)
>  4. We don't need an ID for every "commit", but for every push. A
> multi-commit push is a single feature, and treating it as one change
> helps buildbots build the whole set together. Reverts should also be
> done in one go.
>
> What's left for the near future:
>
>  5. We don't yet handle multi-repository patch-sets. A way to
> implement this is via manual Change-ID manipulation (explained below).
> Not hard, but not a priority.
>
>
>   Design decisions
>
> This could be a pre/post-commit trigger on each repository that
> receives an ID from somewhere (TBD) and updates the commit message.
> When the umbrella project synchronises, it will already have the
> sequential number in place. In this case, the umbrella project is not
> needed for anything other than bisect, buildbots and releases.
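>
> As illustration only, the ID-requesting part of such a hook might look
> roughly like this; the service URL and the JSON shape are assumptions,
> not an agreed interface:
>
>   # Illustrative sketch: ask a hypothetical ID service for the next
>   # sequential ID. The URL and response format are made up.
>   import json
>   import urllib.request
>
>   ID_SERVICE_URL = "https://id.llvm.example/next"  # hypothetical endpoint
>
>   def request_change_id(repo_name, commit_hash):
>       """Send repo name and commit hash, return the sequential ID."""
>       payload = json.dumps({"repo": repo_name, "hash": commit_hash}).encode()
>       req = urllib.request.Request(ID_SERVICE_URL, data=payload,
>                                    headers={"Content-Type": "application/json"})
>       with urllib.request.urlopen(req, timeout=10) as resp:
>           return int(json.loads(resp.read())["id"])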
>
> I personally believe that having the trigger in the umbrella project
> will be harder to implement and more error prone.
>
> The server has to have some kind of locking mechanism. Web servers
> normally spawn dozens of "listeners", so multiple simultaneous pushes
> will all get a response; the actual serialisation has to happen
> further down, behind the web server.
>
> Therefore, the lock for the unique increment ID has to be elsewhere.
> The easiest thing I can think of is a SQL database with auto-increment
> ID. Example:
>
> Initially:
> sql> create table LLVM_ID ( id int not null auto_increment primary key,
> repository varchar(255) not null, hash varchar(40) not null );
> sql> alter table LLVM_ID auto_increment = 300000;
>
> On every request:
> sql> insert into LLVM_ID (repository, hash) values ('$repo_name', '$hash');
> sql> select last_insert_id();  -- returns the new ID
>
> and then return the "last insert id" to the caller in the body of the
> response, so the hook can update the Change-ID in the commit message.
> The repo/hash info is mostly for logging, debugging and conflict
> resolution purposes.
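>
> As a rough sketch of the service side (Python's standard library and
> SQLite standing in for the real web server and MySQL, purely for
> illustration):
>
>   # Sketch only: an HTTP endpoint that hands out auto-incremented IDs.
>   # SQLite stands in for the MySQL table above; not production code.
>   import json
>   import sqlite3
>   from http.server import BaseHTTPRequestHandler, HTTPServer
>
>   db = sqlite3.connect("llvm_id.db", check_same_thread=False)
>   db.execute("CREATE TABLE IF NOT EXISTS LLVM_ID ("
>              " id INTEGER PRIMARY KEY AUTOINCREMENT,"
>              " repository TEXT NOT NULL,"
>              " hash TEXT NOT NULL)")
>
>   class IdHandler(BaseHTTPRequestHandler):
>       def do_POST(self):
>           length = int(self.headers["Content-Length"])
>           body = json.loads(self.rfile.read(length))
>           with db:  # wraps the insert in a transaction
>               cur = db.execute(
>                   "INSERT INTO LLVM_ID (repository, hash) VALUES (?, ?)",
>                   (body["repo"], body["hash"]))
>           self.send_response(200)
>           self.send_header("Content-Type", "application/json")
>           self.end_headers()
>           self.wfile.write(json.dumps({"id": cur.lastrowid}).encode())
>
>   if __name__ == "__main__":
>       HTTPServer(("", 8080), IdHandler).serve_forever()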
>
> We must also limit the web server to accept connections only from
> GitHub's servers, to avoid abuse. Other repositories on GitHub could
> still abuse it, and we can go further if that becomes a problem, but
> given point (3) above, we only need to fix it if it actually happens.
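>
> GitHub publishes its address ranges at https://api.github.com/meta, so
> a check could look roughly like this (a sketch, not a vetted
> access-control scheme):
>
>   # Sketch: reject requests whose source IP is not in GitHub's
>   # published webhook ranges (the "hooks" key of /meta).
>   import ipaddress
>   import json
>   import urllib.request
>
>   def github_hook_networks():
>       with urllib.request.urlopen("https://api.github.com/meta") as resp:
>           meta = json.loads(resp.read())
>       return [ipaddress.ip_network(cidr) for cidr in meta["hooks"]]
>
>   def is_from_github(client_ip, networks):
>       addr = ipaddress.ip_address(client_ip)
>       return any(addr in net for net in networks)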
>
> This solution doesn't scale to multiple servers, nor does it help
> with BPC planning, but given the size of our needs, that's not
> relevant.
>
>
>   Problems
>
> If the server goes down then, given point (3), we may not be able to
> reproduce locally the same sequence the server would have produced,
> meaning SVN-style bisects and releases would not be possible during
> the downtime. But Git bisect and everything else would be.
>
> Furthermore, even if a local script can't reproduce exactly what the
> server would do, it can still produce a linear ordering for bisect
> purposes, which fixes the local problem. I can't see a situation in
> which we'd need the sequence for any other purpose.
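>
> For instance, a local fallback could simply number the umbrella
> repository's first-parent history, which is already linear (a sketch,
> not a proposed tool):
>
>   # Sketch of a local fallback: derive a linear ordering from the
>   # umbrella repository's first-parent history.
>   import subprocess
>
>   def local_sequence(repo_path):
>       """Return {commit_hash: local_index}, oldest first."""
>       out = subprocess.check_output(
>           ["git", "-C", repo_path, "rev-list",
>            "--first-parent", "--reverse", "HEAD"],
>           universal_newlines=True)
>       return {commit: i for i, commit in enumerate(out.split(), start=1)}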
>
> Upstream and downstream releases can easily wait a day or two in the
> unlucky case that the server goes down at exactly the time a release
> is being branched.
>
> Migrations and backups also work well, and if we use a cloud server
> we can easily take snapshots every week or so, migrate images across
> the world, etc. We don't need replication, read-only scaling,
> multi-master, etc., since only the web service will be reading from
> and writing to it.
>
> All in all, a "robust enough" solution for our needs.
>
>
>   Bundle commits
>
> Just FYI, here's a proposal that appeared in the "commit message
> format" round of emails a few months ago. It can work well for
> bundling commits together, but it will need more complicated SQL
> handling.
>
> The current proposal is to have one ID per push, which is easy using
> auto_increment. But if we want one ID to cover multiple pushes to
> different repositories, we'll need to record the same ID against two
> or more "repo/hash" pairs.
>
> On the commit level, the developer adds a temporary hash, possibly
> generated by a local script in 'utils'. Example:
>
>   Commit-ID: 68bd83f69b0609942a0c7dc409fd3428
>
> This ID will have to be the same on both (say) LLVM and Clang commits.
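>
> A local helper of that sort could be as small as this (just a sketch;
> the trailer name and length follow the example above):
>
>   # Sketch of the local 'utils' helper: print a random temporary
>   # Commit-ID for the developer to add to every commit of the set.
>   import uuid
>
>   if __name__ == "__main__":
>       print("Commit-ID: " + uuid.uuid4().hex)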
>
> The script will then take that hash and generate an ID; if it
> receives two or more pushes carrying the same hash, it'll return the
> *same* ID, say 123456, in which case the Git hooks on all projects
> will update the commit message by replacing the original Commit-ID
> with:
>
>   Commit-ID: 123456
>
> To avoid hash clashes in the future, the server script can refuse
> hashes that already exist and are more than a few hours old,
> returning an error; in that case the developer generates a new hash,
> updates all commit messages and re-pushes.
>
> If there is no Commit-ID, or if it's empty, we just insert a new row
> (with an empty key), get the auto-increment ID and return it. That
> way, empty Commit-IDs won't "match" any other.
>
> To solve this on the server side, a few ways are possible:
>
> A. We stop using primary_key auto_increment, handle the increment in
> the script and use SQL transactions.
>
> This would be feasible, but more complex and error prone. I suggest we
> go down that route only if keeping the repo/hash information is really
> important.
>
> B. We ditch keeping a record of repo/hash and just re-use the ID, but
> record the original string, so we can match it later.
>
> This keeps it simple and will work for our purposes, but we'll lose
> the ability to debug problems if they happen in the future.
>
> C. We improve the SQL design to have two tables:
>
> LLVM_ID:
>    * ID: int PK auto
>    * Key: varchar null
>
> LLVM_PUSH:
>    * LLVM_ID: int FK (LLVM_ID:ID)
>    * Repo: varchar not null
>    * Push: varchar not null
>
> Every new push updates both tables and returns the ID. Pushes with
> the same Key re-use the ID, update only LLVM_PUSH, and return the
> same ID.
>
> This is slightly more complicated and will need some scripting to
> gather the information (for logging and debugging), but it gives us
> both benefits (debugging + auto_increment) in one package. As a
> start, I'd recommend we take this route even before the script
> supports it. But it may be simple enough that we add support for it
> right from the beginning.
>
> I vote for option C.
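>
> For illustration, the option C flow could look roughly like this
> (SQLite syntax again, table and column names as in the schema above;
> only a sketch):
>
>   # Sketch of option C: two tables; pushes that present the same Key
>   # re-use the same ID.
>   import sqlite3
>
>   db = sqlite3.connect("llvm_id.db")
>   db.executescript("""
>   CREATE TABLE IF NOT EXISTS LLVM_ID (
>       id  INTEGER PRIMARY KEY AUTOINCREMENT,
>       key TEXT
>   );
>   CREATE TABLE IF NOT EXISTS LLVM_PUSH (
>       llvm_id INTEGER NOT NULL REFERENCES LLVM_ID(id),
>       repo    TEXT NOT NULL,
>       push    TEXT NOT NULL
>   );
>   """)
>
>   def assign_id(repo, push_hash, key=None):
>       """Return the ID; pushes sharing a non-empty key share the ID."""
>       with db:
>           row = None
>           if key:
>               row = db.execute("SELECT id FROM LLVM_ID WHERE key = ?",
>                                (key,)).fetchone()
>           if row:
>               llvm_id = row[0]
>           else:
>               llvm_id = db.execute("INSERT INTO LLVM_ID (key) VALUES (?)",
>                                    (key,)).lastrowid
>           db.execute("INSERT INTO LLVM_PUSH (llvm_id, repo, push)"
>                      " VALUES (?, ?, ?)", (llvm_id, repo, push_hash))
>       return llvm_id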
>
>
>   Deployment
>
> I recommend we code this, set up a server, and leave it running for a
> while on our current mirrors *before* we do the move. A simple plan
> is to:
>
> * Develop the server and hooks, and set them running without updating
> the commit message.
> * We follow the logs, make sure everything is sane
> * Change the hook to start updating the commit message
> * We follow the commit messages, move some buildbots to track GitHub
> (SVN still master)
> * When all bots are live tracking GitHub and all developers have moved, we
> flip.
>
> Sounds good?
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

