[llvm-dev] RFC: Bugzilla migration plan

Anton Korobeynikov via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 10 01:10:48 PDT 2020


Dear all,

Over the last few weeks, with the help of the GH folks, I've been exploring
the options for the Bugzilla migration. I believe we have finally arrived
at a viable solution, which is detailed below.

It turned out that GitHub has an internal project rehydration tool
that can populate an empty repo from a simple serialized format. The big
advantage of this approach compared to using the GH API is that we are not
bound by the various thresholds and throttling limits (remember that we
need to import 35k+ BZ issues). The downside is that rehydration requires
an empty repo, and we cannot delete the current llvm-project: doing so
would lose releases, fork connections, stars and watches. Unfortunately,
there is no way to recreate releases while keeping their original dates,
so this is a no-go for us. Losing fork connections would strongly affect
downstream users as well. This led to the following scheme:

1. Migrate Bugzilla to a new repo, say, llvm-bugzilla-import, using the
internal storage format.
2. Install redirects llvm.org/PR1234 => gh/llvm/llvm-bugzilla-import/issues/1234
3. Wipe the existing issues and pull requests in llvm-project
4. Migrate all issues from llvm-bugzilla-import to llvm-project using the
GH API. GitHub will take care of the llvm-bugzilla-import/issues/1234 =>
llvm-project/issues/5678 redirects.
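To make the two-hop redirect chain concrete, here is a rough Python sketch.
It assumes that "gh" means https://github.com, that rehydration preserves
Bugzilla bug numbers (which the PR1234 => issues/1234 redirect relies on),
and that `moved` stands in for the import => llvm-project renumbering that
GitHub records in step 4. The function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: "gh" in the plan stands for https://github.com.
IMPORT_ISSUES = "https://github.com/llvm/llvm-bugzilla-import/issues"
FINAL_ISSUES = "https://github.com/llvm/llvm-project/issues"


def llvm_org_redirect(path: str) -> Optional[str]:
    """Step 2: map llvm.org/PR<n> to issue <n> in the import repo.

    Relies on the (assumed) property that rehydration keeps the
    Bugzilla bug number as the GitHub issue number.
    """
    match = re.fullmatch(r"/PR(\d+)", path)
    if match is None:
        return None  # not a bug link; other llvm.org paths are left alone
    return f"{IMPORT_ISSUES}/{int(match.group(1))}"


def final_location(bug: int, moved: Dict[int, int]) -> str:
    """Where a reader ends up after both hops.

    `moved` is a hypothetical stand-in for GitHub's own
    import-issue -> llvm-project-issue redirects created in step 4;
    bugs that were not moved still resolve to the import repo.
    """
    if bug in moved:
        return f"{FINAL_ISSUES}/{moved[bug]}"
    return f"{IMPORT_ISSUES}/{bug}"


print(llvm_org_redirect("/PR1234"))
# -> https://github.com/llvm/llvm-bugzilla-import/issues/1234
print(final_location(1234, {1234: 5678}))
# -> https://github.com/llvm/llvm-project/issues/5678
```

The nice property of this scheme is that the llvm.org side never needs to
know the final llvm-project numbers: it only ever points at the import
repo, and GitHub supplies the second hop.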

The only downside of this approach is that we will be seeing 30k
events like "llvm-bugzilla-import/issues/1234 migrated to
llvm-project/issues/5678".

Here is the tentative timeline / list of action points:

1. Collect the mapping from email (used by Bugzilla) => GH account name
(used by GitHub issues). We are going to collect it from different sources:
  - Auto-populating the mapping from the list of known committers
  - Querying the GH API (works only if a person has made their email public,
and only where allowed by local law)
  - Emailing everyone who submitted to Bugzilla over the last year or
maybe two, asking them to fill in a form with their GH username
  - We would likely allow a month or so for everyone to respond.
2. While 1. is in progress, we will work on the various format issues of
the migration. For this we will probably use the first 1k issues or so. It
would be nice to include some meta-bugs here to ensure we can recreate
the issues correctly. Things to consider:
  - Comment migration (GH uses Markdown everywhere, so we'd need to
carefully escape the Bugzilla contents)
  - Components => labels mapping and migration
  - Linking between issues: maybe automatically replace PR1234 in
the text with #1234 to enable auto-linking.
  - Authorship: reporter / commenter
  - Attachments
3. After we are sure everyone is ready, we will do a test migration
of the whole Bugzilla.
  - Estimate the time required for the transition.
  - Fix remaining issues, if any
4. Put Bugzilla into read-only mode and perform the final migration to
llvm-bugzilla-archive
5. Wipe issues / PRs in the llvm-project repo and perform the migration from
llvm-bugzilla-archive to llvm-project
6. Migration done. Bugzilla will probably be kept in read-only mode
for some time, for the sake of consistency and in case any issues
are found.
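For step 1 of the timeline, merging the email => GH account sources could
look roughly like the Python sketch below. The precedence (earlier sources
win, e.g. the committer list over form answers over API lookups) and
flagging disagreements for manual review are my assumptions, not a settled
policy. Emails are lower-cased before comparison (also an assumption:
strictly, the local part of an address may be case-sensitive); GitHub
logins are compared case-insensitively, as GitHub treats them.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple


def normalize_email(email: str) -> str:
    # Assumption: treat addresses case-insensitively and ignore stray spaces.
    return email.strip().lower()


def build_mapping(
    sources: Iterable[Tuple[str, Mapping[str, str]]],
) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Merge (source_name, {email: gh_login}) pairs in priority order.

    The first source that maps an email wins; any later, different login
    for the same email is collected in `conflicts` for manual review.
    """
    mapping: Dict[str, str] = {}
    conflicts: Dict[str, Set[str]] = {}
    for _source_name, entries in sources:
        for email, login in entries.items():
            key = normalize_email(email)
            login = login.strip()
            if key not in mapping:
                mapping[key] = login
            elif mapping[key].lower() != login.lower():  # logins are case-insensitive
                conflicts.setdefault(key, {mapping[key]}).add(login)
    return mapping, conflicts


def unmapped(bugzilla_emails: Iterable[str], mapping: Mapping[str, str]) -> Set[str]:
    """Bugzilla users who still lack a GH account name (e.g. to send the form to)."""
    return {normalize_email(e) for e in bugzilla_emails
            if normalize_email(e) not in mapping}
```

Typical use would be build_mapping([("committers", ...), ("form", ...),
("api", ...)]), then mailing the unmapped() set and reviewing conflicts by
hand before the final import.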
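For the comment-migration and linking points of step 2, one possible shape
of the conversion is sketched below in Python. It backslash-escapes every
ASCII punctuation character (CommonMark allows any of them to be escaped,
so they render literally) and only then rewrites PR1234 into #1234, so the
inserted '#' stays unescaped and GitHub can auto-link it. Like the
llvm.org redirects, this relies on rehydration preserving bug numbers. It
is deliberately conservative: it also disables GitHub's URL auto-linking,
and indented text that Markdown treats as a code block would show the
backslashes literally, so both cases would need extra handling. The
function name is hypothetical.

```python
import re

# Every ASCII punctuation character; CommonMark lets any of them be
# backslash-escaped, which makes it render as a literal character.
_MD_PUNCT = re.compile(r"([!-/:-@\[-`{-~])")
# Bugzilla-style references such as "PR1234".
_PR_REF = re.compile(r"\bPR(\d+)\b")


def bugzilla_to_markdown(text: str) -> str:
    """Escape Bugzilla comment text for GitHub Markdown, then link PRnnn refs."""
    escaped = _MD_PUNCT.sub(r"\\\1", text)
    # Runs after escaping, so the '#' inserted here is not escaped and
    # GitHub treats "#1234" as an issue reference.
    return _PR_REF.sub(r"#\1", escaped)


print(bugzilla_to_markdown("See PR1234, *not* PR99."))
# -> See #1234\, \*not\* #99\.
```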

Any comments & ideas?
-- 
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

