[llvm-dev] [cfe-dev] [Openmp-dev] Bugzilla migration is stopped again

Fri Dec 10 08:42:04 PST 2021

On Fri, Dec 10, 2021 at 7:33 AM Anton Korobeynikov <anton at korobeynikov.info>
wrote:

> Thanks for the try!
>
> From the quick scan:
>
> 1. There are no labels
>

There are labels, but only according to the "keywords" field from Bugzilla.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues?q=is%3Aopen+is%3Aissue+label%3Aaccepts-invalid
I agree it would make sense to apply more labels in Step 3
<https://github.com/Quuxplusone/BugzillaToGithub#step-3-process-each-xml-bug-into-githubs-json-schema>
(e.g. according to the "Product" field).
If you document the mapping somewhere, it would be trivial to add to my
script and I could have 10,000 issues regenerated in about 3 hours.
Also needed: the mapping from Bugzilla usernames to GitHub usernames.
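
A minimal sketch of how a Product-to-label mapping could be merged with the
existing keyword labels. The table contents and the function name are
hypothetical (no mapping has been documented yet), and it assumes the
"keywords" field is a comma-separated string:

```python
# Hypothetical table: the real Product -> label mapping is not documented yet.
PRODUCT_LABELS = {
    "clang": "clang",
    "libc++": "libc++",
    "OpenMP": "openmp",
}


def labels_for_bug(product, keywords):
    """Return the sorted, de-duplicated GitHub labels for one bug."""
    # Assumption: keywords arrive as one comma-separated string.
    labels = {k.strip() for k in keywords.split(",") if k.strip()}
    label = PRODUCT_LABELS.get(product)
    if label is not None:
        labels.add(label)
    return sorted(labels)
```

Products missing from the table simply contribute no label, so the script
keeps working while the mapping is still being filled in.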

> 2. Attachments are not real – they are just links to bugzilla and will
> be obsolete if bugzilla is e.g. down
>

Right. This is part of the "dumbed-down Load step", i.e. "take the actual
data and munge it into the closest possible thing that can be loaded using
the public API": GitHub's beta Issues Import API doesn't support adding
files to issues.  (Also, e.g.,
- forging authorship of comments is impossible using the public API
- for cross-referencing to other issues, I'm currently using links back
into the old Bugzilla's show_bug.cgi; but really these links should go to
something like https://reviews.llvm.org/PR1234, which would be under our
control and could be HTTP-redirected to their corresponding GitHub issues
)

> 3. Each attachment results in 2 comments, one of each is redundant
>

Ack. I wrote code to fix this for the very simplest "Created attachment
1234" auto-comments, but had not noticed that sometimes the auto-comment is
more complicated.
E.g.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729#issuecomment-990590574
This wouldn't be hard to fix.
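
One possible shape for that fix, as a sketch only: strip the leading
"Created attachment NNNN" line and keep whatever text follows it. The
function name is hypothetical, and the assumption that the auto-generated
part is always exactly the first line has not been checked against the full
XML export:

```python
import re

# Assumption: the auto-generated part is the first line of the comment.
ATTACH_RE = re.compile(r"\ACreated attachment (\d+)[^\n]*\n?")


def split_attachment_comment(text):
    """Return (attachment_id, remaining_text); id is None for ordinary comments."""
    m = ATTACH_RE.match(text)
    if m is None:
        return None, text
    return int(m.group(1)), text[m.end():].strip()
```

A comment whose remaining text is empty is the purely redundant kind and can
be dropped; otherwise the remaining text can be merged into the single
comment that carries the attachment link.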

> 4. CC list is strange, e.g.
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12187 CC's to
> "mail.sandbox.de"
>

That's partly an artifact of my lack of mapping from Bugzilla usernames to
GitHub usernames (the relevant codepath
<https://github.com/Quuxplusone/BugzillaToGithub/blob/822dbac/xml-to-json.py#L186-L195>
is just a stub), but also something super weird...!
The email addresses from Bugzilla show up in the XML when viewed in Chrome,
but not when fetched in Python or curl.
https://stackoverflow.com/questions/70307092/fetching-xml-from-bugzilla-gives-different-results-with-curl-versus-browser

> 5. All text is in verbatim boxes (e.g.
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12092) making
> it almost impossible to read due to horizontal scroll
>

The monospace font is intentional on my part, and important even for
https://bugs.llvm.org/show_bug.cgi?id=12092 because a big part of the
initial comment is indented C++ code. However, I should implement
linebreaking: looks like Bugzilla's website layout breaks around 84
characters, and 80 would be perfectly sensible.
Will fix.
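
A rough sketch of such linebreaking, using Python's standard textwrap
module. The function name is hypothetical. It only touches lines longer than
the limit and repeats the original indentation on continuation lines, so
short, already-indented code lines pass through unchanged:

```python
import textwrap


def wrap_preserving_indent(text, width=80):
    """Wrap only the over-long lines, keeping each line's leading spaces."""
    out = []
    for line in text.split("\n"):
        if len(line) <= width:
            out.append(line)
            continue
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip(" "))]
        wrapped = textwrap.wrap(
            stripped,
            width=width,
            initial_indent=indent,
            subsequent_indent=indent,
            break_long_words=False,  # never split a long URL or identifier
            break_on_hyphens=False,
        )
        out.extend(wrapped or [""])
    return "\n".join(out)
```

With break_long_words=False, a single token longer than the limit (a URL,
say) still overflows; that seems preferable to cutting it in half.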

> 6. There are no "depends on" / "blocks on" references (see
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10900)
>

Ack.
(This is an artifact of my not knowing that the <dependson> element exists.
I should have thought to grep and get a list of all the tags that exist in
the XML (that is, in the 51567 "xml/*.xml" files produced during Step 1 in
the README
<https://github.com/Quuxplusone/BugzillaToGithub#step-1-export-your-bugzilla-bugs-to-xml>),
to make sure I understood each of them.)
Will fix, at least for the <dependson> tag.
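
The tag inventory could be sketched with the standard library's ElementTree
instead of grep. The function names are hypothetical, and the only element
name taken from the discussion above is <dependson>:

```python
import collections
import xml.etree.ElementTree as ET


def count_tags(xml_documents):
    """Count how often each element name occurs across the given XML documents."""
    counts = collections.Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for element in root.iter():
            counts[element.tag] += 1
    return counts


def depends_on(doc):
    """Return the bug numbers listed in <dependson> elements of one document."""
    root = ET.fromstring(doc)
    return [int(e.text) for e in root.iter("dependson")]
```

For the real export, each document would be the contents of one of the
"xml/*.xml" files read in binary mode; printing
count_tags(...).most_common() then gives a checklist of every tag the
Transform step needs to understand.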

> 7. There are no cross-references in case of duplicates (see
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729)
>

Ack.
I thought about mangling the duplicate-bug-number into the "Status" line,
like Bugzilla does, but decided not to worry about it in the interest of
being-done-by-my-self-imposed-EOW-deadline. :)
There's also a harder issue on bug 10729's final comment
<https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729#issuecomment-990590581>,
where it says

    Yes, apparently I did. Sorry. I'll attach the logs to that issue instead :)
    *** This bug has been marked as a duplicate of bug 9072
    <https://bugs.llvm.org/show_bug.cgi?id=9072> ***

where we want that to be both monospaced *and* hyperlinked — Markdown can't
do hyperlinks inside triple-backticks.
The obvious solution is for the script to special-case
Bugzilla's auto-comment and pull it outside of the
triple-backticked section.
I should grep for all the different Bugzilla auto-comments too. It looks
like there are only three possible auto-comments:

    $ grep -hor '[*][*][*] .* [*][*][*]' xml/ > out
    $ sed 's/[0-9][0-9]*/9/g' out | sort | uniq -c | sort -rn | eyeballing-by-arthur
    2563 *** Bug 9 has been marked as a duplicate of this bug. ***
    2504 *** This bug has been marked as a duplicate of bug 9 ***
      76 *** This bug has been marked as a duplicate of 9 ***
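
A sketch of that special-casing, covering the three auto-comment shapes
counted above. The function names are hypothetical, and bug_url is a
placeholder that still points at the old Bugzilla. Each auto-comment is
emitted outside the fenced block as a link whose text is inline code, which
keeps it both monospaced and clickable:

```python
import re

# The three auto-comment shapes found by the grep above.
PATTERNS = [
    re.compile(r"^\*\*\* This bug has been marked as a duplicate of (?:bug )?(\d+) \*\*\*$"),
    re.compile(r"^\*\*\* Bug (\d+) has been marked as a duplicate of this bug\. \*\*\*$"),
]
FENCE = "`" * 3  # built at runtime so this example can itself sit in a fence


def bug_url(bug_id):
    # Placeholder target; a redirecting URL under the project's control would be better.
    return "https://bugs.llvm.org/show_bug.cgi?id=%d" % bug_id


def render_comment(text):
    """Fence the ordinary text; emit duplicate auto-comments after it as links."""
    body, notes = [], []
    for line in text.split("\n"):
        stripped = line.strip()
        match = next((m for m in (p.match(stripped) for p in PATTERNS) if m), None)
        if match:
            notes.append("[`%s`](%s)" % (stripped, bug_url(int(match.group(1)))))
        else:
            body.append(line)
    parts = []
    body_text = "\n".join(body).strip("\n")
    if body_text:
        parts.append(FENCE + "\n" + body_text + "\n" + FENCE)
    parts.extend(notes)
    return "\n\n".join(parts)
```

Note that this moves every auto-comment to the end of the rendered comment,
which matches where Bugzilla appends them in the examples seen so far.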

...
>
> It's pretty straightforward to come to the present state and there are
> tools for this, we've been at this point in 2019 (see e.g.
> https://github.com/asl/llvm-bugzilla/issues as it was outlined in LLVM
> DevMtg 2019 roundtable discussion). The non-trivial part is to
> workaround various GitHub issues which are also different depending on
> API used.
>

Nice!  Yeah, steps 1, 2, 3
<https://github.com/Quuxplusone/BugzillaToGithub#step-1-export-your-bugzilla-bugs-to-xml>
(Export and Transform) are possible for literally anyone to do — and also
relatively *simple*, in that I wrote those scripts in a single week of
evenings. :)  Step 4
<https://github.com/Quuxplusone/BugzillaToGithub#step-4-import-your-json-bugs-into-github>,
the Load step, is equally *simple* but requires special magic powers that
only a GitHub SRE would have — e.g., forging comment authorship. If I were
doing this migration for real, I'd ask what API they plan to use, and ask
them to test it out on a blank repo in exactly the same way that you and I
have now both done with
https://github.com/asl/llvm-bugzilla/issues
and
https://github.com/Quuxplusone/LLVMBugzillaTest/issues
That is, write the script that's going to be used, and then test it out,
repeatedly, until it works perfectly... and then test once more, just for
safety's sake, before doing it live.
The mantras here are
- "With enough eyeballs, all bugs are shallow" (we've both identified
deficiencies in each other's scripts, and can now fix them!)
- "Measure twice, cut once" (rehearse the entire deploy plan in blank repos
until it's perfect, then do *only the perfect version* live)

(Also, ideally, someone involved with LLVM would just get hired at GitHub,
to cut down on round-trip time. But I'm not volunteering. ;))

–Arthur