[llvm-dev] Commit module to Git after each Pass

Fri Jun 15 10:49:10 PDT 2018

On Fri, Jun 15, 2018 at 9:52 AM Troy Johnson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> > FWIW: We could also just have a mode that dumps 1 file per pass. That is
> > enough to make it convenient/easy to run diff between passes. (And if you
> > wanted to you could still make a git repository out of it with an
> > external script).
> >
> > - Matthias
>
> I have done this before and would strongly encourage this approach as
> opposed to direct emission to std[out|err] or directly involving a source
> control system.  The most convenient way was to add an additional option,
> -print-to-files, which modified the behavior of -print-after-all,
> -print-before-all, etc.  The filename was constructed by massaging the pass
> name to comply with file system naming conventions and prepending a
> monotonically increasing integer (with suitable leading zeros) plus "bef"
> or "aft" to indicate sequencing.  The only awkward part was modifying
> createPrinterPass to accept a filename, which had to be done because
> otherwise you end up having to keep each stream open from the time you
> set up the pass pipeline until the printing pass actually runs.
>
>
> -Troy
>

That was exactly the implementation we had, and it produced far too many
files for our file system; we would have to create a subfolder every ~100
passes. Additionally, it took a lot of disk space, and the only metadata we
could store was in the file name. Do you skip passes that don't change the
module? How do you store the missed-optimization messages?
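
To make the one-file-per-pass naming scheme concrete, here is a minimal
sketch of how such a filename could be built. The function name, the "-"
separator, the ".ll" suffix, and the default width are assumptions chosen
for illustration; they are not taken from the actual patch.

```python
import re

def dump_filename(index, pass_name, when, width=4):
    """Build a dump filename from a sequence number, "bef"/"aft", and a pass name."""
    if when not in ("bef", "aft"):
        raise ValueError("when must be 'bef' or 'aft'")
    # Replace anything that is awkward in a file name with "_".
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", pass_name).strip("_")
    # Zero-pad the counter so a lexicographic listing matches pass order.
    return f"{index:0{width}d}-{when}-{safe}.ll"
```

The zero padding is what keeps a plain directory listing in pipeline order,
which is also why the width has to be chosen with the total number of passes
in mind.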

On the other hand, with git, I can store much more in the commit message (I
actually extended the thing to allow a pass to tag a commit, and I am
planning to allow passes to print into the commit message itself).
Yesterday, I wanted to see where the compiler diverges when I tweak SCEV
reduction rules, so I ran the compiler once, switched the branch back to the
beginning, and did a second run with my modification; the git history then
automatically identifies the identical commits. That is, I directly get, in
the git history tree, the divergence point between the two versions.
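
The reason this works can be illustrated with a small sketch: a commit id
depends on its content and on its parent's id, so two runs share ids exactly
up to the first pass where the output differs. The code below is only a
model of that idea, not git's real object format, and the function names are
made up for the example. In real git the author, committer, and timestamps
also feed into the id, so they would have to be identical across runs.

```python
import hashlib

def commit_ids(snapshots):
    """Chain-hash (message, content) pairs so each id also depends on its parent."""
    ids = []
    parent = b""
    for message, content in snapshots:
        h = hashlib.sha1()
        # Length-prefix each field so field boundaries stay unambiguous.
        for part in (parent, message.encode("utf-8"), content.encode("utf-8")):
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
        parent = h.digest()
        ids.append(h.hexdigest())
    return ids

def shared_prefix_length(run_a, run_b):
    """Number of leading snapshots whose chained ids match in both runs."""
    count = 0
    for a, b in zip(commit_ids(run_a), commit_ids(run_b)):
        if a != b:
            break
        count += 1
    return count
```

With an actual repository, and assuming each run's history is kept on its
own branch, `git merge-base` on the two branches should report the last
shared commit.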

And that's just the tip of the iceberg. Git is designed to be a version
control system, true, but it can also be repurposed into a powerful
toolbox.

I would seriously encourage going in the "git fast-import" direction, or
toward a semantically equivalent output format that we post-process, because
I think it would simplify the implementation (especially to allow a pass to
dump anything into the commit message). But don't pass up the actual
benefits of having a version control system as the backend.
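
As a rough sketch of what a "git fast-import" stream with one commit per
pass could look like, the function below builds such a stream from
(pass name, IR text) pairs. The branch name, committer identity, fixed
timestamp, and the "module.ll" path are all placeholders chosen for the
example; nothing here reflects the actual implementation.

```python
def fast_import_stream(snapshots,
                       ref="refs/heads/passes",
                       ident="pass-dump <pass-dump@example.invalid>",
                       when="1500000000 +0000",
                       path="module.ll"):
    """Build a git fast-import stream with one commit per (pass_name, ir_text)."""
    out = bytearray()
    for mark, (pass_name, ir_text) in enumerate(snapshots, start=1):
        message = pass_name.encode("utf-8")
        blob = ir_text.encode("utf-8")
        out += f"commit {ref}\n".encode("utf-8")
        out += f"mark :{mark}\n".encode("utf-8")
        # A fixed committer line keeps commit ids reproducible across runs.
        out += f"committer {ident} {when}\n".encode("utf-8")
        # "data <n>" announces exactly n bytes of payload (the commit message).
        out += b"data %d\n" % len(message) + message + b"\n"
        # Store the whole module inline as a regular file in this commit.
        out += f"M 100644 inline {path}\n".encode("utf-8")
        out += b"data %d\n" % len(blob) + blob + b"\n"
    return bytes(out)
```

The resulting bytes can be piped into `git fast-import` inside an
initialized repository. Because the commit message is just a length-prefixed
data block, anything a pass wants to print can be appended to it before the
length is computed.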

-- 
*Alexandre Isoard*