[llvm-dev] Commit module to Git after each Pass

Troy Johnson via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 15 11:13:52 PDT 2018


It's only a huge number of files if you're running over a set of input files and are using the -print-*-all options, which was not my use-case.  Typically the use-case is debugging a problem in a single input file with -print-*-all, where generating a few hundred files is fine, or debugging a specific pass with -print-*= for some set of files, which similarly might generate a few hundred files.  In other words, you usually know which input file is experiencing a problem or you know which pass is causing a problem.  If you don't know either, then, well, you are kind of stuck until you narrow your scope further, but there are other tools to help with that.


I was not skipping any passes.  Storing optimization messages was not of interest.  Storing additional metadata was not of interest.  As I said, -print-to-files only modified where the -print-* options sent their output.  That's it.


I use git, and I like git, but would rather leave separate tools as separate tools.  If the output is printed to files, you are totally free to add those files to a git repository if you want, but committing them directly forces others to use git just to see the data.


Given that at least two people have implemented virtually the same thing, it seems like -print-to-files would be generally useful.  Others may not need so many files or have your file system constraint.  Would others find it useful?


-Troy


________________________________
From: Alexandre Isoard <alexandre.isoard at gmail.com>
Sent: Friday, June 15, 2018 12:49 PM
To: Troy Johnson
Cc: llvm-dev
Subject: Re: [llvm-dev] Commit module to Git after each Pass

On Fri, Jun 15, 2018 at 9:52 AM Troy Johnson via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> FWIW: We could also just have a mode that dumps 1 file per pass. That is enough to make it convenient/easy to run diff between passes. (And if you wanted to you could still
> make a git repository out of it with an external script).
>
> - Matthias


I have done this before and would strongly encourage this approach as opposed to direct emission to std[out|err] or directly involving a source control system.  The most convenient way was to add an additional option, -print-to-files, which modified the behavior of -print-after-all, -print-before-all, etc.  The filename was constructed by massaging the pass name to comply with file system naming conventions and prepending a monotonically increasing integer (with suitable leading zeros) plus "bef" or "aft" to indicate sequencing.  The only awkward part was modifying createPrinterPass to accept a filename, which had to be done because otherwise you end up having to keep each stream open from the time you set up the pass pipeline until the printing pass actually runs.
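
A minimal sketch of the naming scheme described above, written in Python for brevity. The function name, the "-" separators, the ".ll" extension, and the four-digit width are assumptions made for illustration; the message only specifies a sanitized pass name, a zero-padded sequence number, and "bef" or "aft".

```python
import re


def dump_filename(seq: int, pass_name: str, when: str, width: int = 4) -> str:
    """Build a dump file name such as '0007-aft-loop-vectorize.ll'.

    seq       -- monotonically increasing counter, one per printed dump
    pass_name -- human-readable pass name, possibly with spaces or slashes
    when      -- 'bef' or 'aft', i.e. printed before or after the pass
    width     -- number of digits to zero-pad to (hypothetical default)
    """
    if when not in ("bef", "aft"):
        raise ValueError("when must be 'bef' or 'aft'")
    # Collapse every run of characters outside a conservative set into a
    # single '-' so the result is a valid name on common file systems.
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "-", pass_name).strip("-").lower()
    # Zero padding keeps lexicographic order equal to pipeline order.
    return f"{seq:0{width}d}-{when}-{safe or 'unnamed'}.ll"
```

Because the counter is zero-padded, a plain directory listing shows the dumps in pipeline order, so consecutive files can be diffed directly.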


-Troy

That was the exact implementation we had, and it produced far too many files for our file system: we would have had to create a new subfolder every ~100 passes. Additionally, this took a lot of disk space, and the only metadata we could store was in the file name. Do you skip passes that don't change the module? How do you store the messages about missed optimization opportunities?

On the other hand, with git, I can store much more in the commit message (I actually extended the thing to allow a pass to tag a commit, and I am planning to allow passes to print into the commit message itself).
Yesterday, I wanted to see where the compiler diverges when I tweak the SCEV reduction rules, so I ran the compiler once, switched the branch back to the beginning, and did a second run with my modification; the git history automatically identifies identical commits. That is, I directly get, in the git history tree, the divergence point between the two versions.
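
The divergence idea can be sketched without git at all. The Python below is an illustration under stated assumptions, not the implementation discussed in this thread: each snapshot gets an identifier that hashes its text together with its parent's identifier, which loosely mimics how a commit's identity depends on its history. The function names are hypothetical. With real git, two runs only share commits if the commit message, author, and timestamps are identical as well.

```python
import hashlib
from typing import List, Optional, Sequence


def chain_ids(snapshots: Sequence[str]) -> List[bytes]:
    """Give each snapshot an id that depends on its text and on its parent's id."""
    ids: List[bytes] = []
    parent = b""
    for text in snapshots:
        parent = hashlib.sha1(parent + text.encode("utf-8")).digest()
        ids.append(parent)
    return ids


def first_divergence(run_a: Sequence[str], run_b: Sequence[str]) -> Optional[int]:
    """Return the index of the first snapshot whose id differs between two runs.

    Returns None when both runs produced identical snapshot sequences.
    """
    ids_a, ids_b = chain_ids(run_a), chain_ids(run_b)
    for index, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return index
    if len(ids_a) != len(ids_b):
        # One run is a strict prefix of the other.
        return min(len(ids_a), len(ids_b))
    return None
```

Here each element of run_a and run_b stands for the module text after one pass, so the returned index names the first pass after which the two compilations differ.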

And that's just scratching the surface. Git is designed to be a version control system, true, but it can also be repurposed into a tremendous toolbox.

I would seriously encourage going in the "git fast-import" direction, or toward a semantically equivalent output format that we post-process, because I think it would simplify the implementation (especially to allow a pass to dump anything into the commit message). But don't pass on the actual benefits of having a version control system backend.
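
As a rough illustration of what such output could look like, here is a Python sketch that turns a list of (pass name, module text) pairs into a stream for "git fast-import", one commit per pass. The branch name, file path, committer identity, and fixed timestamp are placeholders chosen for the example, and the path is assumed to contain no characters that need quoting. This is not the patch discussed in this thread.

```python
from typing import Iterable, Tuple


def fast_import_stream(
    snapshots: Iterable[Tuple[str, str]],
    ref: str = "refs/heads/passes",                       # placeholder branch
    path: str = "module.ll",                              # placeholder file name
    ident: str = "llvm-dump <llvm-dump@example.invalid>",  # placeholder identity
    timestamp: int = 1529086432,                          # arbitrary fixed time
) -> bytes:
    """Serialize (pass_name, ir_text) pairs as a git fast-import stream."""
    out = []
    for pass_name, ir_text in snapshots:
        message = pass_name.encode("utf-8")
        body = ir_text.encode("utf-8")
        # Later commits on the same ref are chained onto the previous one.
        out.append(b"commit " + ref.encode("utf-8") + b"\n")
        out.append(b"committer " + ident.encode("utf-8")
                   + b" %d +0000\n" % timestamp)
        # 'data <n>' announces exactly n bytes of payload: the commit message.
        out.append(b"data %d\n" % len(message) + message + b"\n")
        # Store the whole module inline as the content of a single file.
        out.append(b"M 100644 inline " + path.encode("utf-8") + b"\n")
        out.append(b"data %d\n" % len(body) + body + b"\n")
        out.append(b"\n")
    return b"".join(out)
```

The resulting bytes can be piped into "git fast-import" inside an initialized repository. Keeping the identity and timestamp fixed means two runs that produce the same module text after the same passes also produce the same commits.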

--
Alexandre Isoard