[llvm-dev] Incremental compilation and recognizing distinct bitcode

Fri Jul 8 16:30:36 PDT 2016

> On Jul 8, 2016, at 2:18 PM, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> For my project, the step of using LLVM to optimize and generate machine code for a module is much slower than everything else. I realize a significant performance improvement if I can do "incremental compilation" and avoid invoking the LLVM code generator if the underlying object has not changed.
> 
> My current strategy is as follows: for each module:
> - write bitcode out to "module.bc.new"
> - if "module.bc" exists, then compare (byte-by-byte) with "module.bc.new".  If they match, then skip compilation
> - move "module.bc.new" to "module.bc" (known to be different at this point)
> - generate "module.o" (expensive step)
> 
> However, I am finding that occasionally I will write out different bitcode for the same input, which causes gratuitous recompilation.  If I run llvm-dis on "module.bc" and "module.bc.new" in these cases, the output is identical, as expected.
> 
> Is it expected that the actual bitcode may change from run to run, perhaps as a result of ASLR?

No. For instance, clang is not expected to generate different bitcode for the same input.
I assume you’re using your own front end to generate the IR? It may not be deterministic when creating it.

Diffing the output of "llvm-bcanalyzer -dump" on the two files may help pinpoint what differs.
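
A minimal sketch of that comparison in Python, assuming llvm-bcanalyzer is on your PATH and using the file names from your message. The helper names dump_bitcode and diff_dumps are made up for illustration, not part of any LLVM tooling:

```python
import difflib
import subprocess


def dump_bitcode(path, tool="llvm-bcanalyzer"):
    """Return the textual record dump of a bitcode file.

    Assumes `tool` resolves to an llvm-bcanalyzer binary on PATH.
    """
    result = subprocess.run(
        [tool, "-dump", path], check=True, capture_output=True, text=True
    )
    return result.stdout


def diff_dumps(old_dump, new_dump, old_name="module.bc", new_name="module.bc.new"):
    """Return unified-diff lines between two dumps (empty list if identical)."""
    return list(
        difflib.unified_diff(
            old_dump.splitlines(),
            new_dump.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Typical use (not run here, since it needs real bitcode files):
#   lines = diff_dumps(dump_bitcode("module.bc"), dump_bitcode("module.bc.new"))
#   print("\n".join(lines))
```

The records that show up in the diff should point at whatever your front end emits in a different order or with different contents from run to run.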

> 
> Is there a better way for me to check that a Module* structure just built is (not) identical to that from a previous run?

You may check what we do with ThinLTO (lib/LTO/ThinLTOCodeGenerator) to perform incremental LTO, i.e. hashing the module content and checking whether an entry for that hash already exists on disk. This may or may not fit into your flow more nicely than scripting, for instance.
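
To illustrate the general idea only (this is not the ThinLTO code), here is a rough Python sketch of a hash-keyed object cache. The names cache_key and compile_with_cache, the cache layout, and the choice of SHA-1 from hashlib are all assumptions made for the example. The `extra` argument stands in for anything else that affects the output, such as compiler version or options:

```python
import hashlib
import os
import shutil


def cache_key(module_bytes, extra=b""):
    """Hash the module content plus any extra inputs that affect codegen."""
    h = hashlib.sha1()
    h.update(extra)
    h.update(module_bytes)
    return h.hexdigest()


def compile_with_cache(module_bytes, output_path, cache_dir, codegen, extra=b""):
    """Produce output_path, reusing a cached object when the key matches.

    `codegen(module_bytes, output_path)` is a caller-supplied callback that
    performs the expensive step. Returns True on a cache hit, False otherwise.
    """
    os.makedirs(cache_dir, exist_ok=True)
    entry = os.path.join(cache_dir, cache_key(module_bytes, extra) + ".o")
    if os.path.exists(entry):
        # Same key seen before: copy the cached object and skip codegen.
        shutil.copyfile(entry, output_path)
        return True
    codegen(module_bytes, output_path)
    # Remember the freshly generated object under its content hash.
    shutil.copyfile(output_path, entry)
    return False
```

Note that a content hash only helps if the hashed bytes are stable. If the bitcode itself varies between runs, hashing a form that does not vary (you report the llvm-dis output is identical) would be one way around that until the nondeterminism is found.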

— 
Mehdi