[PATCH] D18999: [ELF/LTO] Parallel Codegen for LLD

Tue Apr 12 20:13:05 PDT 2016

pcc added a comment.

> Any idea what is slow? Is it just that each of the 1/4 sized chunks is still slow? Is the split not even in compile time?

Parallel LTO codegen is heavily constrained by Amdahl's law. By default it still begins with a large serial optimization phase, which is essentially the regular LTO optimization pipeline; the only part we parallelize is the backend. Davide's numbers seem about right if he was not passing an `--lto-O` flag to lower the opt level (a low level turns off most of the LTO-stage optimization passes, so it's most useful if you're using something like `-fsanitize=cfi`).
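To make the Amdahl's law point concrete, here is a minimal Python sketch of the ideal speedup when only part of the link is parallelized. The 50/50 split between the serial optimization pipeline and the parallel backend is a hypothetical number chosen for illustration, not a measurement.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Ideal speedup when only (1 - serial_fraction) of the work runs in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    # Serial part takes the same time regardless of threads; the parallel
    # part is divided evenly across them (no partitioning overhead modeled).
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical split: half the time is the serial LTO optimization
    # pipeline, half is the backend codegen that gets parallelized.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads: {amdahl_speedup(0.5, n):.2f}x")
```

With a serial fraction of 0.5 the speedup can never exceed 1 / 0.5 = 2x, however many backend threads are added.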

Even at lower LTO opt levels, a significant amount of serial time is spent splitting the module into partitions and serializing and deserializing them. With debug info enabled, the debug info also has to be duplicated across the partitions. The best result we saw was roughly a 2x speedup with more than 4 threads, without debug info, at opt level 1.
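Going the other way, the Karp-Flatt metric solves Amdahl's law for the serial fraction given a measured speedup, s = (N/S - 1) / (N - 1). This is a minimal sketch that plugs in a ~2x speedup at a few thread counts purely for illustration; it lumps partition splitting, serialization and debug info duplication into the "serial" term, and the thread counts used are assumptions rather than the configurations actually measured.

```python
def karp_flatt_serial_fraction(speedup: float, threads: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Solves Amdahl's law S = 1 / (s + (1 - s) / N) for s, given a measured S.
    Any overhead that does not shrink with more threads shows up as "serial".
    """
    if threads < 2:
        raise ValueError("need at least 2 threads to separate serial from parallel work")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (threads / speedup - 1.0) / (threads - 1.0)


if __name__ == "__main__":
    # Illustrative only: assume a ~2x speedup observed at each thread count.
    for n in (4, 8, 16):
        s = karp_flatt_serial_fraction(2.0, n)
        print(f"2.0x at {n:2d} threads -> serial fraction ~{s:.2f}")
```

As the thread count grows, the implied serial fraction for a fixed 2x speedup approaches 0.5, so a speedup that plateaus around 2x suggests roughly half the time is not being parallelized.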

We found that parallel LTO codegen is most useful when you don't mind throwing extra CPU and RAM at the problem to save a moderate amount of link time. It doesn't completely solve the scalability problem, which is why I've been working towards getting ThinLTO supported in LLD, as the scalability story is much better there.

http://reviews.llvm.org/D18999