[LLVMdev] [RFC] Setting preserve-bc-use-list-order=true by default
David Blaikie
dblaikie at gmail.com
Fri Apr 10 09:12:18 PDT 2015
On Thu, Apr 9, 2015 at 12:37 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
>
> > On 2015-Apr-09, at 11:06, David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Late to the party because I figured other people would chime in, but
> I'll have a go...
> >
> > On Tue, Mar 31, 2015 at 7:10 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > A while back I finished up some work [1] that Chad started to preserve
> > use-list-order in bitcode [2], hidden behind an "experimental" option
> > called `-preserve-bc-use-list-order`. I then added a similar
> > `-preserve-ll-use-list-order` option for LLVM assembly [3].
> >
> > [1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074604.html
> > [2]: https://llvm.org/bugs/show_bug.cgi?id=5680
> > [3]: https://llvm.org/bugs/show_bug.cgi?id=20515
> >
> > I'd like to move both of these options out of "experimental" mode, and
> > turn `-preserve-bc-use-list-order` on by default. I've attached a patch
> > that does this.
> >
> > Why?
> > ====
> >
> > Use-list order affects the output of some LLVM passes. A typical
> > example is a pass that walks through basic block predecessors. The
> > use-list order is deterministic, but not preserved when serializing the
> > LLVM IR to bitcode or assembly.
> >
> > In rare (but frustrating) cases, this makes it difficult to reproduce a
> > crash or miscompile from an intermediate result.
> >
> > For example, consider running an LTO build and serializing between the
> > LTO optimization pipeline and the LTO codegen pipeline. On SPEC,
> > serializing to/from bitcode will change the output executable in 33
> > benchmarks. If you use `-preserve-bc-use-list-order`, all executables
> > match.
> >
> > Why do you need those to match? What's the penalty/problem when these
> don't match?
>
> You need these to match so that running a pass using `opt` or `llc` on
> a dumped bitcode file gets you the same result as running the pass
> directly. This sort of thing (dumping out temporary results and
> continuing from there) is useful when triaging bugs. The problem is
> that the bug may not reproduce at all when starting from the dumped
> bitcode.
>
> > Reproducibility for tests/bugs/etc seems important, but if it's any
> worse/better with a particular use list, that's a bug we should fix, right?
> Forcing a specific use list isn't going to fix those bugs, just make sure
> we make the same decision (good or bad) every time.
>
> I'm not sure there's consensus that this is a bug. It's not optimal,
> but it's not clear to me that it's invalid to depend on use-list order.
> In some cases, it may be a reasonable heuristic that improves compile
> time (I'm not arguing for that, I'm just not convinced otherwise).
>
> > What does it cost?
> > ==================
> >
> > Manman collected a bunch of numbers, with `-preserve-bc-use-list-order`
> > and `-preserve-ll-use-list-order` both turned on:
> >
> > - Time increase on LTO build time: negligible.
> > - Filesize increase in bitcode files output by `clang -flto`:
> > - clang: 6.8556% (sum when from 310412640 to 331693296 bytes).
> > - Filesize increase in merged bitcode files (output of
> > `ld -save-temps` when using `clang -flto` on Darwin).
> > — SPEC: 5.24% to 23.4% increase in file size.
> > — clang and related tools: 0.83% to 9.19% (details in
> > filesize_clang.txt, attached).
> > - Time increase of running `llvm-dis` on merged bitcode files:
> > — 6.4% to 15% on SPEC benchmarks with running time >= 1 second.
> > — 7.9% to 14.5% on clang with running time >= 1 second (details in
> > dis_clang.txt, attached).
> > - Time increase of running `llvm-as` on merged bitcode files:
> > — 3.5% to 39% on SPEC benchmarks with running time >= 1 second.
> > — 14.7% to 24.5% with running time >= 1 second (details in
> > as_clang.txt, attached).
> >
> > These seem like pretty big costs to pay (bitcode size is going to be
> particularly important to Google - big projects under LTO, limits on the
> total size of the inputs to the link step, etc). To the point above, it's
> not clear why we'd want to pay that cost. (I'm partly playing devil's
> advocate here - I realize reproducibility is really important for a bunch
> of reasons, though this particular reproducibility is a marginal one
> (compared to "run the compiler twice on the same input and get two
> different answers") but it seems like we've generally treated these issues
> as bugs and fixed the optimizations to be use-list-order independent in the
> past, no?)
> >
>
> (FWIW, there's some ancient discussion in PR5680 about this.)
>
> I don't have a strong opinion on whether depending on use-list order
> should be considered a bug. However, it *is* a bug not to be able to
> roundtrip to bitcode and get the same end result.
>
> While it may be possible to remove the compiler's dependency on use-list
> order, no one has signed up to do the work, it's not clear what the
> compile time cost would be, and there isn't consensus that it's the
> right end goal.
>
I'd say the current solution, while you've signed up/done the work has a
pretty clear and significant cost, and I'm not sure about consensus (I
figured other people might chime in which is why I didn't bother until now
- Chandler mentioned he'd tried & hadn't found many supporters of his
dissenting opinion so I figured I'd have a go).
> In the meantime, this fixes the bug. If/when that hypothetical work is
> done and validated we can turn this off by default if it makes sense to.
>
> In terms of LTO bitcode size: serializing use-list order isn't actually
> necessary/useful for the "normal" outputs of `clang -flto`. It's
> important for `clang -emit-llvm`, `clang [-flto] -save-temps`, and
> `<gold/libLTO> -save-temps`, but serialization between "compile" and
> "link" is a deterministic and reproducible step. A possible
> optimization would be to disable the option when writing `clang`'s
> (final) output file in `clang -flto`. Thoughts?
>
Sounds plausible - I'd be inclined to go a step further and make this
opt-in for tools/actions rather than opt-out for clang -flto.
I'm assuming -emit-llvm and -save-temps behavior is important to use as
debugging tools - or is there some other reason you're relying on those
features not to introduce variation? If it's a matter of "LLVM debugging
tools should enable use list order preservation to make the lives of LLVM
developers easier" then I think it makes sense for us to opt in all our
debugging tools to do this but leave the default behavior to not incur this
extra cost - so that people using LLVM as a library don't pay this extra
cost in their production pipelines (but, like us, can enable it with a flag
when needed).
- David
>
> Alternatively, some portions of the aforementioned hypothetical work
> might be low-hanging fruit. E.g., it might not be hard to validate that
> the use-list order of `ConstantInt`s doesn't affect output (and if it
> does, it would probably be easy to fix); at the same time, I imagine
> they account for a disproportionate percentage of the bitcode bloat
> (they have large use-lists that change frequently).
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150410/2e08c426/attachment.html>
More information about the llvm-dev
mailing list