[LLVMdev] [RFC] Setting preserve-bc-use-list-order=true by default

Fri Apr 10 09:12:18 PDT 2015

On Thu, Apr 9, 2015 at 12:37 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2015-Apr-09, at 11:06, David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Late to the party because I figured other people would chime in, but
> I'll have a go...
> >
> > On Tue, Mar 31, 2015 at 7:10 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > A while back I finished up some work [1] that Chad started to preserve
> > use-list-order in bitcode [2], hidden behind an "experimental" option
> > called `-preserve-bc-use-list-order`.  I then added a similar
> > `-preserve-ll-use-list-order` option for LLVM assembly [3].
> >
> > [1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074604.html
> > [2]: https://llvm.org/bugs/show_bug.cgi?id=5680
> > [3]: https://llvm.org/bugs/show_bug.cgi?id=20515
> >
> > I'd like to move both of these options out of "experimental" mode, and
> > turn `-preserve-bc-use-list-order` on by default.  I've attached a patch
> > that does this.
> >
> > Why?
> > ====
> >
> > Use-list order affects the output of some LLVM passes.  A typical
> > example is a pass that walks through basic block predecessors.  The
> > use-list order is deterministic, but not preserved when serializing the
> > LLVM IR to bitcode or assembly.
> >
> > In rare (but frustrating) cases, this makes it difficult to reproduce a
> > crash or miscompile from an intermediate result.
> >
> > For example, consider running an LTO build and serializing between the
> > LTO optimization pipeline and the LTO codegen pipeline.  On SPEC,
> > serializing to/from bitcode will change the output executable in 33
> > benchmarks.  If you use `-preserve-bc-use-list-order`, all executables
> > match.
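The mechanism above can be illustrated with a toy C++ model. It does not use LLVM's real `Value`/`Use` classes or the real bitcode format. Every name in it (`ToyValue`, `firstUser`, `readerOrder`, `computeShuffle`, `applyShuffle`) is hypothetical, and it rests on simplifying assumptions: each user appears at most once in a use-list, new uses are pushed onto the front, and the reader re-adds uses in program order. The model shows an order-dependent heuristic returning a different answer after a round trip. It also shows how storing a permutation per use-list, roughly the kind of extra data that preserving use-list order requires, restores the original order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (not LLVM's classes): a value's use-list is the ordered list of
// names of instructions that use it. Simplifying assumptions: each user
// appears at most once, and new uses are pushed onto the front of the list.
struct ToyValue {
  std::vector<std::string> uses;
  void addUse(const std::string &user) { uses.insert(uses.begin(), user); }
};

// Stand-in for an order-dependent heuristic such as "take the first
// predecessor you see": its answer depends only on use-list order.
std::string firstUser(const std::vector<std::string> &uses) {
  return uses.empty() ? std::string() : uses.front();
}

// Toy reader: instructions come back in program order and each one re-adds
// its use, so the rebuilt order reflects program order, not edit history.
std::vector<std::string> readerOrder(const ToyValue &v,
                                     const std::vector<std::string> &programOrder) {
  ToyValue rebuilt;
  for (const std::string &inst : programOrder)
    if (std::find(v.uses.begin(), v.uses.end(), inst) != v.uses.end())
      rebuilt.addUse(inst);
  return rebuilt.uses;
}

// Writer side: for each use in the order the reader will produce, record its
// index in the original in-memory order. This permutation is the extra data
// that would have to be serialized to preserve the order.
std::vector<std::size_t> computeShuffle(const std::vector<std::string> &original,
                                        const std::vector<std::string> &readOrder) {
  std::vector<std::size_t> shuffle;
  for (const std::string &u : readOrder)
    shuffle.push_back(static_cast<std::size_t>(
        std::find(original.begin(), original.end(), u) - original.begin()));
  return shuffle;
}

// Reader side: move each rebuilt use back to its recorded position.
std::vector<std::string> applyShuffle(const std::vector<std::string> &readOrder,
                                      const std::vector<std::size_t> &shuffle) {
  std::vector<std::string> restored(readOrder.size());
  for (std::size_t i = 0; i < readOrder.size(); ++i)
    restored[shuffle[i]] = readOrder[i];
  return restored;
}
```

Take the edit history `addUse("c")`, `addUse("a")`, `addUse("b")` with program order a, b, c. The in-memory list is [b, a, c], so `firstUser` returns b. The toy reader rebuilds [c, b, a], and `firstUser` now returns c. `computeShuffle` yields {2, 0, 1}, and `applyShuffle` uses it to restore [b, a, c]. In this model the stored permutation is the source of the size overhead, and it grows with the length of each reordered use-list.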
> >
> > Why do you need those to match? What's the penalty/problem when these
> don't match?
>
> You need these to match so that running a pass using `opt` or `llc` on
> a dumped bitcode file gets you the same result as running the pass
> directly.  This sort of thing (dumping out temporary results and
> continuing from there) is useful when triaging bugs.  The problem is
> that the bug may not reproduce at all when starting from the dumped
> bitcode.
>
> > Reproducibility for tests/bugs/etc seems important, but if it's any
> worse/better with a particular use list, that's a bug we should fix, right?
> Forcing a specific use list isn't going to fix those bugs, just make sure
> we make the same decision (good or bad) every time.
>
> I'm not sure there's consensus that this is a bug.  It's not optimal,
> but it's not clear to me that it's invalid to depend on use-list order.
> In some cases, it may be a reasonable heuristic that improves compile
> time (I'm not arguing for that, I'm just not convinced otherwise).
>
> > What does it cost?
> > ==================
> >
> > Manman collected a bunch of numbers, with `-preserve-bc-use-list-order`
> > and `-preserve-ll-use-list-order` both turned on:
> >
> >   - Increase in LTO build time: negligible.
> >   - File size increase in bitcode files output by `clang -flto`:
> >       - clang: 6.8556% (sum went from 310412640 to 331693296 bytes).
> >   - File size increase in merged bitcode files (output of
> >     `ld -save-temps` when using `clang -flto` on Darwin):
> >       - SPEC: 5.24% to 23.4% increase in file size.
> >       - clang and related tools: 0.83% to 9.19% (details in
> >         filesize_clang.txt, attached).
> >   - Time increase of running `llvm-dis` on merged bitcode files:
> >       - 6.4% to 15% on SPEC benchmarks with running time >= 1 second.
> >       - 7.9% to 14.5% on clang with running time >= 1 second (details in
> >         dis_clang.txt, attached).
> >   - Time increase of running `llvm-as` on merged bitcode files:
> >       - 3.5% to 39% on SPEC benchmarks with running time >= 1 second.
> >       - 14.7% to 24.5% on clang with running time >= 1 second (details in
> >         as_clang.txt, attached).
> >
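As a quick check of the clang figure in the list above, the relative increase is the byte difference divided by the original total:

```latex
\frac{331\,693\,296 - 310\,412\,640}{310\,412\,640}
  = \frac{21\,280\,656}{310\,412\,640}
  \approx 0.068556 = 6.8556\%
```

That is about 21.3 MB (decimal) of extra bitcode across the `clang -flto` outputs for clang.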
> > These seem like pretty big costs to pay (bitcode size is going to be
> particularly important to Google - big projects under LTO, limits on the
> total size of the inputs to the link step, etc). To the point above, it's
> not clear why we'd want to pay that cost. (I'm partly playing devil's
> advocate here - I realize reproducibility is really important for a bunch
> of reasons, though this particular reproducibility is a marginal one
> (compared to "run the compiler twice on the same input and get two
> different answers") but it seems like we've generally treated these issues
> as bugs and fixed the optimizations to be use-list-order independent in the
> past, no?)
> >
>
> (FWIW, there's some ancient discussion in PR5680 about this.)
>
> I don't have a strong opinion on whether depending on use-list order
> should be considered a bug.  However, it *is* a bug not to be able to
> roundtrip to bitcode and get the same end result.
>
> While it may be possible to remove the compiler's dependency on use-list
> order, no one has signed up to do the work, it's not clear what the
> compile time cost would be, and there isn't consensus that it's the
> right end goal.
>

I'd say the current solution, while you've signed up for/done the work, has
a pretty clear and significant cost, and I'm not sure about consensus (I
figured other people might chime in, which is why I didn't bother until now -
Chandler mentioned he'd tried & hadn't found many supporters of his
dissenting opinion, so I figured I'd have a go).

> In the meantime, this fixes the bug.  If/when that hypothetical work is
> done and validated we can turn this off by default if it makes sense to.
>
> In terms of LTO bitcode size: serializing use-list order isn't actually
> necessary/useful for the "normal" outputs of `clang -flto`.  It's
> important for `clang -emit-llvm`, `clang [-flto] -save-temps`, and
> `<gold/libLTO> -save-temps`, but serialization between "compile" and
> "link" is a deterministic and reproducible step.  A possible
> optimization would be to disable the option when writing `clang`'s
> (final) output file in `clang -flto`.  Thoughts?
>

Sounds plausible - I'd be inclined to go a step further and make this
opt-in for tools/actions rather than opt-out for clang -flto.

I'm assuming -emit-llvm and -save-temps behavior is important to you as
debugging tools - or is there some other reason you're relying on those
features not to introduce variation? If it's a matter of "LLVM debugging
tools should enable use list order preservation to make the lives of LLVM
developers easier" then I think it makes sense for us to opt in all our
debugging tools to do this but leave the default behavior to not incur this
extra cost - so that people using LLVM as a library don't pay this extra
cost in their production pipelines (but, like us, can enable it with a flag
when needed).
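The opt-in policy described above can be sketched in a few lines of C++. The sketch assumes a writer-options struct and a tool enumeration that are entirely hypothetical (`WriterOptions`, `ToolKind`, `optionsFor` are illustrative names, not LLVM or clang APIs). Preservation defaults to off for library clients and for the normal `clang -flto` output. Debugging-oriented paths such as `-emit-llvm`, `-save-temps`, and `opt` turn it on, and an explicit user flag can force it on anywhere.

```cpp
#include <cassert>

// Hypothetical categories of bitcode-writing entry points (illustrative only).
enum class ToolKind { LibraryClient, ClangFltoOutput, ClangEmitLLVM, SaveTemps, Opt };

// Hypothetical writer options: preserving use-list order is off by default.
struct WriterOptions {
  bool preserveUseListOrder = false;
};

// Opt-in policy: debugging-oriented entry points enable preservation, while
// library clients and the normal `clang -flto` output keep the cheaper default.
// `userRequested` models an explicit command-line flag that forces it on.
WriterOptions optionsFor(ToolKind kind, bool userRequested) {
  WriterOptions opts;
  switch (kind) {
  case ToolKind::ClangEmitLLVM:
  case ToolKind::SaveTemps:
  case ToolKind::Opt:
    opts.preserveUseListOrder = true;
    break;
  case ToolKind::LibraryClient:
  case ToolKind::ClangFltoOutput:
    break;
  }
  if (userRequested)
    opts.preserveUseListOrder = true;
  return opts;
}
```

The only design decision here is which side carries the default. Under this policy, production pipelines pay no size or time cost unless they ask for it, which is the trade-off argued for above.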

- David

>
> Alternatively, some portions of the aforementioned hypothetical work
> might be low-hanging fruit.  E.g., it might not be hard to validate that
> the use-list order of `ConstantInt`s doesn't affect output (and if it
> does, it would probably be easy to fix); at the same time, I imagine
> they account for a disproportionate percentage of the bitcode bloat
> (they have large use-lists that change frequently).
>
>