[LLVMdev] [RFC] Setting preserve-bc-use-list-order=true by default

Fri Apr 10 12:49:58 PDT 2015

FWIW, I completely agree with everything David has said here. Just so my
silence isn't confusing.

On Fri, Apr 10, 2015 at 9:16 AM David Blaikie <dblaikie at gmail.com> wrote:

> On Thu, Apr 9, 2015 at 12:37 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
>
>>
>> > On 2015-Apr-09, at 11:06, David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> > Late to the party because I figured other people would chime in, but
>> I'll have a go...
>> >
>> > On Tue, Mar 31, 2015 at 7:10 PM, Duncan P. N. Exon Smith <
>> dexonsmith at apple.com> wrote:
>> > A while back I finished up some work [1] that Chad started to preserve
>> > use-list-order in bitcode [2], hidden behind an "experimental" option
>> > called `-preserve-bc-use-list-order`.  I then added a similar
>> > `-preserve-ll-use-list-order` option for LLVM assembly [3].
>> >
>> > [1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074604.html
>> > [2]: https://llvm.org/bugs/show_bug.cgi?id=5680
>> > [3]: https://llvm.org/bugs/show_bug.cgi?id=20515
>> >
>> > I'd like to move both of these options out of "experimental" mode, and
>> > turn `-preserve-bc-use-list-order` on by default.  I've attached a patch
>> > that does this.
>> >
>> > Why?
>> > ====
>> >
>> > Use-list order affects the output of some LLVM passes.  A typical
>> > example is a pass that walks through basic block predecessors.  The
>> > use-list order is deterministic, but not preserved when serializing the
>> > LLVM IR to bitcode or assembly.
>> >
>> > In rare (but frustrating) cases, this makes it difficult to reproduce a
>> > crash or miscompile from an intermediate result.
>> >
>> > For example, consider running an LTO build and serializing between the
>> > LTO optimization pipeline and the LTO codegen pipeline.  On SPEC,
>> > serializing to/from bitcode will change the output executable in 33
>> > benchmarks.  If you use `-preserve-bc-use-list-order`, all executables
>> > match.
>> >
>> > Why do you need those to match? What's the penalty/problem when these
>> don't match?
>>
>> You need these to match so that running a pass using `opt` or `llc` on
>> a dumped bitcode file gets you the same result as running the pass
>> directly.  This sort of thing (dumping out temporary results and
>> continuing from there) is useful when triaging bugs.  The problem is
>> that the bug may not reproduce at all when starting from the dumped
>> bitcode.
>>
>> > Reproducibility for tests/bugs/etc seems important, but if it's any
>> worse/better with a particular use list, that's a bug we should fix, right?
>> Forcing a specific use list isn't going to fix those bugs, just make sure
>> we make the same decision (good or bad) every time.
>>
>> I'm not sure there's consensus that this is a bug.  It's not optimal,
>> but it's not clear to me that it's invalid to depend on use-list order.
>> In some cases, it may be a reasonable heuristic that improves compile
>> time (I'm not arguing for that, I'm just not convinced otherwise).
>>
>> > What does it cost?
>> > ==================
>> >
>> > Manman collected a bunch of numbers, with `-preserve-bc-use-list-order`
>> > and `-preserve-ll-use-list-order` both turned on:
>> >
>> >   - Time increase on LTO build time: negligible.
>> >   - Filesize increase in bitcode files output by `clang -flto`:
>> >       - clang: 6.8556% (sum when from 310412640 to 331693296 bytes).
>> >   - Filesize increase in merged bitcode files (output of
>> >     `ld -save-temps` when using `clang -flto` on Darwin).
>> >       — SPEC: 5.24% to 23.4% increase in file size.
>> >       — clang and related tools: 0.83% to 9.19% (details in
>> >         filesize_clang.txt, attached).
>> >   - Time increase of running `llvm-dis` on merged bitcode files:
>> >       — 6.4% to 15% on SPEC benchmarks with running time >= 1 second.
>> >       — 7.9% to 14.5% on clang with running time >= 1 second (details in
>> >         dis_clang.txt, attached).
>> >   - Time increase of running `llvm-as` on merged bitcode files:
>> >       — 3.5% to 39% on SPEC benchmarks with running time >= 1 second.
>> >       — 14.7% to 24.5% with running time >= 1 second (details in
>> >         as_clang.txt, attached).
>> >
>> > These seem like pretty big costs to pay (bitcode size is going to be
>> particularly important to Google - big projects under LTO, limits on the
>> total size of the inputs to the link step, etc). To the point above, it's
>> not clear why we'd want to pay that cost. (I'm partly playing devil's
>> advocate here - I realize reproducibility is really important for a bunch
>> of reasons, though this particular reproducibility is a marginal one
>> (compared to "run the compiler twice on the same input and get two
>> different answers") but it seems like we've generally treated these issues
>> as bugs and fixed the optimizations to be use-list-order independent in the
>> past, no?)
>> >
>>
>> (FWIW, there's some ancient discussion in PR5680 about this.)
>>
>> I don't have a strong opinion on whether depending on use-list order
>> should be considered a bug.  However, it *is* a bug not to be able to
>> roundtrip to bitcode and get the same end result.
>>
>> While it may be possible to remove the compiler's dependency on use-list
>> order, no one has signed up to do the work, it's not clear what the
>> compile time cost would be, and there isn't consensus that it's the
>> right end goal.
>>
>
> I'd say the current solution, while you've signed up/done the work has a
> pretty clear and significant cost, and I'm not sure about consensus (I
> figured other people might chime in which is why I didn't bother until now
> - Chandler mentioned he'd tried & hadn't found many supporters of his
> dissenting opinion so I figured I'd have a go).
>
>
>> In the meantime, this fixes the bug.  If/when that hypothetical work is
>> done and validated we can turn this off by default if it makes sense to.
>>
>> In terms of LTO bitcode size: serializing use-list order isn't actually
>> necessary/useful for the "normal" outputs of `clang -flto`.  It's
>> important for `clang -emit-llvm`, `clang [-flto] -save-temps`, and
>> `<gold/libLTO> -save-temps`, but serialization between "compile" and
>> "link" is a deterministic and reproducible step.  A possible
>> optimization would be to disable the option when writing `clang`'s
>> (final) output file in `clang -flto`.  Thoughts?
>>
>
> Sounds plausible - I'd be inclined to go a step further and make this
> opt-in for tools/actions rather than opt-out for clang -flto.
>
> I'm assuming -emit-llvm and -save-temps behavior is important to use as
> debugging tools - or is there some other reason you're relying on those
> features not to introduce variation? If it's a matter of "LLVM debugging
> tools should enable use list order preservation to make the lives of LLVM
> developers easier" then I think it makes sense for us to opt in all our
> debugging tools to do this but leave the default behavior to not incur this
> extra cost - so that people using LLVM as a library don't pay this extra
> cost in their production pipelines (but, like us, can enable it with a flag
> when needed).
>
>
> - David
>
>
>>
>> Alternatively, some portions of the aforementioned hypothetical work
>> might be low-hanging fruit.  E.g., it might not be hard to validate that
>> the use-list order of `ConstantInt`s doesn't affect output (and if it
>> does, it would probably be easy to fix); at the same time, I imagine
>> they account for a disproportionate percentage of the bitcode bloat
>> (they have large use-lists that change frequently).
>>
>>  _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150410/20b6a6d3/attachment.html>