[llvm-dev] RFC: New support for triaging optimization-related failures in front ends

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Tue Apr 5 14:30:57 PDT 2016


On Tue, Apr 5, 2016 at 10:38 AM, Kaylor, Andrew via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi Greg,
>
>
>
> Thanks for your input!
>
>
>
> I’d love to see this evolve into something that would let us track
> problems to individual IR changes.  My thinking is that starting with
> passes gives us something that will get us maybe 70% of the functionality
> in one easy step, then individual passes can be instrumented as anyone has
> time or need to get us the rest of the way there.
>
>
>
> Your max_opt option sounds just like what I’m hoping to achieve here.
> Personally, I’d love to see some open source version of the autochop
> system.
>

Isn't this what llvm/utils/abtest/abtest.py + llvm/utils/bisect do?

-- Sean Silva


>   The more pieces of this we can get into trunk, the less everyone has to
> keep re-inventing solutions to this shared problem.  I think there is a lot
> of opportunity here for growing this functionality through connecting
> scripts and other tools.
>
>
>
> -Andy
>
>
>
>
>
> *From:* Greg Bedwell [mailto:gregbedwell at gmail.com]
> *Sent:* Tuesday, April 05, 2016 7:36 AM
> *To:* Kaylor, Andrew <andrew.kaylor at intel.com>
> *Cc:* llvm-dev at lists.llvm.org; Pete Cooper <peter_cooper at apple.com>
>
> *Subject:* Re: [llvm-dev] RFC: New support for triaging
> optimization-related failures in front ends
>
>
>
> Hi,
>
>
>
> I'm late to the party on this one due to having been off on my holidays
> but I wanted to add an enthusiastic +1 to this idea.  I presented our SNC
> compiler's similar max_opts approach at EuroLLVM 2014 (slide 38 onwards in
> http://llvm.org/devmtg/2014-04/PDFs/Talks/GBedwell_PS4CPUToolchain_EuroLLVM2014_distribution.pdf
> ) so I'm very excited to see this.   In that compiler we had it implemented
> per-transformation so a single difference between max_opts levels would
> often only relate to a single line of IR changing.  The ability to take a
> 1000+ source file game and reduce any difference in behaviour to a single
> line of IR changing somewhere in an automated (or mostly automated) manner
> was absolutely invaluable (in fact, the only disadvantage I can think of is
> that it made triaging optimizer bugs so trivially easy that we stopped
> needing to use our debugger to investigate them and lost out on a lot of
> dogfooding of debug data as a result).
>
>
>
> I think this would also be of tremendous benefit in our triage of LTO
> issues, so I'd like to see the option exposed there too.
>
>
>
> We paired up the option with our own 'autochop' harness, which would
> build the project we were triaging in two configurations (one with no
> transforms allowed and one with unlimited transforms allowed).  The
> harness would then bisect at link time by mixing the two sets of object
> files until it found the first file to make some difference in observable
> behaviour, and finally rebuild that file with different values of
> max_opts to bisect to the first bad transformation (determined either by
> parsing console output or, more often in our case, asking the user 'is
> what appears on the screen correct?').
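>
> A minimal sketch of the link-time mixing step (hypothetical paths and
> commands; it assumes the two build directories contain identically named
> object files, that the fully optimized link misbehaves, and that the
> fully unoptimized link behaves):
>
>   #!/bin/bash
>   # Bisect over how many object files are taken from the optimized build.
>   objs=( $(cd obj-opt && ls *.o | sort) )
>   lo=0                 # linking the first $lo optimized objects: known good
>   hi=${#objs[@]}       # linking the first $hi optimized objects: known bad
>   while [ $((hi - lo)) -gt 1 ]; do
>     mid=$(( (lo + hi) / 2 ))
>     inputs=()
>     for i in "${!objs[@]}"; do
>       if [ "$i" -lt "$mid" ]; then
>         inputs+=( "obj-opt/${objs[$i]}" )      # optimized prefix
>       else
>         inputs+=( "obj-noopt/${objs[$i]}" )    # unoptimized rest
>       fi
>     done
>     cc -o test_app "${inputs[@]}"
>     if ./run_test.sh; then lo=$mid; else hi=$mid; fi
>   done
>   echo "First bad object: ${objs[$((hi - 1))]}"
>
> The per-file max_opts bisection described above would then follow with
> the same pass/fail loop on that one file.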
>
>
>
> Just a thought, but I'm wondering whether, once this change lands, it
> would be worth also having some version of that autochop tool available
> in open source.
>
>
>
> Thanks
>
>
>
> -Greg
>
>
>
>
>
>
>
>
>
> On 28 March 2016 at 18:36, Kaylor, Andrew via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> I agree that the more fine grained this becomes the more useful it can be.
>
>
>
> I’ve updated my prototype to use a single number approach.  I’m going to
> clean this up and post a review in the next day or two.
>
>
>
> -Andy
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Pete
> Cooper via llvm-dev
> *Sent:* Friday, March 25, 2016 10:22 PM
> *To:* Matthias Braun <matze at braunis.de>
>
>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] RFC: New support for triaging
> optimization-related failures in front ends
>
>
>
> I've worked on a compiler with a counter, but for individual
> optimisations, not just passes. It was incredibly useful!
>
>
>
> In the llvm world, it would let you bisect exactly which instcombine,
> dagcombine, or whatever causes an issue.
>
>
>
> I support the addition of a pass counter if it helps bisecting, but just
> wanted to point out that this can be as fine grained as the community is
> willing to accept.
>
>
>
> Incidentally, it would be great to have some fuzzers running the pass
> bisector to make sure nothing crashes when a pass is added or removed,
> except for any backend passes that are required to run.
>
>
>
> Pete
>
> Sent from my iPhone
>
>
> On Mar 25, 2016, at 5:29 PM, Matthias Braun via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Ok I cleaned it up and added some explaining comments. It's in
> llvm/utils/abtest now.
>
>
>
> - Matthias
>
>
>
> On Mar 25, 2016, at 4:40 PM, Michael Gottesman <mgottesman at apple.com>
> wrote:
>
>
>
>
>
> On Mar 25, 2016, at 4:37 PM, Matthias Braun <matze at braunis.de> wrote:
>
>
>
> And as we are on the topic of bisecting/diagnosing scripts I attached my
> personal script I used before.
>
>
>
> You give it two directories with assembly files (typically from a known
> good compiler and a "bad" compiler). The script will then go on and create
> permutations by picking all files from the "good" directory and combining
> them with a single file from the "bad" directory, to see which files make
> it fail. In a second step it can do the same thing by replacing functions
> in a "good" file with functions from a "bad" file.
>
> In the case of a compiler revision that triggers a miscompile, this is
> usually enough to track down the function that is miscompiled.
>
>
>
> Andy's proposed scheme should be strictly more powerful, though, as it is
> robust against different optimization decisions between the two
> compilations and even allows you to track down the pass that introduced
> the failure, but maybe my script is useful for somebody else in the
> meantime.
>
>
>
> Yes, these sorts of scripts are useful (I have written the same script 3-4
> times).
>
>
>
> Why not clean it up and commit it under utils, maybe creating a debug
> utility directory and sticking it with bisect? (Just a random thought.) I
> would love not to have to write it again ; ).
>
>
>
> Michael
>
>
>
>
>
> - Matthias
>
>
>
> <abtest.py>
>
>
>
> On Mar 25, 2016, at 4:05 PM, Michael Gottesman via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> I will describe the complete process, hopefully forestalling all
> questions [tell me if I did not ; )]. There is not much to it, TBH.
>
>
>
> ./utils/bisect is a dumb Python script that allows for arbitrary bisecting
> via the exit status of a script it runs. The way you use it is you write a
> script (let's call it test.sh) and then invoke:
>
>
>
> ./utils/bisect --start=N --end=M ./test.sh "%(count)s"
>
>
>
> The bisect script will just invoke test.sh over and over again,
> interpolating "%(count)s" with whatever the current count is. test.sh uses
> the count argument in its internal computations.
>
>
>
> In these test.sh scripts I invoke the Swift compiler using an option
> called "-sil-opt-pass-count". The SIL pass manager is aware of this option
> and, when it is set, increments a counter for all "actions" that it
> performs. Keep in mind this means ALL actions. You are correct that this
> /does/ make the counter value balloon, but bisecting is quick (think about
> log2 of UINT64_MAX). Once the counter reaches the given limit, the pass
> manager stops running passes. If the whole process takes a long time, I
> just let it run overnight or do it on an extra CPU or something like that.
> The simplicity is worth it though, IMHO.
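>
> For concreteness, a test.sh along these lines might look roughly like the
> following (hypothetical file names; the -Xllvm plumbing for the option is
> an assumption and may differ depending on how your frontend exposes it):
>
>   #!/bin/bash
>   # $1 is the count that ./utils/bisect interpolates via "%(count)s".
>   COUNT=$1
>   # Build with the pass-manager counter capped at $COUNT.
>   swiftc -O -Xllvm -sil-opt-pass-count=$COUNT main.swift -o test_app || exit 1
>   # Run and report good (exit 0) or bad (non-zero) via the exit status.
>   ./test_app > output.txt 2>&1
>   grep -q "expected result" output.txt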
>
>
>
> That's all it is.
>
>
>
> Michael
>
>
>
> On Mar 25, 2016, at 1:56 PM, Kaylor, Andrew <andrew.kaylor at intel.com>
> wrote:
>
>
>
> > In the swift-world we use utils/bisect + a single number all the time +
> extra verifications. It works really well.
>
>
>
> Can you describe to me what you mean by that exactly?
>
>
>
> Are you using the single number in the LLVM back end or somewhere else?
>
>
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org
> <llvm-dev-bounces at lists.llvm.org>] *On Behalf Of *Michael Gottesman via
> llvm-dev
> *Sent:* Friday, March 25, 2016 12:35 PM
> *To:* Adrian Prantl <aprantl at apple.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] RFC: New support for triaging
> optimization-related failures in front ends
>
>
>
>
>
> On Mar 25, 2016, at 12:10 PM, Adrian Prantl via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
>
> On Mar 25, 2016, at 11:56 AM, Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> Hi Andy,
>
>
>
> On Mar 25, 2016, at 11:41 AM, Kaylor, Andrew via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> The Intel C++ compiler has long had an internal facility to help our
> development teams track down the source of optimization-related bugs by
> allowing a sort of binary search in which optimization passes and even
> individual optimizations are selectively disabled in response to front-end
> command line options.  As we've been doing development work on LLVM, our
> developers have found that we miss this capability, and so we would like to
> introduce something like this to the LLVM code base.
>
>
>
> I am aware of the many existing facilities in LLVM for tracking down bugs
> such as llc, opt and bugpoint.  Our development teams are, of course,
> making use of these facilities, but we believe that the feature I am going
> to propose today can powerfully complement these existing tools and provide
> a better way to automate defect finding in products that provide a front
> end for LLVM.  This can also be used by customers (who may not be able to
> share their source code) to narrow down problems and provide a concise
> reproducer.
>
>
>
> While the combination of llc, opt and bugpoint is very effective at
> locating compilation failures that can be reproduced from an LLVM IR or
> bitcode file, they can be cumbersome and difficult to automate,
> particularly when debugging runtime failures in programs with non-trivial
> build systems.  The new feature that I am proposing provides a way to
> selectively turn off ranges of optimizations directly from a set of front
> end command line options.
>
>
>
> The proposed feature works by assigning arbitrary but repeatable values to
> modules, functions, passes, etc. as a compilation unit is being optimized.
> The developer can then use these numbers to selectively disable parts of
> the optimization without otherwise altering the behavior of the compiler.
>
>
>
> In keeping with the current LLVM pass manager structure, I am handling
> module passes, SCC passes and function passes separately.  I propose
> handling loop, region and basic block passes as if they were function
> passes.  (I think what this means will become clear below.)  Because the
> handling of function passes illustrates well all of the steps involved in
> using the feature I am proposing, I'll start by describing that case in
> detail.  I'm describing this as a series of manual steps, but I intend that
> these steps could be easily automated.
>
>
>
> I propose introducing three command line options to support locating
> problems with function pass optimizations.  I envision these as options
> available through the front end but implemented in the core LLVM library.
>
>
>
> -num-fn
>
>   sets an upper limit on the function to optimize
>
>   -1 enables all and displays the numbering
>
>
>
> -num-fn-pass
>
>   sets an upper limit on the function pass to run on the limit function
> specified by -num-fn
>
>   -1 enables all and displays the numbering
>
>   ignored if -num-fn is not used or is -1
>
>   all passes are run on functions below the limit function
>
>   only necessary and analysis passes are run on functions above the limit
> function
>
>
>
> -num-fn-case
>
>   sets an upper limit on the optimization case to apply within the limit
> function pass specified by -num-fn-pass
>
>   -1 enables all and displays the numbering
>
>   ignored if -num-fn-pass is not used or is -1
>
>   all cases are applied in passes below the limit function pass
>
>   no cases are applied in optional passes above the limit function pass
>
>
>
> As an example, a developer searching for function pass related
> optimization problems (assuming a case which fails when compiled with -O2
> but passes when compiled with -O0) would begin by using the '-num-fn=-1'
> option to see the function numbering.
>
>
>
> clang -c -O2 -num-fn=-1 test.c
>
>
>
> Optimizing function (1) prune_match
>
> Optimizing function (2) free_S
>
> Optimizing function (3) hash_S
>
> Optimizing function (4) insert_S
>
> Optimizing function (5) zero_S
>
> Optimizing function (6) init_S
>
> Optimizing function (7) matches_S
>
> Optimizing function (8) clean_up
>
>
>
> The developer would then use a binary search, recompiling with selective
> optimization and re-running the test case to determine the function in
> which the problem occurs.
>
>
>
> clang -c -O2 -num-fn=4 test.c
>
> <test passes>
>
> clang -c -O2 -num-fn=6 test.c
>
> <test passes>
>
> clang -c -O2 -num-fn=7 test.c
>
> <test fails>
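>
> That search is easy to script.  A minimal sketch of the -num-fn bisection
> (hypothetical build and test commands; the same loop applies to
> -num-fn-pass and -num-fn-case, and it assumes -num-fn=0, i.e. optimizing
> nothing, passes):
>
>   #!/bin/bash
>   lo=0   # highest limit known to pass
>   hi=8   # lowest limit known to fail (total from the -num-fn=-1 listing)
>   while [ $((hi - lo)) -gt 1 ]; do
>     mid=$(( (lo + hi) / 2 ))
>     clang -c -O2 -num-fn=$mid test.c
>     cc test.o -o test_app
>     if ./run_test.sh; then lo=$mid; else hi=$mid; fi
>   done
>   echo "Problem appears when function $hi is optimized"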
>
>
>
> Having found the problem function, the developer would use the
> '-num-fn-pass=-1' option to see the numbering of function passes (including
> loop, region and basic block passes) that are run on that function.
>
>
>
> clang -c -O2 -num-fn=7 -num-fn-pass=-1 test.c
>
>
>
> Optimizing function (1) prune_match
>
> Optimizing function (2) free_S
>
> Optimizing function (3) hash_S
>
> Optimizing function (4) insert_S
>
> Optimizing function (5) zero_S
>
> Optimizing function (6) init_S
>
> Optimizing function (7) matches_S
>
> running necessary pass Function Pass Manager on function (7) matches_S
>
> running necessary pass Module Verifier on function (7) matches_S
>
> running necessary pass Add DWARF path discriminators (3) on function (7)
> matches_S
>
> running pass (1) Simplify the CFG on function (7) matches_S
>
> running analysis pass Dominator Tree Construction on function (7) matches_S
>
> running pass (2) SROA on function (7) matches_S
>
> running pass (3) Early CSE on function (7) matches_S
>
> running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
>
> <additional passes ignored for brevity>
>
> NOT Optimizing function (8) clean_up
>
>
>
> The user would again use a binary search to determine the problem pass.
>
>
>
> clang -c -O2 -num-fn=7 -num-fn-pass=2 test.c
>
> <test passes>
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 test.c
>
> <test fails>
>
>
>
> Having determined the problem pass, the developer would use the
> '-num-fn-case=-1' option to see the numbering of individual optimizations
> applied by the pass.
>
>
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=-1 test.c
>
>
>
> Optimizing function (1) prune_match
>
> Optimizing function (2) free_S
>
> Optimizing function (3) hash_S
>
> Optimizing function (4) insert_S
>
> Optimizing function (5) zero_S
>
> Optimizing function (6) init_S
>
> Optimizing function (7) matches_S
>
> running necessary pass Function Pass Manager on function (7) matches_S
>
> running necessary pass Module Verifier on function (7) matches_S
>
> running necessary pass Add DWARF path discriminators (3) on function (7)
> matches_S
>
> running pass (1) Simplify the CFG on function (7) matches_S
>
> running analysis pass Dominator Tree Construction on function (7) matches_S
>
> running pass (2) SROA on function (7) matches_S
>
> running pass (3) Early CSE on function (7) matches_S
>
> Another case (1): EarlyCSE CSE value
>
> Another case (2): EarlyCSE CSE value
>
> Another case (3): EarlyCSE CSE value
>
> Another case (4): EarlyCSE CSE load
>
> Another case (5): EarlyCSE CSE value
>
> Another case (6): EarlyCSE CSE load
>
> Another case (7): EarlyCSE CSE value
>
> Another case (8): EarlyCSE CSE load
>
> NOT running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
>
> <additional passes ignored for brevity>
>
> NOT Optimizing function (8) clean_up
>
>
>
> Finally, the developer would use one last binary search to determine which
> optimization case was causing the failure.
>
>
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=4 test.c
>
> <test fails>
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=2 test.c
>
> <test passes>
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=3 test.c
>
> <test fails>
>
>
>
> Most of the above functionality can be implemented by inserting hooks into
> the various pass managers.  Support for the '-num-fn-case' option would
> require instrumentation of individual passes.  I propose implementing this
> on an 'opt in' basis so that it can be added to passes as needed and passes
> which do not opt in will simply not report having selectable cases.
>
>
>
> Note that I have introduced the concept of a "necessary" pass.  Because
> the goal of this is to have the front end produce the same sort of object
> or executable file that it normally would, some passes (such as the
> register allocator and its various helper passes) cannot be skipped.  I am
> also, for now, proposing that all analysis passes be run always to avoid
> the problem of determining which analysis passes would be required for
> later "necessary" passes.  We can revisit this as needed.
>
>
>
> As I've said, I intend to handle loop, region and basic block passes as if
> they were just another function pass.  While I would like to display an
> associated number for the target loop, region or block, I don't see value
> in being able to filter passes based on this value (because they are
> relatively dynamic constructs), so consecutive runs of a single loop pass
> (for instance) on different loops would be identified as distinct function
> passes.  For example:
>
>
>
> running pass (25) Loop Invariant Code Motion on loop (1), function (7)
> matches_S
>
> running pass (26) Loop Invariant Code Motion on loop (2), function (7)
> matches_S
>
> running pass (27) Loop Invariant Code Motion on loop (3), function (7)
> matches_S
>
>
>
> I would also like to have a feature that would allow the developer to
> request that an LLVM IR/bitcode file be emitted just before the first
> disabled pass so that the isolated IR could be used with opt, llc or
> bugpoint for further debugging.  I haven't worked out exactly how this
> would work, so I'm just mentioning it here as a fuzzy part of the proposal.
>
>
>
> Circling back now to module and SCC passes, these would work in pretty
> much the same way as the above method for function passes so I won't cover
> them in as much detail.  I propose the following new command line options.
>
>
>
> -num-mod
>
>   sets an upper limit on the module on which to run module passes
>
>   -1 enables all and displays the numbering
>
>   defaults to 1 since I expect most front ends to usually operate on a
> single module
>
>
>
> -num-mod-pass
>
>   sets an upper limit on the module pass to run on the limit module
> specified by -num-mod
>
>   -1 enables all and displays the numbering
>
>   ignored if -num-mod is -1
>
>   all passes are run on modules below the limit module
>
>   only necessary and analysis passes are run on modules above the limit
> module
>
>
>
> -num-mod-case
>
>   sets an upper limit on the optimization case to apply within the limit
> module pass specified by -num-mod-pass
>
>   -1 enables all and displays the numbering
>
>   ignored if -num-mod-pass is not used or is -1
>
>   all cases are applied in passes below the limit module pass
>
>   no cases are applied in optional passes above the limit module pass
>
>
>
> -num-scc-pass
>
>   sets an upper limit on the SCC pass to run
>
>   -1 enables all and displays the numbering
>
>
>
> -num-scc-case
>
>   sets an upper limit on the optimization case to apply within the limit
> SCC pass specified by -num-scc-pass
>
>   -1 enables all and displays the numbering
>
>   ignored if -num-scc-pass is not used or is -1
>
>   all cases are applied in passes below the limit SCC pass
>
>   no cases are applied in optional passes above the limit SCC pass
>
>
>
> Similar to loops, regions and basic blocks, I would like to present
> numbering information for each unique SCC, but I do not believe there is
> value in being able to filter on a specific SCC because they are subject to
> reorganization.  Rather I would treat each invocation of an SCC pass as a
> separate selectable pass.  The output for an SCC problem search would look
> something like this:
>
>
>
> Optimizing SCC (1): <null function>
>
> running pass (1) Remove unused exception handling info on SCC (1)
>
> running pass (2) Function Integration/Inlining on SCC (1)
>
> running pass (3) Deduce function attributes on SCC (1)
>
> Optimizing SCC (2): isupper
>
> running pass (4) Remove unused exception handling info on SCC (2)
>
> running pass (5) Function Integration/Inlining on SCC (2)
>
> running pass (6) Deduce function attributes on SCC (2)
>
> <snip>
>
> Optimizing SCC (34): or_purge_E_list, and_purge_E_list, purge_Exp
>
> running pass (101) Remove unused exception handling info on SCC (34)
>
> running pass (102) Function Integration/Inlining on SCC (34)
>
> running pass (103) Deduce function attributes on SCC (34)
>
> <snip>
>
>
>
> Here I'm printing the functions (if any) associated with nodes in the SCC
> just to give the developer some feedback on which part of the code is being
> worked on.
>
>
>
> I have a working prototype of all of the above functionality using both
> the new pass manager and the legacy pass manager.
>
>
>
> So what do you think?  Do you agree that support for a feature of this
> sort would be useful in the LLVM core library?  What problems do you see in
> my proposal?  What additional options would you like to see added?
>
>
>
> Thanks in advance for any feedback.
>
>
>
>
>
> You're right that we could benefit from support to help "bisect" where a
> miscompile happens in a more automated way.
>
>
>
> Reading through your (long but nice) description, I thought about
> something simpler: a single number incremented every time a pass in the
> optimizer is run (in the order shown by -debug-pass=Executions).
>
> That way you run once to get the total number of passes that ran, and then
> a simple bisect script can find quite quickly at which point the miscompile
> is introduced (between 0 and max).
>
> I haven't thought about it much, but I'm interested in your view of the
> pros/cons.
>
>
>
> This reminds me a bit of utils/bisect.
>
>
>
> In the swift-world we use utils/bisect + a single number all the time +
> extra verifications. It works really well.
>
>
>
> Michael
>
>
>
>
>
> -- adrian
>
>
>
> --
>
> Mehdi
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>