[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
Kaylor, Andrew via llvm-dev
llvm-dev at lists.llvm.org
Fri Mar 25 13:23:58 PDT 2016
That definitely looks like a very similar idea.
-----Original Message-----
From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com]
Sent: Friday, March 25, 2016 12:40 PM
To: Mehdi Amini <mehdi.amini at apple.com>
Cc: Kaylor, Andrew <andrew.kaylor at intel.com>; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: New support for triaging optimization-related failures in front ends
Maybe someone on the list who knows about GHC internals (I don't) can chime in here, but GHC has (had?) a concept of "optimization fuel":
http://blog.ezyang.com/2011/06/debugging-compilers-with-optimization-fuel/
that is very similar to what Mehdi is suggesting.
-- Sanjoy
On Fri, Mar 25, 2016 at 12:37 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Mar 25, 2016, at 12:13 PM, Kaylor, Andrew <andrew.kaylor at intel.com>
> wrote:
>
> Hi Mehdi,
>
> I started by trying to implement a single number approach, but it
> doesn’t allow quite the approach I wanted in terms of isolating
> individual functions.
>
>
> Do you have an example where it matters? I wonder if there are enough
> real-world use cases to justify the complexity?
>
>
> Also, I think that there would be problems with that once parallel
> function optimization is enabled.
>
>
> Honestly, this is not going to happen for years. And even if we had it,
> we're talking about debugging, and we could turn off the parallelism for that.
>
> --
> Mehdi
>
>
>
> I am open to revisiting that approach to see if the problems I had
> with it can be resolved.
>
> -Andy
>
> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
> Sent: Friday, March 25, 2016 11:57 AM
> To: Kaylor, Andrew <andrew.kaylor at intel.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: New support for triaging
> optimization-related failures in front ends
>
> Hi Andy,
>
>
> On Mar 25, 2016, at 11:41 AM, Kaylor, Andrew via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> The Intel C++ compiler has long had an internal facility to help our
> development teams track down the source of optimization-related bugs
> by allowing a sort of binary search in which optimization passes, and
> even individual optimizations, are selectively disabled in response to
> front end command line options. As we've been doing development work
> on LLVM, our developers have found that we miss this capability, so we
> would like to introduce something like it to the LLVM code base.
>
> I am aware of the many existing facilities in LLVM for tracking down
> bugs such as llc, opt and bugpoint. Our development teams are, of
> course, making use of these facilities, but we believe that the
> feature I am going to propose today can powerfully complement these
> existing tools and provide a better way to automate defect finding in
> products that provide a front end for LLVM. This can also be used by
> customers (who may not be able to share their source code) to narrow down problems and provide a concise reproducer.
>
> While the combination of llc, opt and bugpoint is very effective at
> locating compilation failures that can be reproduced from an LLVM IR
> or bitcode file, they can be cumbersome and difficult to automate,
> particularly when debugging runtime failures in programs with
> non-trivial build systems. The new feature that I am proposing
> provides a way to selectively turn off ranges of optimizations
> directly from a set of front end command line options.
>
> The proposed feature works by assigning arbitrary but repeatable
> values to modules, functions, passes, etc. as a compilation unit is being optimized.
> The developer can then use these numbers to selectively disable parts
> of the optimization without otherwise altering the behavior of the compiler.
>
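> To make the numbering mechanism concrete, here is a minimal sketch of
> the kind of counter-based gating I have in mind; the names below are
> invented for illustration and nothing like them exists in LLVM today.
>
>   // Hypothetical numbering support (sketch only, not an existing interface).
>   // Functions are numbered in visitation order; because compilation is
>   // deterministic, the numbering is repeatable from run to run.
>   #include <cstdio>
>   #include <string>
>
>   namespace optnum {
>   static int FnLimit = -1; // from -num-fn; -1 = optimize all, show numbering
>   static int CurFn = 0;
>
>   // Called for each function; returns true if optional function passes
>   // may run on it.
>   inline bool beginFunction(const std::string &Name) {
>     ++CurFn;
>     bool Enabled = (FnLimit == -1) || (CurFn <= FnLimit);
>     std::printf("%sOptimizing function (%d) %s\n",
>                 Enabled ? "" : "NOT ", CurFn, Name.c_str());
>     return Enabled;
>   }
>   } // namespace optnum
>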
> In keeping with the current LLVM pass manager structure, I am handling
> module passes, SCC passes and function passes separately. I propose
> handling loop, region and basic block passes as if they were function
> passes. (I think what this means will become clear below.) Because
> the handling of function passes illustrates well all of the steps
> involved in using the feature I am proposing, I'll start by describing
> that case in detail. I'm describing this as a series of manual steps,
> but I intend that these steps could be easily automated.
>
> I propose introducing three command line options to support locating
> problems with function pass optimizations. I envision these as
> options available through the front end but implemented in the core LLVM library.
>
> -num-fn
> sets an upper limit on the function to optimize
> -1 enables all and displays the numbering
>
> -num-fn-pass
> sets an upper limit on the function pass to run on the limit
> function specified by -num-fn
> -1 enables all and displays the numbering
> ignored if -num-fn is not used or is -1
> all passes are run on functions below the limit function
> only necessary and analysis passes are run on functions above the
> limit function
>
> -num-fn-case
> sets an upper limit on the optimization case to apply within the
> limit function pass specified by -num-fn-pass
> -1 enables all and displays the numbering
> ignored if -num-fn-pass is not used or is -1
> all cases are applied in passes below the limit function pass
> no cases are applied in optional passes above the limit function pass
>
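> As a rough sketch, these could be declared in the core LLVM library
> with the existing cl::opt machinery; nothing below exists yet, and a
> real implementation would also need to distinguish "not used" from -1,
> which this sketch glosses over with a plain -1 default.
>
>   #include "llvm/Support/CommandLine.h"
>   using namespace llvm;
>
>   static cl::opt<int> NumFn(
>       "num-fn", cl::init(-1),
>       cl::desc("Upper limit on the function to optimize "
>                "(-1 = optimize all and display the numbering)"));
>
>   static cl::opt<int> NumFnPass(
>       "num-fn-pass", cl::init(-1),
>       cl::desc("Upper limit on the function pass to run on the limit function"));
>
>   static cl::opt<int> NumFnCase(
>       "num-fn-case", cl::init(-1),
>       cl::desc("Upper limit on the optimization case to apply within the "
>                "limit function pass"));
>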
> As an example, a developer searching for function pass related
> optimization problems (assuming a case which fails when compiled with
> -O2 but passes when compiled with -O0) would begin by using the
> '-num-fn=-1' option to see the function numbering.
>
> clang -c -O2 -num-fn=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> Optimizing function (8) clean_up
>
> The developer would then use a binary search, recompiling with
> selective optimization and re-running the test case to determine the
> function in which the problem occurs.
>
> clang -c -O2 -num-fn=4 test.c
> <test passes>
> clang -c -O2 -num-fn=6 test.c
> <test passes>
> clang -c -O2 -num-fn=7 test.c
> <test fails>
>
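> Since I intend for these steps to be easily automated, here is a rough
> sketch of a driver for this first bisection step; the build and test
> commands are placeholders that a real harness would replace.
>
>   // bisect_num_fn.cpp -- illustrative only.  Assumes the test passes when
>   // the compiled program exits with status 0, and that -num-fn=0 disables
>   // optimization of every function (so the lower bound is known to pass).
>   #include <cstdio>
>   #include <cstdlib>
>   #include <string>
>
>   static bool passesWithLimit(int Limit) {
>     std::string Cmd = "clang -O2 -num-fn=" + std::to_string(Limit) +
>                       " test.c -o a.out && ./a.out";
>     return std::system(Cmd.c_str()) == 0;
>   }
>
>   int main() {
>     int Lo = 0, Hi = 8;     // Hi = function count reported by -num-fn=-1
>     while (Lo + 1 < Hi) {   // invariant: Lo passes, Hi fails
>       int Mid = Lo + (Hi - Lo) / 2;
>       if (passesWithLimit(Mid))
>         Lo = Mid;
>       else
>         Hi = Mid;
>     }
>     std::printf("First failing function number: %d\n", Hi);
>     return 0;
>   }
>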
> Having found the problem function, the developer would use the
> '-num-fn-pass=-1' option to see the numbering of function passes
> (including loop, region and basic block passes) that are run on that function.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> running necessary pass Function Pass Manager on function (7) matches_S
> running necessary pass Module Verifier on function (7) matches_S
> running necessary pass Add DWARF path discriminators (3) on function (7) matches_S
> running pass (1) Simplify the CFG on function (7) matches_S
> running analysis pass Dominator Tree Construction on function (7) matches_S
> running pass (2) SROA on function (7) matches_S
> running pass (3) Early CSE on function (7) matches_S
> running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
> <additional passes ignored for brevity>
> NOT Optimizing function (8) clean_up
>
> The developer would again use a binary search to determine the problem pass.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=2 test.c
> <test passes>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 test.c
> <test fails>
>
> Having determined the problem pass, the developer would use the
> '-num-fn-case=-1' option to see the numbering of individual
> optimizations applied by the pass.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> running necessary pass Function Pass Manager on function (7) matches_S
> running necessary pass Module Verifier on function (7) matches_S
> running necessary pass Add DWARF path discriminators (3) on function (7) matches_S
> running pass (1) Simplify the CFG on function (7) matches_S
> running analysis pass Dominator Tree Construction on function (7) matches_S
> running pass (2) SROA on function (7) matches_S
> running pass (3) Early CSE on function (7) matches_S
> Another case (1): EarlyCSE CSE value
> Another case (2): EarlyCSE CSE value
> Another case (3): EarlyCSE CSE value
> Another case (4): EarlyCSE CSE load
> Another case (5): EarlyCSE CSE value
> Another case (6): EarlyCSE CSE load
> Another case (7): EarlyCSE CSE value
> Another case (8): EarlyCSE CSE load
> NOT running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
> <additional passes ignored for brevity>
> NOT Optimizing function (8) clean_up
>
> Finally, the developer would use one last binary search to determine
> which optimization case was causing the failure.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=4 test.c
> <test fails>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=2 test.c
> <test passes>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=3 test.c
> <test fails>
>
> Most of the above functionality can be implemented by inserting hooks
> into the various pass managers. Support for the '-num-fn-case' option
> would require instrumentation of individual passes. I propose
> implementing this on an 'opt-in' basis so that it can be added to
> passes as needed; passes that do not opt in will simply not report
> having selectable cases.
>
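> To illustrate the opt-in instrumentation, an instrumented pass would
> guard each optional transformation with something like the hypothetical
> helper below; the names are invented for this sketch, and the real
> interface would be settled during review.
>
>   // Hypothetical case-level gate (sketch only).  The limit comes from
>   // -num-fn-case; -1 means "apply everything and display the numbering".
>   #include <cstdio>
>
>   namespace optnum {
>   static int CaseLimit = -1;
>   static int CurCase = 0;
>
>   // Returns true if this individual transformation may be applied.
>   inline bool shouldApplyCase(const char *Desc) {
>     ++CurCase;
>     if (CaseLimit == -1)
>       std::printf("Another case (%d): %s\n", CurCase, Desc);
>     return CaseLimit == -1 || CurCase <= CaseLimit;
>   }
>   } // namespace optnum
>
>   // Inside an instrumented pass, each optional transformation is guarded:
>   //   if (!optnum::shouldApplyCase("EarlyCSE CSE value"))
>   //     continue; // skip just this case; the rest of the pass still runs
>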
> Note that I have introduced the concept of a "necessary" pass.
> Because the goal of this is to have the front end produce the same
> sort of object or executable file that it normally would, some passes
> (such as the register allocator and its various helper passes) cannot
> be skipped. I am also proposing, for now, that all analysis passes
> always be run, to avoid the problem of determining which analysis
> passes would be required by later "necessary" passes. We can revisit
> this as needed.
>
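> Concretely, the pass manager hook for function passes would implement
> roughly the decision below (invented names again; the classification of
> "necessary" passes would have to come from the passes themselves).
>
>   // Hypothetical decision function for the pass-manager hook (sketch only).
>   // Analysis passes and "necessary" passes always run so that the front
>   // end still produces its normal object or executable file.
>   bool shouldRunFunctionPass(bool IsAnalysis, bool IsNecessary,
>                              int CurFn, int FnLimit,
>                              int CurFnPass, int FnPassLimit) {
>     if (IsAnalysis || IsNecessary)
>       return true;                  // never skipped
>     if (FnLimit != -1 && CurFn > FnLimit)
>       return false;                 // function above the limit function
>     if (CurFn == FnLimit && FnPassLimit != -1 && CurFnPass > FnPassLimit)
>       return false;                 // pass above the limit pass
>     return true;                    // everything else runs normally
>   }
>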
> As I've said, I intend to handle loop, region and basic block passes
> as if they were just another function pass. While I would like to
> display an associated number for the target loop, region or block, I
> don't see value in being able to filter passes based on this value
> (because they are relatively dynamic constructs), so consecutive runs
> of a single loop pass (for
> instance) on different loops would be identified as distinct function
> passes. For example:
>
> running pass (25) Loop Invariant Code Motion on loop (1), function (7) matches_S
> running pass (26) Loop Invariant Code Motion on loop (2), function (7) matches_S
> running pass (27) Loop Invariant Code Motion on loop (3), function (7) matches_S
>
> I would also like to have a feature that would allow the developer to
> request that an LLVM IR/bitcode file be emitted just before the first
> disabled pass so that the isolated IR could be used with opt, llc or
> bugpoint for further debugging. I haven't worked out exactly how this
> would work, so I'm just mentioning it here as a fuzzy part of the proposal.
>
> Circling back now to module and SCC passes, these would work in pretty
> much the same way as the above method for function passes so I won't
> cover them in as much detail. I propose the following new command line options.
>
> -num-mod
> sets an upper limit on the module on which to run module passes
> -1 enables all and displays the numbering
> defaults to 1 since I expect most front ends to usually operate on a
> single module
>
> -num-mod-pass
> sets an upper limit on the module pass to run on the limit module
> specified by -num-mod
> -1 enables all and displays the numbering
> ignored if -num-mod is not used or is -1
> all passes are run on modules below the limit module
> only necessary and analysis passes are run on modules above the
> limit module
>
> -num-mod-case
> sets an upper limit on the optimization case to apply within the
> limit module pass specified by -num-mod-pass
> -1 enables all and displays the numbering
> ignored if -num-mod-pass is not used or is -1
> all cases are applied in passes below the limit module pass
> no cases are applied in optional passes above the limit module pass
>
> -num-scc-pass
> sets an upper limit on the SCC pass to run
> -1 enables all and displays the numbering
>
> -num-scc-case
> sets an upper limit on the optimization case to apply within the
> limit SCC pass specified by -num-scc-pass
> -1 enables all and displays the numbering
> ignored if -num-scc-pass is not used or is -1
> all cases are applied in passes below the limit SCC pass
> no cases are applied in optional passes above the limit SCC pass
>
> Similar to loops, regions and basic blocks, I would like to present
> numbering information for each unique SCC, but I do not believe there
> is value in being able to filter on a specific SCC because they are
> subject to reorganization. Rather I would treat each invocation of an
> SCC pass as a separate selectable pass. The output for an SCC problem
> search would look something like this:
>
> Optimizing SCC (1): <null function>
> running pass (1) Remove unused exception handling info on SCC (1)
> running pass (2) Function Integration/Inlining on SCC (1)
> running pass (3) Deduce function attributes on SCC (1)
> Optimizing SCC (2): isupper
> running pass (4) Remove unused exception handling info on SCC (2)
> running pass (5) Function Integration/Inlining on SCC (2)
> running pass (6) Deduce function attributes on SCC (2)
> <snip>
> Optimizing SCC (34): or_purge_E_list, and_purge_E_list, purge_Exp
> running pass (101) Remove unused exception handling info on SCC (34)
> running pass (102) Function Integration/Inlining on SCC (34)
> running pass (103) Deduce function attributes on SCC (34)
> <snip>
>
> Here I'm printing the functions (if any) associated with nodes in the
> SCC just to give the developer some feedback on which part of the code
> is being worked on.
>
> I have a working prototype of all of the above functionality using
> both the new pass manager and the legacy pass manager.
>
> So what do you think? Do you agree that support for a feature of this
> sort would be useful in the LLVM core library? What problems do you
> see in my proposal? What additional options would you like to see added?
>
> Thanks in advance for any feedback.
>
>
>
> You're right that we could benefit from support to help "bisecting"
> where a miscompile happens in a more automated way.
>
> Reading through your (long but nice) description, I thought about
> something simpler: a single number incremented every time a pass in
> the optimizer is run (in the order shown by -debug-pass=Executions).
> That way you run once to get the total number of passes that ran, and
> then a simple bisect script can find quite quickly at which point the
> miscompile is introduced (between 0 and max).
> I haven't thought about it much, but I'm interested in your view of
> the pros/cons.
>
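> In other words, something as small as this (just a sketch; no such
> counter exists today):
>
>   // Single global counter: every pass execution gets the next number and
>   // everything past the limit is skipped.  One run with no limit reports
>   // the maximum; an external script then bisects on the limit value.
>   static int PassCountLimit = -1;   // -1 = run everything
>   static int PassCount = 0;
>
>   inline bool shouldRunNextPass() {
>     ++PassCount;
>     return PassCountLimit == -1 || PassCount <= PassCountLimit;
>   }
>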
> --
> Mehdi
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
--
Sanjoy Das
http://playingwithpointers.com