[llvm-dev] RFC: New support for triaging optimization-related failures in front ends

Fri Mar 25 12:39:45 PDT 2016

Maybe someone on the list who knows about GHC internals (I don't) can
chime in here, but GHC has (had?) a concept of "optimization fuel":
http://blog.ezyang.com/2011/06/debugging-compilers-with-optimization-fuel/
that is very similar to what Mehdi is suggesting.

-- Sanjoy

On Fri, Mar 25, 2016 at 12:37 PM, Mehdi Amini via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> On Mar 25, 2016, at 12:13 PM, Kaylor, Andrew <andrew.kaylor at intel.com>
> wrote:
>
> Hi Mehdi,
>
> I started by trying to implement a single number approach, but it doesn’t
> allow quite the approach I wanted in terms of isolating individual
> functions.
>
>
> Do you have an example where it matters? I wonder if there are enough
> real-world use case that justify the complexity?
>
>
> Also, I think that there would be problems with that once parallel function
> optimization is enabled.
>
>
> Honestly this is not going to happen in years. And even if we had it, we're
> talking about debugging and we could turn off the parallelism for that.
>
> --
> Mehdi
>
>
>
> I am open to revisiting that approach to see if the problems I had with it
> can be resolved.
>
> -Andy
>
> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
> Sent: Friday, March 25, 2016 11:57 AM
> To: Kaylor, Andrew <andrew.kaylor at intel.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: New support for triaging optimization-related
> failures in front ends
>
> Hi Andy,
>
>
> On Mar 25, 2016, at 11:41 AM, Kaylor, Andrew via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> The Intel C++ compiler has long had an internal facility to help our
> development teams track down the source of optimization related bugs by
> allowing a sort of binary search in which optimization passes and even
> individual optimizations are selectively disabled in response to front end
> command line options.  As we've been doing development work on LLVM our
> developers have found that we miss this capability and so we would like to
> introduce something like this to the LLVM code base.
>
> I am aware of the many existing facilities in LLVM for tracking down bugs
> such as llc, opt and bugpoint.  Our development teams are, of course, making
> use of these facilities, but we believe that the feature I am going to
> propose today can powerfully complement these existing tools and provide a
> better way to automate defect finding in products that provide a front end
> for LLVM.  This can also be used by customers (who may not be able to share
> their source code) to narrow down problems and provide a concise reproducer.
>
> While the combination of llc, opt and bugpoint is very effective at locating
> compilation failures that can be reproduced from an LLVM IR or bitcode file,
> they can be cumbersome and difficult to automate, particularly when
> debugging runtime failures in programs with non-trivial build systems.  The
> new feature that I am proposing provides a way to selectively turn off
> ranges of optimizations directly from a set of front end command line
> options.
>
> The proposed feature works by assigning arbitrary but repeatable values to
> modules, functions, passes, etc. as a compilation unit is being optimized.
> The developer can then use these numbers to selectively disable parts of the
> optimization without otherwise altering the behavior of the compiler.
>
> In keeping with the current LLVM pass manager structure, I am handling
> module passes, SCC passes and function passes separately.  I propose
> handling loop, region and basic block passes as if they were function
> passes.  (I think what this means will become clear below.)  Because the
> handling of function passes illustrates well all of the steps involved in
> using the feature I am proposing, I'll start by describing that case in
> detail.  I'm describing this as a series of manual steps, but I intend that
> these steps could be easily automated.
>
> I propose introducing three command line options to support locating
> problems with function pass optimizations.  I envision these as options
> available through the front end but implemented in the core LLVM library.
>
> -num-fn
>   sets an upper limit on the function to optimize
>   -1 enables all and displays the numbering
>
> -num-fn-pass
>   sets an upper limit on the function pass to run on the limit function
> specified by -num-fn
>   -1 enables all and displays the numbering
>   ignored if -num-fn is not used or is -1
>   all passes are run on functions below the limit function
>   only necessary and analysis passes are run on functions above the limit
> function
>
> -num-fn-case
>   sets an upper limit on the optimization case to apply within the limit
> function pass specified by -num-fn-pass
>   -1 enables all and displays the numbering
>   ignored if -num-fn-pass is not used or is -1
>   all cases are applied in passes below the limit function pass
>   no cases are applied in optional passes below the limit function pass
>
> As an example, a developer searching for function pass related optimization
> problems (assuming a case which fails when compiled with -O2 but passes when
> compiled with -O0) would begin by using the '-num-fn=-1' option to see the
> functioning numbering.
>
> clang -c -O2 -num-fn=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> Optimizing function (8) clean_up
>
> The developer would then use a binary search, recompiling with selective
> optimization and re-running the test case to determine the function in which
> the problem occurs.
>
> clang -c -O2 -num-fn=4 test.c
> <test passes>
> clang -c -O2 -num-fn=6 test.c
> <test passes>
> clang -c -O2 -num-fn=7 test.c
> <test fails>
>
> Having found the problem function, the developer would use the
> '-num-fn-pass=-1' option to see the numbering of function passes (including
> loop, region and basic block passes) that are run on that function.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> running necessary pass Function Pass Manager on function (7) matches_S
> running necessary pass Module Verifier on function (7) matches_S
> running necessary pass Add DWARF path discriminators (3) on function (7)
> matches_S
> running pass (1) Simplify the CFG on function (7) matches_S
> running analysis pass Dominator Tree Construction on function (7) matches_S
> running pass (2) SROA on function (7) matches_S
> running pass (3) Early CSE on function (7) matches_S
> running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
> <additional passes ignored for brevity>
> NOT Optimizing function (8) clean_up
>
> The user would again use a binary search to determine the problem pass.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=2 test.c
> <test passes>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 test.c
> <test fails>
>
> Having determined the problem pass, the developer would use the
> '-num-fn-case=-1' option to see the numbering of individual optimizations
> applied by the pass.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> running necessary pass Function Pass Manager on function (7) matches_S
> running necessary pass Module Verifier on function (7) matches_S
> running necessary pass Add DWARF path discriminators (3) on function (7)
> matches_S
> running pass (1) Simplify the CFG on function (7) matches_S
> running analysis pass Dominator Tree Construction on function (7) matches_S
> running pass (2) SROA on function (7) matches_S
> running pass (3) Early CSE on function (7) matches_S
> Another case (1): EarlyCSE CSE value
> Another case (2): EarlyCSE CSE value
> Another case (3): EarlyCSE CSE value
> Another case (4): EarlyCSE CSE load
> Another case (5): EarlyCSE CSE value
> Another case (6): EarlyCSE CSE load
> Another case (7): EarlyCSE CSE value
> Another case (8): EarlyCSE CSE load
> NOT running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
> <additional passes ignored for brevity>
> NOT Optimizing function (8) clean_up
>
> Finally, the developer would use one last binary search to determine which
> optimization case was causing the failure.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=4 test.c
> <test fails>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=2 test.c
> <test passes>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=3 test.c
> <test fails>
>
> Most of the above functionality can be implemented by inserting hooks into
> the various pass managers.  Support for the '-num-fn-case' option would
> require instrumentation of individual passes.  I propose implementing this
> on an 'opt in' basis so that it can be added to passes as needed and passes
> which do not opt in will simply not report having selectable cases.
>
> Note that I have introduced the concept of a "necessary" pass.  Because the
> goal of this is to have the front end produce the same sort of object or
> executable file that it normally would, some passes (such as the register
> allocator and its various helper passes) cannot be skipped.  I am also, for
> now, proposing that all analysis passes be run always to avoid the problem
> of determining which analysis passes would be required for later "necessary"
> passes.  We can revisit this as needed.
>
> As I've said, I intend to handle loop, region and basic block passes as if
> they were just another function pass.  While I would like to display an
> associated number for the target loop, region or block, I don't see value in
> being able to filter passes based on this value (because they are relatively
> dynamic constructs), so consecutive runs of a single loop pass (for
> instance) on different loops would be identified as distinct function
> passes.  For example:
>
> running pass (25) Loop Invariant Code Motion on loop (1), function (7)
> matches_S
> running pass (26) Loop Invariant Code Motion on loop (2), function (7)
> matches_S
> running pass (27) Loop Invariant Code Motion on loop (3), function (7)
> matches_S
>
> I would also like to have a feature that would allow the developer to
> request that an LLVM IR/bitcode file be emitted just before the first
> disabled pass so that the isolated IR could be used with opt, llc or
> bugpoint for further debugging.  I haven't worked out exactly how this would
> work, so I'm just mentioning it here as a fuzzy part of the proposal.
>
> Circling back now to module and SCC passes, these would work in pretty much
> the same way as the above method for function passes so I won't cover them
> in as much detail.  I propose the following new command line options.
>
> -num-mod
>   sets an upper limit on the module on which to run module passes
>   -1 enables all and displays the numbering
>   defaults to 1 since I expect most front ends to usually operate on a
> single module
>
> -num-mod-pass
>   sets an upper limit on the module pass to run on the limit module
> specified by -num-mod
>   -1 enables all and displays the numbering
>   ignored if -num-mod is -1
>   all passes are run on modules below the limit module
>   only necessary and analysis passes are run on modules above the limit
> module
>
> -num-mod-case
>   sets an upper limit on the optimization case to apply within the limit
> module pass specified by -num-mod-pass
>   -1 enables all and displays the numbering
>   ignored if -num-mod-pass is not used or is -1
>   all cases are applied in passes below the limit module pass
>   no cases are applied in optional passes below the limit module pass
>
> -num-scc-pass
>   sets an upper limit on the SCC pass to run
>   -1 enables all and displays the numbering
>
> -num-scc-case
>   sets an upper limit on the optimization case to apply within the limit SCC
> pass specified by -num-scc-pass
>   -1 enables all and displays the numbering
>   ignored if -num-scc-pass is not used or is -1
>   all cases are applied in passes below the limit SCC pass
>   no cases are applied in optional passes below the limit SCC pass
>
> Similar to loops, regions and basic blocks, I would like to present
> numbering information for each unique SCC, but I do not believe there is
> value in being able to filter on a specific SCC because they are subject to
> reorganization.  Rather I would treat each invocation of an SCC pass as a
> separate selectable pass.  The output for an SCC problem search would look
> something like this:
>
> Optimizing SCC (1): <null function>
> running pass (1) Remove unused exception handling info on SCC (1)
> running pass (2) Function Integration/Inlining on SCC (1)
> running pass (3) Deduce function attributes on SCC (1)
> Optimizing SCC (2): isupper
> running pass (4) Remove unused exception handling info on SCC (2)
> running pass (5) Function Integration/Inlining on SCC (2)
> running pass (6) Deduce function attributes on SCC (2)
> <snip>
> Optimizing SCC (34): or_purge_E_list, and_purge_E_list, purge_Exp
> running pass (101) Remove unused exception handling info on SCC (34)
> running pass (102) Function Integration/Inlining on SCC (34)
> running pass (103) Deduce function attributes on SCC (34)
> <snip>
>
> Here I'm printing the functions (if any) associated with nodes in the SCC
> just to give the developer some feedback on which part of the code is being
> worked on.
>
> I have a working prototype of all of the above functionality using both the
> new pass manager and the legacy pass manager.
>
> So what do you think?  Do you agree that support for a feature of this sort
> would be useful in the LLVM core library?  What problems do you see in my
> proposal?  What additional options would you like to see added?
>
> Thanks in advance for any feedback.
>
>
>
> You're right that we could benefit from support to help "bisecting" where a
> miscompile happens in a more automated way.
>
> Reading through you (long but nice) description, I thought about something
> simpler: a single number incremented every time a pass in the optimizer is
> ran (as the order shown by -debug-pass=Executions).
> That way you run once to get the total number of passes that ran and then, a
> simple bisect script can find quite quickly at which point the miscompile is
> introduced (between 0 and max).
> I haven't thought about it much but I'm interested in you view of the
> pros/cons.
>
> --
> Mehdi
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

-- 
Sanjoy Das
http://playingwithpointers.com