[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
Kaylor, Andrew via llvm-dev
llvm-dev at lists.llvm.org
Fri Mar 25 13:23:58 PDT 2016
That definitely looks like a very similar idea.
-----Original Message-----
From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com]
Sent: Friday, March 25, 2016 12:40 PM
To: Mehdi Amini <mehdi.amini at apple.com>
Cc: Kaylor, Andrew <andrew.kaylor at intel.com>; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: New support for triaging optimization-related failures in front ends
Maybe someone on the list who knows about GHC internals (I don't) can chime in here, but GHC has (had?) a concept of "optimization fuel":
http://blog.ezyang.com/2011/06/debugging-compilers-with-optimization-fuel/
that is very similar to what Mehdi is suggesting.
-- Sanjoy
On Fri, Mar 25, 2016 at 12:37 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Mar 25, 2016, at 12:13 PM, Kaylor, Andrew <andrew.kaylor at intel.com>
> wrote:
>
> Hi Mehdi,
>
> I started by trying to implement a single number approach, but it
> doesn’t allow quite the approach I wanted in terms of isolating
> individual functions.
>
>
> Do you have an example where it matters? I wonder if there are enough
> real-world use cases to justify the complexity?
>
>
> Also, I think that there would be problems with that once parallel
> function optimization is enabled.
>
>
> Honestly, this is not going to happen for years. And even if we had it,
> we're talking about debugging, and we could turn off the parallelism for that.
>
> --
> Mehdi
>
>
>
> I am open to revisiting that approach to see if the problems I had
> with it can be resolved.
>
> -Andy
>
> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
> Sent: Friday, March 25, 2016 11:57 AM
> To: Kaylor, Andrew <andrew.kaylor at intel.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: New support for triaging
> optimization-related failures in front ends
>
> Hi Andy,
>
>
> On Mar 25, 2016, at 11:41 AM, Kaylor, Andrew via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> The Intel C++ compiler has long had an internal facility to help our
> development teams track down the source of optimization-related bugs
> by allowing a sort of binary search in which optimization passes, and
> even individual optimizations, are selectively disabled in response to
> front end command line options. As we've been doing development work
> on LLVM, our developers have found that we miss this capability, so we
> would like to introduce something like it to the LLVM code base.
>
> I am aware of the many existing facilities in LLVM for tracking down
> bugs such as llc, opt and bugpoint. Our development teams are, of
> course, making use of these facilities, but we believe that the
> feature I am going to propose today can powerfully complement these
> existing tools and provide a better way to automate defect finding in
> products that provide a front end for LLVM. This can also be used by
> customers (who may not be able to share their source code) to narrow down problems and provide a concise reproducer.
>
> While the combination of llc, opt and bugpoint is very effective at
> locating compilation failures that can be reproduced from an LLVM IR
> or bitcode file, they can be cumbersome and difficult to automate,
> particularly when debugging runtime failures in programs with
> non-trivial build systems. The new feature that I am proposing
> provides a way to selectively turn off ranges of optimizations
> directly from a set of front end command line options.
>
> The proposed feature works by assigning arbitrary but repeatable
> values to modules, functions, passes, etc. as a compilation unit is being optimized.
> The developer can then use these numbers to selectively disable parts
> of the optimization without otherwise altering the behavior of the compiler.
>
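> To make the numbering mechanism concrete, here is a minimal sketch of
> the kind of counter-based gating I have in mind; the names below are
> invented for illustration and nothing like them exists in LLVM today.
>
>   // Hypothetical numbering support (sketch only, not an existing interface).
>   // Functions are numbered in visitation order; because compilation is
>   // deterministic, the numbering is repeatable from run to run.
>   #include <cstdio>
>   #include <string>
>
>   namespace optnum {
>   static int FnLimit = -1; // from -num-fn; -1 = optimize all, show numbering
>   static int CurFn = 0;
>
>   // Called for each function; returns true if optional function passes
>   // may run on it.
>   inline bool beginFunction(const std::string &Name) {
>     ++CurFn;
>     bool Enabled = (FnLimit == -1) || (CurFn <= FnLimit);
>     std::printf("%sOptimizing function (%d) %s\n",
>                 Enabled ? "" : "NOT ", CurFn, Name.c_str());
>     return Enabled;
>   }
>   } // namespace optnum
>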
> In keeping with the current LLVM pass manager structure, I am handling
> module passes, SCC passes and function passes separately. I propose
> handling loop, region and basic block passes as if they were function
> passes. (I think what this means will become clear below.) Because
> the handling of function passes illustrates well all of the steps
> involved in using the feature I am proposing, I'll start by describing
> that case in detail. I'm describing this as a series of manual steps,
> but I intend that these steps could be easily automated.
>
> I propose introducing three command line options to support locating
> problems with function pass optimizations. I envision these as
> options available through the front end but implemented in the core LLVM library.
>
> -num-fn
> sets an upper limit on the function to optimize
> -1 enables all and displays the numbering
>
> -num-fn-pass
> sets an upper limit on the function pass to run on the limit
> function specified by -num-fn
> -1 enables all and displays the numbering
> ignored if -num-fn is not used or is -1
> all passes are run on functions below the limit function
> only necessary and analysis passes are run on functions above the
> limit function
>
> -num-fn-case
> sets an upper limit on the optimization case to apply within the
> limit function pass specified by -num-fn-pass
> -1 enables all and displays the numbering
> ignored if -num-fn-pass is not used or is -1
> all cases are applied in passes below the limit function pass
> no cases are applied in optional passes above the limit function pass
>
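> As a rough sketch, these could be declared in the core LLVM library
> with the existing cl::opt machinery; nothing below exists yet, and a
> real implementation would also need to distinguish "not used" from -1,
> which this sketch glosses over with a plain -1 default.
>
>   #include "llvm/Support/CommandLine.h"
>   using namespace llvm;
>
>   static cl::opt<int> NumFn(
>       "num-fn", cl::init(-1),
>       cl::desc("Upper limit on the function to optimize "
>                "(-1 = optimize all and display the numbering)"));
>
>   static cl::opt<int> NumFnPass(
>       "num-fn-pass", cl::init(-1),
>       cl::desc("Upper limit on the function pass to run on the limit function"));
>
>   static cl::opt<int> NumFnCase(
>       "num-fn-case", cl::init(-1),
>       cl::desc("Upper limit on the optimization case to apply within the "
>                "limit function pass"));
>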
> As an example, a developer searching for function pass related
> optimization problems (assuming a case which fails when compiled with
> -O2 but passes when compiled with -O0) would begin by using the
> '-num-fn=-1' option to see the function numbering.
>
> clang -c -O2 -num-fn=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> Optimizing function (8) clean_up
>
> The developer would then use a binary search, recompiling with
> selective optimization and re-running the test case to determine the
> function in which the problem occurs.
>
> clang -c -O2 -num-fn=4 test.c
> <test passes>
> clang -c -O2 -num-fn=6 test.c
> <test passes>
> clang -c -O2 -num-fn=7 test.c
> <test fails>
>
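> Since I intend for these steps to be easily automated, here is a rough
> sketch of a driver for this first bisection step; the build and test
> commands are placeholders that a real harness would replace.
>
>   // bisect_num_fn.cpp -- illustrative only.  Assumes the test passes when
>   // the compiled program exits with status 0, and that -num-fn=0 disables
>   // optimization of every function (so the lower bound is known to pass).
>   #include <cstdio>
>   #include <cstdlib>
>   #include <string>
>
>   static bool passesWithLimit(int Limit) {
>     std::string Cmd = "clang -O2 -num-fn=" + std::to_string(Limit) +
>                       " test.c -o a.out && ./a.out";
>     return std::system(Cmd.c_str()) == 0;
>   }
>
>   int main() {
>     int Lo = 0, Hi = 8;     // Hi = function count reported by -num-fn=-1
>     while (Lo + 1 < Hi) {   // invariant: Lo passes, Hi fails
>       int Mid = Lo + (Hi - Lo) / 2;
>       if (passesWithLimit(Mid))
>         Lo = Mid;
>       else
>         Hi = Mid;
>     }
>     std::printf("First failing function number: %d\n", Hi);
>     return 0;
>   }
>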
> Having found the problem function, the developer would use the
> '-num-fn-pass=-1' option to see the numbering of function passes
> (including loop, region and basic block passes) that are run on that function.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> running necessary pass Function Pass Manager on function (7) matches_S
> running necessary pass Module Verifier on function (7) matches_S
> running necessary pass Add DWARF path discriminators (3) on function (7) matches_S
> running pass (1) Simplify the CFG on function (7) matches_S
> running analysis pass Dominator Tree Construction on function (7) matches_S
> running pass (2) SROA on function (7) matches_S
> running pass (3) Early CSE on function (7) matches_S
> running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
> <additional passes ignored for brevity>
> NOT Optimizing function (8) clean_up
>
> The developer would again use a binary search to determine the problem pass.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=2 test.c
> <test passes>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 test.c
> <test fails>
>
> Having determined the problem pass, the developer would use the
> '-num-fn-case=-1' option to see the numbering of individual
> optimizations applied by the pass.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=-1 test.c
>
> Optimizing function (1) prune_match
> Optimizing function (2) free_S
> Optimizing function (3) hash_S
> Optimizing function (4) insert_S
> Optimizing function (5) zero_S
> Optimizing function (6) init_S
> Optimizing function (7) matches_S
> running necessary pass Function Pass Manager on function (7) matches_S
> running necessary pass Module Verifier on function (7) matches_S
> running necessary pass Add DWARF path discriminators (3) on function (7) matches_S
> running pass (1) Simplify the CFG on function (7) matches_S
> running analysis pass Dominator Tree Construction on function (7) matches_S
> running pass (2) SROA on function (7) matches_S
> running pass (3) Early CSE on function (7) matches_S
> Another case (1): EarlyCSE CSE value
> Another case (2): EarlyCSE CSE value
> Another case (3): EarlyCSE CSE value
> Another case (4): EarlyCSE CSE load
> Another case (5): EarlyCSE CSE value
> Another case (6): EarlyCSE CSE load
> Another case (7): EarlyCSE CSE value
> Another case (8): EarlyCSE CSE load
> NOT running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
> <additional passes ignored for brevity>
> NOT Optimizing function (8) clean_up
>
> Finally, the developer would use one last binary search to determine
> which optimization case was causing the failure.
>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=4 test.c
> <test fails>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=2 test.c
> <test passes>
> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=3 test.c
> <test fails>
>
> Most of the above functionality can be implemented by inserting hooks
> into the various pass managers. Support for the '-num-fn-case' option
> would require instrumentation of individual passes. I propose
> implementing this on an 'opt-in' basis so that it can be added to
> passes as needed; passes that do not opt in will simply not report
> having selectable cases.
>
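> To illustrate the opt-in instrumentation, an instrumented pass would
> guard each optional transformation with something like the hypothetical
> helper below; the names are invented for this sketch, and the real
> interface would be settled during review.
>
>   // Hypothetical case-level gate (sketch only).  The limit comes from
>   // -num-fn-case; -1 means "apply everything and display the numbering".
>   #include <cstdio>
>
>   namespace optnum {
>   static int CaseLimit = -1;
>   static int CurCase = 0;
>
>   // Returns true if this individual transformation may be applied.
>   inline bool shouldApplyCase(const char *Desc) {
>     ++CurCase;
>     if (CaseLimit == -1)
>       std::printf("Another case (%d): %s\n", CurCase, Desc);
>     return CaseLimit == -1 || CurCase <= CaseLimit;
>   }
>   } // namespace optnum
>
>   // Inside an instrumented pass, each optional transformation is guarded:
>   //   if (!optnum::shouldApplyCase("EarlyCSE CSE value"))
>   //     continue; // skip just this case; the rest of the pass still runs
>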
> Note that I have introduced the concept of a "necessary" pass.
> Because the goal of this is to have the front end produce the same
> sort of object or executable file that it normally would, some passes
> (such as the register allocator and its various helper passes) cannot
> be skipped. I am also proposing, for now, that all analysis passes
> always be run, to avoid the problem of determining which analysis
> passes would be required by later "necessary" passes. We can revisit
> this as needed.
>
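> Concretely, the pass manager hook for function passes would implement
> roughly the decision below (invented names again; the classification of
> "necessary" passes would have to come from the passes themselves).
>
>   // Hypothetical decision function for the pass-manager hook (sketch only).
>   // Analysis passes and "necessary" passes always run so that the front
>   // end still produces its normal object or executable file.
>   bool shouldRunFunctionPass(bool IsAnalysis, bool IsNecessary,
>                              int CurFn, int FnLimit,
>                              int CurFnPass, int FnPassLimit) {
>     if (IsAnalysis || IsNecessary)
>       return true;                  // never skipped
>     if (FnLimit != -1 && CurFn > FnLimit)
>       return false;                 // function above the limit function
>     if (CurFn == FnLimit && FnPassLimit != -1 && CurFnPass > FnPassLimit)
>       return false;                 // pass above the limit pass
>     return true;                    // everything else runs normally
>   }
>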
> As I've said, I intend to handle loop, region and basic block passes
> as if they were just another function pass. While I would like to
> display an associated number for the target loop, region or block, I
> don't see value in being able to filter passes based on this value
> (because they are relatively dynamic constructs), so consecutive runs
> of a single loop pass (for
> instance) on different loops would be identified as distinct function
> passes. For example:
>
> running pass (25) Loop Invariant Code Motion on loop (1), function (7) matches_S
> running pass (26) Loop Invariant Code Motion on loop (2), function (7) matches_S
> running pass (27) Loop Invariant Code Motion on loop (3), function (7) matches_S
>
> I would also like to have a feature that would allow the developer to
> request that an LLVM IR/bitcode file be emitted just before the first
> disabled pass so that the isolated IR could be used with opt, llc or
> bugpoint for further debugging. I haven't worked out exactly how this
> would work, so I'm just mentioning it here as a fuzzy part of the proposal.
>
> Circling back now to module and SCC passes, these would work in pretty
> much the same way as the above method for function passes so I won't
> cover them in as much detail. I propose the following new command line options.
>
> -num-mod
> sets an upper limit on the module on which to run module passes
> -1 enables all and displays the numbering
> defaults to 1 since I expect most front ends to usually operate on a
> single module
>
> -num-mod-pass
> sets an upper limit on the module pass to run on the limit module
> specified by -num-mod
> -1 enables all and displays the numbering
> ignored if -num-mod is not used or is -1
> all passes are run on modules below the limit module
> only necessary and analysis passes are run on modules above the
> limit module
>
> -num-mod-case
> sets an upper limit on the optimization case to apply within the
> limit module pass specified by -num-mod-pass
> -1 enables all and displays the numbering
> ignored if -num-mod-pass is not used or is -1
> all cases are applied in passes below the limit module pass
> no cases are applied in optional passes above the limit module pass
>
> -num-scc-pass
> sets an upper limit on the SCC pass to run
> -1 enables all and displays the numbering
>
> -num-scc-case
> sets an upper limit on the optimization case to apply within the
> limit SCC pass specified by -num-scc-pass
> -1 enables all and displays the numbering
> ignored if -num-scc-pass is not used or is -1
> all cases are applied in passes below the limit SCC pass
> no cases are applied in optional passes above the limit SCC pass
>
> Similar to loops, regions and basic blocks, I would like to present
> numbering information for each unique SCC, but I do not believe there
> is value in being able to filter on a specific SCC because they are
> subject to reorganization. Rather I would treat each invocation of an
> SCC pass as a separate selectable pass. The output for an SCC problem
> search would look something like this:
>
> Optimizing SCC (1): <null function>
> running pass (1) Remove unused exception handling info on SCC (1)
> running pass (2) Function Integration/Inlining on SCC (1)
> running pass (3) Deduce function attributes on SCC (1)
> Optimizing SCC (2): isupper
> running pass (4) Remove unused exception handling info on SCC (2)
> running pass (5) Function Integration/Inlining on SCC (2)
> running pass (6) Deduce function attributes on SCC (2)
> <snip>
> Optimizing SCC (34): or_purge_E_list, and_purge_E_list, purge_Exp
> running pass (101) Remove unused exception handling info on SCC (34)
> running pass (102) Function Integration/Inlining on SCC (34)
> running pass (103) Deduce function attributes on SCC (34)
> <snip>
>
> Here I'm printing the functions (if any) associated with nodes in the
> SCC just to give the developer some feedback on which part of the code
> is being worked on.
>
> I have a working prototype of all of the above functionality using
> both the new pass manager and the legacy pass manager.
>
> So what do you think? Do you agree that support for a feature of this
> sort would be useful in the LLVM core library? What problems do you
> see in my proposal? What additional options would you like to see added?
>
> Thanks in advance for any feedback.
>
>
>
> You're right that we could benefit from support to help "bisecting"
> where a miscompile happens in a more automated way.
>
> Reading through your (long but nice) description, I thought about
> something simpler: a single number incremented every time a pass in
> the optimizer is run (in the order shown by -debug-pass=Executions).
> That way you run once to get the total number of passes that ran, and
> then a simple bisect script can find quite quickly at which point the
> miscompile is introduced (between 0 and max).
> I haven't thought about it much, but I'm interested in your view of
> the pros/cons.
>
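> In other words, something as small as this (just a sketch; no such
> counter exists today):
>
>   // Single global counter: every pass execution gets the next number and
>   // everything past the limit is skipped.  One run with no limit reports
>   // the maximum; an external script then bisects on the limit value.
>   static int PassCountLimit = -1;   // -1 = run everything
>   static int PassCount = 0;
>
>   inline bool shouldRunNextPass() {
>     ++PassCount;
>     return PassCountLimit == -1 || PassCount <= PassCountLimit;
>   }
>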
> --
> Mehdi
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
--
Sanjoy Das
http://playingwithpointers.com