[PATCH] D113030: Add a new tool for parallel safe bisection, "llvm-bisectd".

Amara Emerson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 2 12:20:12 PDT 2021


aemerson added a comment.

In D113030#3103808 <https://reviews.llvm.org/D113030#3103808>, @qcolombet wrote:

> Let me just step back a little bit and say that, now that I think about what we did, having something that answers "should I run in this instance" is desirable; the implementation doesn't really matter. We did it with function attributes, but having a bisect client API like the one you're introducing is fine.
> My only complaint is that the client interface should not have remote in the name :P.
>
> From an abstraction point of view, we need two things:
>
> 1. Something that tells if an optimization needs to run (the remote bisect client here)
> 2. Something that drives the on/off of the optimizations based on the previous state (here your daemon)
>
> The way we did that was:
> For #1 we added annotations in the IR
> For #2 we implemented the driver directly in our JIT daemon
>
> Essentially, that boils down to something that formulates a plan and something that executes the plan. At one point I was thinking that formulating the plan could be changing the pass pipeline (don't insert what you don't want to run), but that looked like too much work :).
>
>> How does that work when you have parallel builds?
>
> Each module is assigned an ID and the bisect plan and previous state are mapped to this ID.
> The ID is saved in the module metadata, but for now we didn't use it since everything we needed was added to the IR via annotations (i.e., we didn't need to come up with a key to ask for information about a specific pass). In that regard, your approach is more general.
> For the ID, we used a hash of the module before the bisect annotations were added, i.e., as long as you don't change the front-end, the IDs are stable between runs.
>
> To summarize:
> Compute the module ID -> add annotations based on past information -> run the backend (at this point, the backend runs by itself.)
>
>> When multiple clang processes are running simultaneously, and you want to bisect to a specific translation unit, and then within that TU to a specific point in the module, don't you need some co-ordination?
>
> At a high level here is what the driver was doing:
>
> - Bisect optnone on each module
> - Find the module(s) that create the problem (the minimal set of modules that need optimizations turned on)
> - Do the same on each function (the minimal set may involve more than one function)
> - Try to "outline" the basic blocks of each problematic function and do the same process on the newly created functions
> - Split the problematic basic blocks to make them smaller and continue
> - When you're happy with the size of the basic blocks, start bisecting the optimizations on the problematic functions (possibly basic-block extracted). In practice we were only bisecting a handful of optimizations, because the final diff with the basic block splitting usually made the faulty optimization easy to find by hand.
>
> The way it worked is that all of that state was saved in a file <shaderID>-bisect-info. You could bootstrap the process by populating the file by hand, i.e., by telling the JIT process which module you want to bisect.
>
> As far as bisecting to a specific point in the TU, we were always going all the way down to the executable, and then you had to supply a script that tells whether or not the program is working. That's similar to what git bisect does (if the script returns 0, the program works; if it returns 1, it doesn't). In your script you could check for whatever you want (a specific sequence of asm, the executable producing certain results, etc.).
>
> Note: The pass I was talking about in my previous reply, which we insert in the LLVM pipeline, generates all the information needed to start the bisect process (e.g., the shader ID, the list of all the functions, the list of all the basic blocks). The driver then uses this information to tell that pass to add annotations to some functions (e.g., optnone, noinline, etc.), but also to split some basic blocks and outline them (and attach annotations to them).
>
> Cheers,
> -Quentin

Ok, I think I sort of understand your flow now. I agree that it doesn't sound like our approaches really conflict. The remote bisection client code could certainly be hidden behind a more generic interface, and for your approach we could select an implementation that just queries the function attributes instead. For the bisection co-ordination with the files, I'm not sure how that impacts this tooling, if at all. One reason I went for a daemon was that for some build systems, persistent files across builds are difficult to keep due to build sandboxing (sockets themselves need some workarounds to work with sandboxes).
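
To make that concrete, here is roughly the shape of abstraction I have in mind. All names below (BisectionClient, shouldRunPass, DaemonBisectionClient) are hypothetical sketches, not what's in the current patch:

  #include "llvm/ADT/StringRef.h"

  namespace llvm {

  /// Answers "should this optimization run in this instance?" for a given key.
  class BisectionClient {
  public:
    virtual ~BisectionClient() = default;
    virtual bool shouldRunPass(StringRef Key) = 0;
  };

  /// Sketch of an implementation that queries llvm-bisectd, as this patch does.
  class DaemonBisectionClient : public BisectionClient {
  public:
    bool shouldRunPass(StringRef Key) override {
      // Send the key to the daemon and block on its yes/no answer.
      return queryDaemon(Key);
    }

  private:
    // Placeholder for the actual socket round-trip to the daemon.
    bool queryDaemon(StringRef) { return true; }
  };

  } // namespace llvm

Your attribute-based flavour would then just be another implementation of the same interface.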

I'll see what kind of abstraction I can come up with for the client in this patch, but it won't be tested since your tooling isn't upstream. I'm also guessing that you'd want to avoid using string keys for the function-attribute implementation?
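
For illustration, the attribute-backed version could look something like the sketch below, querying the IR directly instead of going through a string key (again, names here are just illustrative, not a proposal for the final API):

  #include "llvm/IR/Attributes.h"
  #include "llvm/IR/Function.h"

  namespace llvm {

  /// Illustrative attribute-backed client: the bisection driver has already
  /// marked the functions it wants disabled (e.g. with optnone), so the query
  /// just inspects the function's attributes instead of contacting a daemon.
  class AttributeBisectionClient {
  public:
    bool shouldOptimize(const Function &F) const {
      return !F.hasFnAttribute(Attribute::OptimizeNone);
    }
  };

  } // namespace llvm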


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113030/new/

https://reviews.llvm.org/D113030


