[LLVMdev] global control flow graph at machine code level

Wed Aug 1 09:25:54 PDT 2012

Florian Brandner <flbr <at> imm.dtu.dk> writes:

> 
> Hi Abhishek,
> On Sunday, July 29, 2012 18:32:11 AbhishekR wrote:
> > It seems like I may have to modify the way MachineFunction is instantiated 
in MachineFunctionAnalysis.
> Instead of doing it per Function, it may have to be done for the entire Module 
by instantiating
> MachineFunction objects for every Function inside the Module. This might 
require major changes to the
> PassManager framework as well. Is there some work in this direction and code 
that someone can share? Or an
> alternative solution?
> 
> yes, the MachineFunctionAnalysis creates the MachineFunctions. unfortunately 
> a MachineFunction is destroyed along the MachineFunctionAnalysis that created 
> it. this happens for instance when you schedule a module pass (where you could 
> operate on the global control-flow) somewhere during code generation.
> 
> A workaround could be to modify the MachineFunctionAnalysis such that it 
> stores the MachineFunction in a look-up table inside the MachineModuleInfo 
> instead of destroying it before your module pass is run. once your module 
> pass is finished, a new MachineFunctionAnalysis is scheduled by the pass 
> manager. Now, instead of creating a new function, you could check the look-up 
> table of the MachineModuleInfo to get the original MachineFunction.
> 
> it does not appear to be a very complicated change, but there might be some
> dirty details that could make this approach hard to implement, e.g., 
information 
> stored with the MachineFunctionAnalysis itself. you could move this 
information 
> to a new class and see the MachineFunctionAnalysis as a wrapper to this class.
> 
> a nice property of this solution is that code generation still proceeds on a 
> per-function-basis, unless you explicitly insert a module pass. 
> 
> another problem is, that there is little (or no) support to construct the 
> global control-flow from the machine code. for instance, the call graph is 
> based on LLVM-IR. depending on the target architecture, you might have call 
> sites in the machine code that were not visible in the LLVM-IR, e.g., when you 
> implement floating point operations using library calls. so there might be 
some
> extra work needed to get this infrastructure too.
> 
> I have not yet worked on this yet, but I plan to implement something like this
> at some point sooner or later. if you find another (better) solution let me 
> know.

I implemented a workaround solution for now but I think eventually I might go 
with your solution. My solution generated the global control flow graph (GCFG) 
in 2 passes (llc needs to be run twice). In the first pass, I printed the 
necessary information to a file, like the Machine Function (MF) Name, it's ID 
number, the IDs of MBBs inside the MF, the Call Sites inside the MF in the order 
in which they are called from the respective MBBs, the predecessors and 
successors of MBBs and whether a MBB has a return statement.
   In the second pass which should be immediately after the 1st pass in the 
CodeGen flow (otherwise MFs and MBBs can get updated by the CodeGen flow in 
between), I read back the file. This again is done in 2 passes. The first pass 
parses the file and populates the skeleton of GCFG data structure with the MF 
Nodes and MBB nodes inside those MF nodes (note that each Call Site in an MBB 
splits the MBB into two). Here the nodes are identified by IDs and a DenseMap is 
maintained to establish the MF name to ID correlation. The second pass parses 
the file again and attaches the predecessor, and successor links to the MBB 
nodes.

> 
> best,
> Florian
>