[LLVMdev] Identifying function call targets at the MachineInstr level

Hal Finkel hfinkel at anl.gov
Wed Jul 15 22:34:24 PDT 2015


----- Original Message -----

> From: "Ethan J. Johnson" <ejohns48 at cs.rochester.edu>
> To: llvmdev at cs.uiuc.edu
> Sent: Thursday, July 16, 2015 12:16:52 AM
> Subject: [LLVMdev] Identifying function call targets at the
> MachineInstr level

> Hi all,

> I am trying to create a MachineFunctionPass that analyzes the call
> graph of a module. (The analysis specifically needs to look at X86
> operations, so unfortunately I cannot do this at the IR level, even
> though that would be substantially easier, given that there is no
> direct machine-level equivalent to a ModulePass.) Traversing the
> control-flow graph of basic blocks within a machine function is easy
> enough; the challenge I am facing is how to match call instructions
> to their targets, i.e., determine the function being called.

> At the IR level, I can create a CallGraph object from the Module, and
> iterate down to individual Instructions, where I can check if an
> instruction casts to a CallInst, and if it does, use
> CallInst::getCalledFunction() to find the target. At the
> MachineInstr level, however, a call instruction is simply an opcode
> with value arguments, and much of the semantics appear to be
> stripped away. Where MI is a pointer to a MachineInstr, I can use
> MI->getDesc().isCall() to determine if the instruction is a call;
> but there’s no indication of what is being called.

> My first “rough” attempt at solving this was to first traverse the
> IR-level Module in the MachineFunctionPass’s doInitialization(). For
> each function, I’m iterating down to individual Instructions,
> finding ones that can be dyn_casted to CallInst, and pushing the
> result of getCalledFunction() onto a std::queue (FIFO). Down in
> runOnMachineFunction(), I’m performing the “same” iteration, over
> MachineBasicBlocks and then MachineInstrs, and each time I find a
> call instruction (with getDesc().isCall()), I dequeue the next
> Function pointer saved from the IR level. This is then considered
> the “matched” target for the function call. (Since there is a 1-1
> correspondence between Functions and MachineFunctions, this is
> enough to determine the target.)

This will not work in general. As you note below, it depends on a one-to-one correspondence between calls at the IR and MI levels, and nothing guarantees that. In practice it will not hold, because lowering many different IR constructs inserts calls to libc, etc.

Instead, you can extract the call destination from the call instruction itself. If you iterate over the instruction's operands, you should find that, for a direct call, one of them returns true for isGlobal(). On such an operand, you can call getGlobal() to get the IR-level GlobalValue* representing the function. 

You might also find that isSymbol() is true, in which case you can call getSymbolName() (which returns a const char * with the symbol name). 
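
As a rough sketch of that operand walk, something like the following should be close. FakeGlobal, FakeOperand and FakeInstr are hypothetical stand-ins, not LLVM classes; they only mimic the accessors named above so that the snippet compiles without LLVM. calleeName is likewise a made-up helper name. It is written as a template on the assumption that the same body would also accept a real MachineInstr, but I have not built it against LLVM.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::GlobalValue, llvm::MachineOperand and
// llvm::MachineInstr, reduced to the accessors used by calleeName() below.
struct FakeGlobal {
  std::string Name;
  std::string getName() const { return Name; }
};

struct FakeOperand {
  const FakeGlobal *GV = nullptr; // set when the operand is a global address
  const char *Sym = nullptr;      // set when the operand is an external symbol
  bool isGlobal() const { return GV != nullptr; }
  bool isSymbol() const { return Sym != nullptr; }
  const FakeGlobal *getGlobal() const { return GV; }
  const char *getSymbolName() const { return Sym; }
};

struct FakeInstr {
  bool Call = false;
  std::vector<FakeOperand> Ops;
  bool isCall() const { return Call; }
  const std::vector<FakeOperand> &operands() const { return Ops; }
};

// Returns the callee's name for a call whose target appears as a global or
// symbol operand. Returns an empty string when MI is not a call, or when no
// such operand exists (for example, a call through a register).
template <typename InstrT>
std::string calleeName(const InstrT &MI) {
  if (!MI.isCall())
    return "";
  for (const auto &MO : MI.operands()) {
    if (MO.isGlobal())
      return std::string(MO.getGlobal()->getName());
    if (MO.isSymbol())
      return std::string(MO.getSymbolName());
  }
  return "";
}
```

An empty result on a call instruction would then mean the target is not named in the operands, which is what I would expect for indirect calls; those need separate handling.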

-Hal 

> Testing this approach with llc on the bitcode representation of a
> simple C program, it seems to mostly work; but for some functions,
> it appears that calls present in IR are no longer present in the
> corresponding MachineFunction. That is, there are call targets “left
> over” in the queue after the pass is finished. This prompts a few
> questions:

> 1) Is the order of function calls stable between the IR and
> MachineInstr layers?
> 2) Can inlining (which would remove function calls) occur between the
> IR and MachineInstr layers? (specifically, between the Module
> processed by doInitialization(), and the MachineFunction processed
> by runOnMachineFunction()?)
> 3) Do:
> a. dyn_casting an Instruction to a CallInst at the IR level, and
> b. checking MI->getDesc().isCall() at the MachineInstr level
> both comprehensively account for all possible ways an instruction (in
> IR or X86, respectively) can make a call – direct or indirect calls,
> etc.? (i.e., could one of these methods “miss” a call that the other
> would identify, assuming that no function calls have been removed or
> reordered between them?)
> 4) Does anyone know of a better way to do this? :-)

> Any suggestions or answers are greatly appreciated.

> Sincerely,
> Ethan Johnson

> Ethan J. Johnson
> Computer Science PhD student, Systems group, University of Rochester
> ejohns48 at cs.rochester.edu
> ethanjohnson at acm.org
> PGP pubkey available from public directory or on request

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 