[LLVMdev] Identifying function call targets at the MachineInstr level

Ethan J. Johnson ejohns48 at cs.rochester.edu
Wed Jul 15 22:16:52 PDT 2015


Hi all,

 

I am trying to create a MachineFunctionPass that analyzes the call graph of
a module. (The analysis specifically needs to look at X86 operations, so
unfortunately I cannot do this at the IR level. That would be substantially
easier, since there is no direct machine-level equivalent to a ModulePass.)
Traversing the control-flow graph of basic blocks within a machine function
is easy enough; the challenge I am facing is how to match call instructions
to their targets, i.e., determine the function being called.

At the IR level, I can create a CallGraph object from the Module, and
iterate down to individual Instructions, where I can check whether an
instruction casts to a CallInst, and if it does, use
CallInst::getCalledFunction() to find the target. At the MachineInstr level,
however, a call instruction is simply an opcode with value arguments, and
much of the semantics appears to be stripped away. Where MI is a pointer to
a MachineInstr, I can use MI->getDesc().isCall() to determine whether the
instruction is a call; but there's no indication of what is being called.

My first "rough" attempt at solving this was to traverse the IR-level
Module in the MachineFunctionPass's doInitialization(). For each function,
I iterate down to individual Instructions, find the ones that can be
dyn_cast to CallInst, and push the result of getCalledFunction() onto a
std::queue (FIFO). Down in runOnMachineFunction(), I perform the "same"
iteration, over MachineBasicBlocks and then MachineInstrs, and each time I
find a call instruction (with getDesc().isCall()), I dequeue the next
Function pointer saved from the IR level. This is then considered the
"matched" target for the function call. (Since there is a 1-1 correspondence
between Functions and MachineFunctions, this is enough to determine the
target.)
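
To make the matching scheme concrete, here is a minimal sketch in plain C++
that models it without LLVM. ToyInstr, ToyFunction, collectIRTargets,
matchCalls, and leftoverTargetsInExample are hypothetical names made up for
illustration; they are not LLVM types and this is not the actual pass code.
The sketch assumes the only information available at the machine level is
whether an instruction is a call.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are LLVM types.
struct ToyInstr {
  bool IsCall;         // models a successful CallInst cast / getDesc().isCall()
  std::string Callee;  // known only at the "IR" level in this model
};
using ToyFunction = std::vector<ToyInstr>;

// Models the doInitialization() step: queue every IR-level call target in
// program order, across all functions.
std::queue<std::string> collectIRTargets(const std::vector<ToyFunction> &IR) {
  std::queue<std::string> Targets;
  for (const ToyFunction &F : IR)
    for (const ToyInstr &I : F)
      if (I.IsCall)
        Targets.push(I.Callee);
  return Targets;
}

// Models runOnMachineFunction(): each machine-level call takes the next
// queued target. Returns the targets assigned to this function's calls.
std::vector<std::string> matchCalls(const ToyFunction &MF,
                                    std::queue<std::string> &Targets) {
  std::vector<std::string> Matched;
  for (const ToyInstr &MI : MF) {
    if (!MI.IsCall || Targets.empty())
      continue;
    Matched.push_back(Targets.front());
    Targets.pop();
  }
  return Matched;
}

// Example: the IR calls f, g, and h, but the call to g is absent from the
// machine-level function. Returns how many targets remain in the queue.
std::size_t leftoverTargetsInExample() {
  ToyFunction IRFunc = {{true, "f"}, {false, ""}, {true, "g"}, {true, "h"}};
  std::vector<ToyFunction> IR;
  IR.push_back(IRFunc);
  ToyFunction MF = {{true, ""}, {false, ""}, {true, ""}};  // two calls remain

  std::queue<std::string> Targets = collectIRTargets(IR);
  std::vector<std::string> Matched = matchCalls(MF, Targets);
  // The second machine-level call is really the call to h, but the FIFO
  // pairs it with g, and h is left over.
  assert(Matched.size() == 2 && Matched[0] == "f" && Matched[1] == "g");
  return Targets.size();
}
```

In this model, a single call that disappears between the two levels both
leaves an entry in the queue and shifts every later pairing by one, so the
scheme is only sound if the two call sequences agree exactly.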

 

Testing this approach with llc on the bitcode representation of a simple C
program, it seems to mostly work; but for some functions, it appears that
calls present in IR are no longer present in the corresponding
MachineFunction. That is, there are call targets "left over" in the queue
after the pass is finished. This prompts a few questions:

 

1) Is the order of function calls stable between the IR and MachineInstr
   layers?

2) Can inlining (which would remove function calls) occur between the IR
   and MachineInstr layers (specifically, between the Module processed by
   doInitialization() and the MachineFunction processed by
   runOnMachineFunction())?

3) Do:

   a. dyn_casting an Instruction to a CallInst at the IR level, and

   b. checking MI->getDesc().isCall() at the MachineInstr level

   both comprehensively account for all possible ways an instruction (in IR
   or X86, respectively) can make a call: direct calls, indirect calls,
   etc.? (I.e., could one of these methods "miss" a call that the other
   would identify, assuming that no function calls have been removed or
   reordered between them?)

4) Does anyone know of a better way to do this? :-)

 

Any suggestions or answers are greatly appreciated.

 

Sincerely,

Ethan Johnson

 

Ethan J. Johnson

Computer Science PhD student, Systems group, University of Rochester

ejohns48 at cs.rochester.edu

ethanjohnson at acm.org

PGP pubkey available from public directory or on request

 
