[LLVMdev] Basic instructions for LLVM and Control Flow graph extraction

Ahmed Bougacha ahmed.bougacha at gmail.com
Tue Jul 9 16:17:04 PDT 2013


On Tue, Jul 9, 2013 at 2:06 PM, Micah Villmow
<micah.villmow at smachines.com> wrote:
> This isn’t by itself too difficult, as I have done something similar
> recently, but does require some modifications of LLVM.

By the way, there is some code in LLVM that builds an MC-level CFG
(MCModule, MCObjectDisassembler, ...), but it still needs a lot of work
to be reliable and to handle more cases. I have some local patches
that need more work, and that I’ll eventually push.

It gets tricky when you want real basic blocks, that is, when you
don’t want to duplicate a subset of the instructions each time you
discover an entry point into a basic block you already created. It’s
even trickier when you consider jumps into the middle of an
instruction, after which execution may need to rejoin an existing
basic block.
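
To make the "no duplication" point concrete, here is a rough Python
sketch of splitting an existing block when a new entry point lands in
its interior. This is not LLVM code: the Block record and the split_at
helper are made up for illustration, and instructions are reduced to
their start addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical block record: instruction start addresses in order,
    # plus the start addresses of the successor blocks.
    addrs: list
    succs: set = field(default_factory=set)


def split_at(blocks, target):
    """Make sure a block starts at `target`.

    `blocks` maps a block's start address to its Block. If `target` is
    an instruction boundary inside an existing block, that block is
    split in two instead of decoding the tail a second time.
    """
    if target in blocks:
        return blocks[target]
    for block in list(blocks.values()):
        if target in block.addrs:
            i = block.addrs.index(target)
            # The tail inherits the original successors...
            tail = Block(block.addrs[i:], block.succs)
            # ...and the head now simply falls through into the tail.
            block.addrs = block.addrs[:i]
            block.succs = {target}
            blocks[target] = tail
            return tail
    return None  # nothing decoded here yet; a real tool would disassemble


# One block of four instructions, then a jump to 0x15 is discovered.
blocks = {0x10: Block([0x10, 0x12, 0x15, 0x19], {0x30})}
tail = split_at(blocks, 0x15)
```

After the call, the block at 0x10 holds only the first two
instructions and has the new block at 0x15 as its single successor,
so no instruction appears in two blocks.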

For instance, if you jump to an instruction that starts at address X
and takes up 7 bytes, but disassembling at address X+5 also gives you
a valid 2-byte instruction, then you need one basic block containing
the 7-byte instruction and another containing the 2-byte one, both
with the basic block starting at X+7 as a successor.
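
Here is that example as a small Python sketch. The decoder is faked
with a table of instruction lengths (the value of X, the table and the
helper names are all assumptions for illustration, not anything in
LLVM), and only fall-through edges are modelled.

```python
X = 0x1000

# Hypothetical decoder output: start address -> length in bytes.
# X+5 lies inside the 7-byte instruction at X but also decodes fine.
LENGTHS = {X: 7, X + 5: 2, X + 7: 3, X + 10: 1}
LAST = X + 10  # pretend this one is a return, so nothing follows it


def trace(entry):
    """Instruction starts reached by falling through from `entry`."""
    path, addr = [], entry
    while addr in LENGTHS:
        path.append(addr)
        if addr == LAST:
            break
        addr += LENGTHS[addr]
    return path


def build_blocks(entries):
    # Record fall-through predecessors of every reached instruction.
    preds = {}
    for entry in entries:
        path = trace(entry)
        for a, b in zip(path, path[1:]):
            preds.setdefault(b, set()).add(a)
    # A block starts at each entry point and at each join point.
    leaders = set(entries) | {a for a, p in preds.items() if len(p) > 1}
    blocks = {}
    for leader in sorted(leaders):
        body = []
        for addr in trace(leader):
            if body and addr in leaders:
                break  # stop before running into another block
            body.append(addr)
        last = body[-1]
        nxt = last + LENGTHS[last]
        succs = {nxt} if last != LAST and nxt in LENGTHS else set()
        blocks[leader] = (body, succs)
    return blocks


blocks = build_blocks([X, X + 5])
```

With entry points X and X+5 this yields three blocks: one holding only
the 7-byte instruction, one holding only the 2-byte instruction, and
the shared block at X+7 that both of them list as their successor.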

If you want to do some quick experimentation, you can use
"llvm-objdump -cfg -d <binary>", which gives you a CFG for each
function found in the binary, each in a separate graphviz dot file.
It doesn’t look at object file format information yet (symbols, or
fancier things like the LC_FUNCTION_STARTS load command on Mach-O),
but again, I’ll get around to all this eventually.

Until then, patches welcome!

— Ahmed

> The basic algorithm is simple:
>
> For each ISA instruction, create a new MachineInstr and add it to the
> current MachineBasicBlock.
>
> At each branch instruction, add the instruction to the current MBB, add
> that MBB to a list, and create a new MBB.
>
> After creating your list of MBBs, iterate through them and reconnect the
> successors based on branches and fall-throughs.
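
The two passes quoted above can be sketched in a few lines of Python.
This is only a toy model, not the MachineInstr/MachineBasicBlock API:
the Insn tuple, its "kind" values and build_cfg are invented for
illustration, and the sketch assumes every branch target already
coincides with the start of a block.

```python
from collections import namedtuple

# Hypothetical toy instruction; kind is "jmp", "jcc", "ret" or "other".
Insn = namedtuple("Insn", "addr kind target", defaults=(None,))


def build_cfg(insns):
    # Pass 1: append each instruction to the current block; a branch
    # (or return) closes the block and a new one is started.
    blocks, current = [], []
    for insn in insns:
        current.append(insn)
        if insn.kind in ("jmp", "jcc", "ret"):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    # Pass 2: reconnect successors from branch targets and fall-throughs.
    starts = [block[0].addr for block in blocks]
    succs = {start: set() for start in starts}
    for i, block in enumerate(blocks):
        last = block[-1]
        if last.kind in ("jmp", "jcc"):
            # Assumption: the target is the first instruction of a block.
            succs[starts[i]].add(last.target)
        if last.kind in ("jcc", "other") and i + 1 < len(blocks):
            # Conditional branches and plain instructions fall through.
            succs[starts[i]].add(starts[i + 1])
    return succs


program = [
    Insn(0, "other"), Insn(1, "jcc", 4),
    Insn(2, "other"), Insn(3, "jmp", 6),
    Insn(4, "other"), Insn(5, "jmp", 6),
    Insn(6, "ret"),
]
cfg = build_cfg(program)
```

For this program the block at 0 branches to 4 or falls through to 2,
the blocks at 2 and 4 both jump to 6, and the block at 6 has no
successors. When a target lands in the middle of a block, the block
has to be split first, which is where it gets tricky.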
>
>
>
> The problem is that what you are producing has no connection to the IR, and
> there are parts of LLVM that expect that link, specifically the printing/CFG
> dumping functions.
>
>
>
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
> Behalf Of Clay J
> Sent: Tuesday, July 09, 2013 10:36 AM
> To: llvmdev at cs.uiuc.edu
> Subject: [LLVMdev] Basic instructions for LLVM and Control Flow graph
> extraction
>
>
>
> I am currently attempting to learn how to use LLVM for control flow graph
> extraction on Linux (Ubuntu). Basically, I need to be able to break
> specific functions down into basic blocks from assembly code, and use
> them to make a CFG.
>
> Do any of you upstanding human beings have any knowledge or resources that
> could possibly assist me in this task?
>
> I apologize if this is a very basic question. I have already installed the
> proper files/programs.
>
> Thank you in advance.
>
>