[LLVMdev] Using LLVM for decompiling.

James Courtier-Dutton james.dutton at gmail.com
Mon May 7 11:12:43 PDT 2012


On 7 May 2012 18:08, Joshua Cranmer <pidgeot18 at gmail.com> wrote:
> On 5/7/2012 11:45 AM, James Courtier-Dutton wrote:
>> On 7 May 2012 16:31, John Criswell <criswell at illinois.edu> wrote:
>>> Given that you've completed steps one and two (i.e., you've converted the
>>> binary instructions to LLVM IR and then discovered basic blocks), then yes,
>>> LLVM's current analysis passes should help you with this third step.  LLVM
>>> has passes that normalize loops, identify loops in local control-flow
>>> graphs, identify dominators/post-dominators, etc.
>> Great, which bit of the LLVM source code does this bit (3)?
>
> Several of the passes in Analysis or Transforms. The code doesn't really
> work if it's not being output in IR, which your current library doesn't
> appear to be spitting out.

Yes, my code does not yet produce IR. I am exploring whether doing so
would help the decompiler, or whether it would be more like a square
peg in a round hole.
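
As a rough illustration of the kind of analysis John describes in step
(3), here is a Python sketch. It is not LLVM code; in LLVM the
corresponding analyses are DominatorTree and LoopInfo. It computes
dominators with the classic iterative data-flow algorithm and then
collects natural loops from back edges. The CFG representation (a dict
mapping each block name to its successor list) and the function names
are hypothetical, and every block is assumed to be reachable from the
entry.

```python
from typing import Dict, List, Set

CFG = Dict[str, List[str]]  # block name -> successor block names (hypothetical representation)


def predecessors(cfg: CFG) -> Dict[str, List[str]]:
    preds: Dict[str, List[str]] = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    return preds


def dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Iterative data-flow: dom(n) = {n} | intersection of dom(p) over predecessors p."""
    preds = predecessors(cfg)
    dom = {n: set(cfg) for n in cfg}  # start from "everything dominates everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = set.intersection(*incoming) | {n} if incoming else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def natural_loops(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Map each loop header to the set of blocks in its natural loop(s)."""
    dom = dominators(cfg, entry)
    preds = predecessors(cfg)
    loops: Dict[str, Set[str]] = {}
    for tail, succs in cfg.items():
        for header in succs:
            if header not in dom[tail]:
                continue  # not a back edge: the target does not dominate the source
            body = {header, tail}
            work = [tail] if tail != header else []
            while work:  # walk backwards from the tail; the header stops the search
                m = work.pop()
                for p in preds[m]:
                    if p not in body:
                        body.add(p)
                        work.append(p)
            loops.setdefault(header, set()).update(body)
    return loops
```

For the CFG {"entry": ["header"], "header": ["body", "exit"],
"body": ["header"], "exit": []}, natural_loops returns a single loop,
{"header": {"header", "body"}}.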

>
> Decompiling control flow is fairly easy to do (google "control flow
> structuring" or similar terms and you'll turn up a few troves of papers
> which give fairly clear details on how to do it), so it's rather the
> problem that stripped, optimized executables don't give you reliable
> ways to find functions (or parameters, for that matter) and the complete
> abolition of typing and variable information that makes decompiling
> extremely difficult.
>

That is right. I need to get the control-flow structuring working
before I can look further into data type analysis.
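
Joshua's point that control-flow structuring is well understood can be
made concrete with a toy pattern matcher. The sketch below assumes the
same hypothetical dict-of-successors CFG and recognizes only the two
simplest single-entry shapes at a two-way branch, if-then and
if-then-else. A real structurer (structural or interval analysis, as in
the literature Joshua mentions) handles many more region types,
including loops; classify_branch is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

CFG = Dict[str, List[str]]  # block name -> successor block names (hypothetical representation)


def classify_branch(cfg: CFG, head: str) -> Optional[Tuple[str, ...]]:
    """Match the region starting at the two-way branch `head` against two toy patterns.

    Returns ("if-then", then_blk, join), ("if-then-else", then_blk, else_blk, join),
    or None when the region has neither shape.
    """
    succs = cfg.get(head, [])
    if len(succs) != 2:
        return None
    preds: Dict[str, List[str]] = {}
    for n, ss in cfg.items():
        for s in ss:
            preds.setdefault(s, []).append(n)

    def only_from_head(blk: str) -> bool:
        # Single entry: the arm may be reached only through the branch itself.
        return preds.get(blk) == [head]

    a, b = succs
    # if-then: one arm falls straight through to the other successor, the join point.
    for arm, join in ((a, b), (b, a)):
        if cfg.get(arm) == [join] and only_from_head(arm):
            return ("if-then", arm, join)
    # if-then-else (diamond): both arms are single-entry and share exactly one successor.
    if (only_from_head(a) and only_from_head(b)
            and len(cfg.get(a, [])) == 1 and cfg.get(a) == cfg.get(b)):
        return ("if-then-else", a, b, cfg[a][0])
    return None
```

A structurer built this way would collapse each matched region into a
single abstract node and match again, until the whole function reduces
to one node or no pattern applies.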
My current interest is decompiling stripped .o files.
From the point of view of data type analysis, so far I have the
function parameters listed, without types, and the local variables
listed. I can also determine whether or not each one is a pointer.
I am going to approach it from a statistical point of view. That is,
if the code implies certain data types, act on them. If it is still
ambiguous, ask a human to decide.
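
The statistical approach could look something like the following
sketch: each use of a variable contributes a vote for a type, and the
tool only commits when one type clearly dominates, otherwise deferring
to a human. The evidence labels, the HINTS table, infer_type and the
0.75 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical mapping from observed usage evidence to the type that evidence suggests.
HINTS = {
    "dereferenced": "pointer",
    "float_arith": "double",
    "signed_compare": "int",
    "unsigned_compare": "unsigned int",
}


def infer_type(evidence: Iterable[str], threshold: float = 0.75) -> Optional[str]:
    """Majority vote over the usage evidence gathered for one variable.

    Returns the winning type if it has at least `threshold` of the votes,
    otherwise None, meaning "still ambiguous: ask a human".
    """
    votes = Counter(HINTS[e] for e in evidence if e in HINTS)
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count / sum(votes.values()) >= threshold else None
```

Three dereferences and one signed comparison give "pointer" (3 of 4
votes), while one signed and one unsigned comparison give None, which
is where the tool would hand the decision to a human.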
I am hoping to determine things like the size of arrays by analyzing
the control flow to find the range of values that the index into the
array can take. So a lot of previous work on static analysis
(value-range analysis in particular) should help me there.
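
For the array-size idea, interval (value-range) analysis is the
standard static-analysis tool. The sketch below is hypothetical and
deliberately tiny: it abstractly executes one counted loop of the form
for (i = init; i < bound; i += step) ... a[i] ... over intervals, using
widening so the fixpoint terminates, and reads off the range of i
inside the body. The loop shape, the positive step and the helper names
(Interval, index_range, estimated_array_length) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @property
    def empty(self) -> bool:
        return self.lo > self.hi

    def join(self, other: "Interval") -> "Interval":  # smallest interval containing both
        if self.empty:
            return other
        if other.empty:
            return self
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def meet(self, other: "Interval") -> "Interval":  # intersection (may be empty)
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        # Push any bound that is still moving to infinity so the iteration terminates.
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)

    def shift(self, k: int) -> "Interval":
        return Interval(self.lo + k, self.hi + k)


def index_range(init: int, bound: int, step: int) -> Interval:
    """Range of i inside the body of: for (i = init; i < bound; i += step), assuming step > 0."""
    guard = Interval(-math.inf, bound - 1)  # integer i < bound  <=>  i <= bound - 1
    head = Interval(init, init)  # values of i seen at the loop header
    while True:
        body = head.meet(guard)  # values that pass the loop test
        new_head = head.join(body.shift(step))  # entry value, or value after another iteration
        if new_head == head:
            return body
        head = head.widen(new_head)


def estimated_array_length(init: int, bound: int, step: int) -> int:
    """Elements needed to cover every a[i] access in the loop (0 if the loop never runs)."""
    r = index_range(init, bound, step)
    return 0 if r.empty else int(r.hi) + 1
```

estimated_array_length(0, 10, 1) gives 10, i.e. an a[10] array. With a
non-unit stride the interval over-approximates (for step 2 it still
reports 10 even though the last index is 8), which is the kind of case
the human review step would catch.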
I am approaching the problem as someone starting with a binary that
has to be decompiled by hand, with my decompiler tool taking over much
of the painstaking work that can be done by an algorithm.

Kind Regards

James



