[LLVMdev] Using LLVM for decompiling.
Joshua Cranmer
pidgeot18 at gmail.com
Mon May 7 10:08:38 PDT 2012
On 5/7/2012 11:45 AM, James Courtier-Dutton wrote:
> On 7 May 2012 16:31, John Criswell<criswell at illinois.edu> wrote:
>> Given that you've completed steps one and two (i.e., you've converted the
>> binary instructions to LLVM IR and then discovered basic blocks), then yes,
>> LLVM's current analysis passes should help you with this third step. LLVM
>> has passes that normalize loops, identify loops in local control-flow
>> graphs, identify dominators/post-dominators, etc.
> Great, which bit of the LLVM source code does this bit (3)?
Several of the passes in Analysis or Transforms. Those passes only
operate on LLVM IR, though, which your current library doesn't appear
to be emitting.
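
To give a rough idea of what the dominator and loop analyses compute, here
is a minimal Python sketch. It is not LLVM code: the names
compute_dominators and find_back_edges and the toy CFG are invented for
illustration, and it uses the simple iterative data-flow formulation rather
than whatever algorithm LLVM itself implements.

```python
def compute_dominators(cfg, entry):
    """Return {block: set of blocks that dominate it}.

    cfg maps every block name to a list of successor names (hypothetical
    representation; every block must appear as a key).
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def find_back_edges(cfg, dom):
    """An edge src -> dst is a back edge if dst dominates src (a loop)."""
    return sorted(
        (src, dst)
        for src, succs in cfg.items()
        for dst in succs
        if dst in dom[src]
    )


# Toy CFG for: entry; while (cond) { body }; exit
cfg = {
    "entry": ["header"],
    "header": ["body", "exit"],
    "body": ["header"],
    "exit": [],
}
dom = compute_dominators(cfg, "entry")
back_edges = find_back_edges(cfg, dom)
print(back_edges)  # [('body', 'header')]
```

The single back edge body -> header identifies "header" as a loop header,
which is the kind of fact a structuring step would then build on.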
>> LLVM might have facilities for converting LLVM IR to C or C++ code (the C
>> backend was recently removed; there might be a C++ backend, but I'm not
>> sure). However, they are primarily designed for systems for which LLVM does
>> not provide a native code generator, so the C/C++ code they output isn't
>> very readable.
> Is the reason that the C/C++ code is not very readable, because there
> is not high enough metadata in the LLVM IR to do it?
> I.e. Lack of structure.
> Or was it because it was only designed to be input to another
> compiler, so pretty structure like for loops etc, was not necessary?
> Why was the C backend removed?
>
The original reason for the C-backend was to be able to use LLVM for
optimizations on architectures that it didn't have code generators for.
However, several subsequent changes to the IR were never carried over
to the C-backend, so it got to the point where it tended to work only
on small, trivial code samples.
It was never intended to be a decompiler, so the output was never
particularly readable, and it didn't bother trying to structure if
statements (IIRC, it was pretty much entirely goto-based). I earlier
linked to a project which purports to be able to output more readable
code as a backend (to try to do source-to-source translations for OpenCL
kernels), but I haven't had the time to investigate it in great detail,
so I don't know how well it holds up in practice.
Decompiling control flow is fairly easy to do (google "control flow
structuring" or similar terms and you'll turn up a few troves of papers
which give fairly clear details on how to do it). The real problem is
that stripped, optimized executables don't give you reliable ways to
find functions (or parameters, for that matter), and that all typing
and variable information is gone. That is what makes decompiling
extremely difficult.
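
As a toy illustration of what control flow structuring means, here is a
small Python sketch that turns a goto-style CFG into structured pseudo-C by
matching three shapes: a loop whose body is one block, an if/else diamond,
and an if-then triangle. The block representation and the function name
structure are assumptions made up for this example; the algorithms in the
papers handle nesting, multi-block regions, and irreducible flow, none of
which this attempts.

```python
def structure(blocks, entry):
    """Emit structured pseudo-C lines for a very restricted CFG.

    blocks maps name -> (statements, terminator); a terminator is
    ("return",), ("goto", target) or ("branch", cond, then_t, else_t).
    """
    out, seen, name = [], set(), entry
    indent = lambda stmts: ["    " + s for s in stmts]
    while name is not None:
        if name in seen:
            raise NotImplementedError("unmatched cycle at " + name)
        seen.add(name)
        stmts, term = blocks[name]
        if term[0] == "return":
            out += stmts + ["return;"]
            name = None
        elif term[0] == "goto":
            out += stmts
            name = term[1]
        else:
            _, cond, then_t, else_t = term
            then_stmts, then_term = blocks[then_t]
            else_stmts, else_term = blocks[else_t]
            if then_term == ("goto", name):
                # Taken branch jumps back to this block: a loop.
                if stmts:
                    # Header statements must re-run on every iteration.
                    out.append("while (1) {")
                    out += indent(stmts)
                    out.append("    if (!(%s)) break;" % cond)
                else:
                    out.append("while (%s) {" % cond)
                out += indent(then_stmts) + ["}"]
                name = else_t
            elif then_term[0] == "goto" and then_term == else_term:
                # Both arms rejoin at the same block: if/else diamond.
                out += stmts + ["if (%s) {" % cond] + indent(then_stmts)
                out += ["} else {"] + indent(else_stmts) + ["}"]
                name = then_term[1]
            elif then_term == ("goto", else_t):
                # Then-arm falls into the else target: if-then triangle.
                out += stmts + ["if (%s) {" % cond] + indent(then_stmts) + ["}"]
                name = else_t
            else:
                raise NotImplementedError("unrecognized shape at " + name)
    return out


blocks = {
    "entry": (["i = 0;", "sum = 0;"], ("goto", "header")),
    "header": ([], ("branch", "i < n", "body", "exit")),
    "body": (["sum += i;", "i++;"], ("goto", "header")),
    "exit": ([], ("branch", "sum > 100", "big", "small")),
    "big": (["r = 1;"], ("goto", "done")),
    "small": (["r = 0;"], ("goto", "done")),
    "done": ([], ("return",)),
}
lines = structure(blocks, "entry")
print("\n".join(lines))
```

On this input the sketch prints a while loop over "i < n" followed by an
if/else on "sum > 100". Note that the variable names and conditions are
already given here, which is precisely the information a stripped binary
does not hand you.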
--
Joshua Cranmer
News submodule owner
DXR coauthor