[LLVMdev] GSoC Proposal: Table-Driven Decompilation

Charles Davis cdavis at mymail.mines.edu
Thu Apr 5 06:42:23 PDT 2012


On Apr 4, 2012, at 12:50 AM, Tobias Grosser wrote:

> On 04/04/2012 07:08 AM, Charles Davis wrote:
>> Hi,
>> 
>> Here's one of my proposals for GSoC 2012. What do you think?
>> 
>> Chip
>> 
>> Project Title: Table-Driven Decompilation
>> 
>> Abstract:
>> Over the years, the LLVM family has grown to include nearly every type of build tool in existence. One of the few missing is a decompiler. LLVM's TableGen tool could potentially accelerate development of such a tool; most backends already have the information needed to implement it. This project proposes implementing support for decompilation in LLVM using information gleaned from target description files. Such a decompiler could be used for analysis, optimization, and recompilation of machine code.
> 
> Hi Chip,
> 
> I have little experience in this area, but here is some feedback:
> 
> The proposal looks nice, and decompilation and, later, binary-to-binary translation sound very interesting. However, this seems to me to be a very difficult topic, and it would be good if the proposal showed that you understand the difficulties and have ideas for how to solve them. One problem I have heard is difficult is how to keep track of the state of registers and CPU flags.
> 
> The libcpu project solves this by not even trying to reverse the individual LLVM-IR to machine code transformations. Instead, it directly emits LLVM-IR that models each instruction as function calls that perform the original calculation and, at the same time, model the CPU state. You may want to investigate exactly what they do.
> Also, it would be interesting to explain how your approach compares to the libcpu approach. Do you think yours has benefits? What are its drawbacks?
My plan was to do "register deallocation," whereby code that uses a limited number of physical registers would be converted back to SSA form. (I guess I should have elaborated on that in my proposal, huh? :) I think my approach would result in faster code, since it wouldn't essentially be running an emulator for each instruction. A major drawback is that it seems to be very difficult to reverse the IR->SDAG and SDAG->MachineInstr transformations. Perhaps it would be better and faster to use something like FastISel for the MachineInstr->LLVM IR transformations?
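
To make the "register deallocation" idea a bit more concrete, here is a minimal sketch in Python rather than anything built on LLVM. The names MachineOp and to_ssa, the opcode strings, and the "%reg.in" naming for live-in values are all made up for illustration. It only handles a single straight-line block: there are no phi nodes across basic blocks, no CPU flags, no memory operands, and no sub-register aliasing, which are exactly the hard parts a real decompiler would have to deal with.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class MachineOp:
    """Hypothetical three-address machine instruction (illustration only)."""
    opcode: str
    dest: str      # physical register written by the instruction
    srcs: tuple    # physical register names (str) or integer immediates


def to_ssa(block):
    """Rename physical registers in one straight-line block to SSA values.

    Returns the renamed instructions as strings, plus a map from each
    physical register to the SSA value it holds at the end of the block.
    """
    current = {}       # physical register -> SSA name holding its value
    fresh = count()    # source of unique value numbers
    renamed = []
    for op in block:
        operands = []
        for src in op.srcs:
            if isinstance(src, int):
                operands.append(str(src))
                continue
            # A register read before any write in this block is a live-in.
            if src not in current:
                current[src] = f"%{src}.in"
            operands.append(current[src])
        # Every write defines a brand-new value, so each name is assigned once.
        name = f"%v{next(fresh)}"
        renamed.append(f"{name} = {op.opcode} {', '.join(operands)}")
        current[op.dest] = name
    return renamed, current


# Two physical registers are reused for three distinct values.
example_block = [
    MachineOp("add", "eax", ("eax", "ebx")),
    MachineOp("mul", "eax", ("eax", 2)),
    MachineOp("sub", "ebx", ("eax", "ebx")),
]

for line in to_ssa(example_block)[0]:
    print(line)
```

Each write to eax or ebx becomes a separate value, so the reuse of physical registers disappears from the output. Extending this across basic blocks would need phi placement at join points, which this sketch does not attempt.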
> 
> Did you consider improving libcpu instead of starting your own tool?
No, I did not. I'll have to look into that.

Thanks.

Chip




