[llvm-dev] [GlobalISel][RFC] Thoughts on MachineModulePass

Sat Jan 23 20:41:23 PST 2016

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of John
> Criswell via llvm-dev
> Sent: Friday, January 22, 2016 9:39 PM
> To: Quentin Colombet <qcolombet at apple.com>;
> llvm-dev <llvm-dev at lists.llvm.org>; Ethan Johnson <ethanjohnson89 at gmail.com>
> Subject: Re: [llvm-dev] [GlobalISel][RFC] Thoughts on MachineModulePass
> 
> On 1/22/16 6:16 PM, Quentin Colombet via llvm-dev wrote:
> > Hi,
> >
> > In the initial thread of the GlobalISel proposal, I mentioned that it
> > might be interesting to have a kind of MachineModulePass.
> > Marcello mentioned this would be useful for their current pipeline.
> >
> > I am interested in knowing:
> > 1. Is anyone else interested in such a concept?
> > 2. What kind of information should we make accessible in a hypothetical
> > MachineModule? I.e., how do you plan to use the MachineModulePass, so that we
> > make the right design decisions for the MachineModule feeding those passes?
> > 3. Who would be willing to work on that?
> 
> Nearly perfect timing.  I just wrote a grant proposal requesting funding
> to do just such a thing.
> :)
> 
> My research group is interested in a MachineModulePass because we are
> using LLVM's MachineInstr infrastructure for analyzing machine code.
> Specifically, we are attempting to build an infrastructure for measuring
> how well various defenses work against code reuse attacks.  We are
> analyzing both data flow and control flow, and it would be handy for us
> to be able to analyze an entire program's assembly code (because we're
> looking for every last reusable instruction that an attacker could use
> and how those instructions can be strung together).  We want to analyze
> after everything has been done (register allocation, instruction
> selection and scheduling, etc.).
> 
> At the very least, we'll be doing analysis, though it is conceivable
> that we would want to do transformation in the future (e.g., if we can
> determine that breaking certain data flows would stop an attack, we
> could transform the code to change the data flow).
> 
> Ethan, can you add anything more specific on what would be on our wish list?

The main thing that comes to mind is that it would be useful to have access to call graph information at the machine code level. Since this is already being tracked at the IR level, a lot of that information could probably be "inherited" by a MachineModule during code generation.

In particular, taking advantage of LLVM's "inside knowledge" about the semantics of call instructions would be helpful in identifying the targets of indirect calls. Right now, the only way to determine the target of a machine-level call is to look at the instruction's operands, and if any of them refer to globals, try to dyn_cast them to a Function. This, of course, only works for direct calls. For indirect calls, the best we could do is try to use data-flow analysis to determine what's in the pointer being called, and attempt to match that to the known address of a MachineFunction. As I understand it (and please correct me if I'm wrong), the existing IR-level call graph analysis already "knows" where the function pointer came from (unless the code is calling a "wild" function pointer created through an unsafe cast, but that's another story).
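To make the direct-call case concrete, here is a minimal, self-contained sketch of the operand scan described above. The types Global, Operand, and CallInstr and the function directCallee are hypothetical stand-ins invented for illustration; they are not LLVM classes. In real LLVM code the equivalent check would presumably go through MachineOperand::isGlobal() and getGlobal(), followed by dyn_cast<Function>. The sketch returns null for an indirect call, which is exactly the gap that inherited call graph information would fill.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT LLVM classes.
enum class GlobalKind { Function, Variable };

struct Global {
    std::string name;
    GlobalKind kind;
};

// A toy operand: either a reference to a global or a plain register number.
struct Operand {
    const Global *global; // non-null when the operand refers to a global
    unsigned reg;         // register number, meaningful when global is null
};

struct CallInstr {
    std::vector<Operand> operands;
};

// Scan the call's operands for one that refers to a function-kind global.
// Returns that global for a direct call, or nullptr when no operand names
// a function (an indirect call, or a reference to a non-function global).
const Global *directCallee(const CallInstr &call) {
    assert(!call.operands.empty() && "a call needs at least a callee operand");
    for (const Operand &op : call.operands) {
        if (op.global != nullptr && op.global->kind == GlobalKind::Function)
            return op.global;
    }
    return nullptr;
}
```

A call whose only operand is a register yields nullptr here, so any pass built this way has to fall back on data-flow analysis for indirect calls.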

Ethan Johnson

> 
> As for resources, we're currently early enough in the project that we're
> not needing the inter-procedural analysis, and if we do need it, it may
> be quicker for us to hack something together than to enhance LLVM
> properly.  The point of the proposal is to seek additional funding so
> that we could afford to do things properly instead of just hacking
> something together just to meet our own research needs.  That said, if
> it makes sense to join forces, we'd certainly be open to doing that.
> 
> Regards,
> 
> John Criswell
> 
> >
> > Thanks,
> > -Quentin
>