[llvm-dev] Identifying MachineOperands that are part of an address specification

Ethan J. Johnson via llvm-dev llvm-dev at lists.llvm.org
Thu Jul 20 09:52:38 PDT 2017


Dear LLVM-Dev,

I'm writing a system that does analysis on x86 machine code in the LLVM 
backend (i.e., MachineFunctionPasses). Part of this involves data-flow 
analysis (reaching definitions, to be exact) on machine instructions, 
handling data flow through both registers and memory locations. This 
analysis provides an interface whereby I can query it to determine the 
set of definitions that reach a particular register use-operand 
(MachineOperand) of a machine instruction, or a memory load operand 
(MachineMemOperand).

Given this, my goal is to determine whether each variable input to an 
instruction (either a register use or a memory load) is reached by 
some definition in a particular (known) set. However, the relationship 
between MachineOperands and MachineMemOperands complicates this.

Whenever a machine instruction does a memory load/store, it has both:

  * A MachineMemOperand, which specifies the details of the load/store
    at a high level; and
  * A sequence of register and immediate MachineOperands, which
    represent the low-level encoding of the memory address in the
    instruction. For x86, this sequence consists of five operands,
    specifying the base register, scale constant, index register, offset
    constant, and segment register respectively. (In cases where the
    full 5-part addressing mode is not needed, some of the registers can
    be set to %noreg and the constants to identity values, e.g. scale=1
    and offset=0. This convention is detailed in the code generator
    documentation
    <http://llvm.org/docs/CodeGenerator.html#representing-x86-addressing-modes-in-machineinstrs>.)
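
To make the five-operand layout concrete, here is a minimal, self-contained
sketch in plain C++. It does not use LLVM's classes: the Operand struct, the
Slot* names, and the helper functions are all hypothetical stand-ins that I
made up for illustration, and the starting index of the address sequence
(memStart) is simply assumed to be known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a machine operand (NOT LLVM's MachineOperand).
struct Operand {
  enum Kind { Register, Immediate } kind;
  std::int64_t value; // register number (0 means %noreg) or immediate value
};

// Offsets of the five address operands, in the order described above.
// These names are illustrative assumptions, not LLVM identifiers.
enum AddrSlot : std::size_t {
  SlotBase = 0,
  SlotScale = 1,
  SlotIndex = 2,
  SlotDisp = 3,
  SlotSegment = 4,
  NumAddrSlots = 5
};

constexpr std::int64_t NoReg = 0;

// Build the address sequence for a simple [base + disp] reference:
// unused registers are %noreg and the scale takes its identity value 1.
std::vector<Operand> makeSimpleAddress(std::int64_t baseReg, std::int64_t disp) {
  return {
      {Operand::Register, baseReg}, // base register
      {Operand::Immediate, 1},      // scale
      {Operand::Register, NoReg},   // index register (unused)
      {Operand::Immediate, disp},   // offset constant
      {Operand::Register, NoReg},   // segment register (unused)
  };
}

// True if operand index idx lies inside the address sequence at memStart.
bool isAddressOperand(std::size_t idx, std::size_t memStart) {
  return idx >= memStart && idx < memStart + NumAddrSlots;
}

// Fetch one slot of the address sequence that starts at memStart.
const Operand &addrSlot(const std::vector<Operand> &ops, std::size_t memStart,
                        AddrSlot slot) {
  assert(memStart + NumAddrSlots <= ops.size() && "address sequence truncated");
  return ops[memStart + slot];
}
```

For a stack-frame-relative load such as -8(%rbp), this model would hold the
base register in slot 0, scale 1 in slot 1, %noreg in slots 2 and 4, and -8
in slot 3.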

The problem I'm having is that there's no way to tell from the 
MachineOperands themselves whether they were generated as part of a 
memory address specification sequence, or as a "real" register use that 
provides a value to be computed on by the instruction. Thus when I go to 
query my reaching-definitions interface, I don't know which register 
operands I should be querying /as registers/ and which I should be 
skipping to instead query /as memory accesses/ (i.e., via their 
MachineMemOperands). Although it's certainly /valid/ to ask the question 
"which definitions reach this register operand" when the operand is part 
of an address specification, it's not particularly /useful/ - I'm 
interested in the flow of data in the logical computation, not "the 
value of RBP used in this stack-frame-relative load was defined in the 
'mov %rsp, %rbp' instruction at the beginning of the function". That is 
why I want to skip these operands and instead look at the respective 
MachineMemOperands.
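
The dispatch I have in mind could be sketched roughly as follows. This is
plain C++ with hypothetical types (Operand, QueryPlan, planQueries are names
invented for this example, not LLVM APIs), and it assumes that the starting
index of the address sequence is already known, which is exactly the piece
of information I am missing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified operand model (NOT LLVM's MachineOperand).
struct Operand {
  bool isReg;   // register operand (otherwise an immediate)
  bool isUse;   // read by the instruction (otherwise a definition)
  unsigned reg; // register number; 0 stands for %noreg
};

// Which reaching-definitions queries to issue for one instruction.
struct QueryPlan {
  std::vector<std::size_t> registerQueries; // operand indices to query as registers
  bool queryMemory = false;                 // query via the memory operand instead
};

// memStart is the index of the first address operand, or -1 if the
// instruction has no memory reference. It is assumed to be known here.
QueryPlan planQueries(const std::vector<Operand> &ops, int memStart, bool isLoad) {
  const std::size_t kAddrOperands = 5; // base, scale, index, offset, segment
  QueryPlan plan;
  plan.queryMemory = (memStart >= 0) && isLoad;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const bool inAddress =
        memStart >= 0 && i >= static_cast<std::size_t>(memStart) &&
        i < static_cast<std::size_t>(memStart) + kAddrOperands;
    // Only "real" register uses are queried as registers; address
    // operands are skipped in favor of the memory query.
    if (ops[i].isReg && ops[i].isUse && ops[i].reg != 0 && !inAddress)
      plan.registerQueries.push_back(i);
  }
  return plan;
}
```

For an instruction shaped like "add a register to a value loaded from
memory", this plan would contain the one register source operand plus a
single memory query, and none of the five address operands.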

So, my question is: *is there any good way to identify whether a 
MachineOperand was generated as part of a memory-addressing sequence?*

I looked through the MachineOperand and MachineMemOperand Doxygen trying 
to find some link between the two, but to no avail. I also read through 
a lot of the CodeGen and X86 backend code learning how these operand 
sequences are generated, but I didn't see a /single/ place where this 
/consistently/ happens that I could (for instance) modify to note which 
operands are generated this way. As a last resort I could try to guess 
which operands form the memory-addressing sequence by position (e.g., for 
stores, the memory-addressing operands seem to always come first), but I 
would /really/ prefer not to do that: x86 has so many memory 
instructions that comprehensively accounting for all of them would be a 
lot of work. :-)

Thank you,
Ethan Johnson

-- 
Ethan J. Johnson
Computer Science PhD student, Systems group, University of Rochester
ejohns48 at cs.rochester.edu
ethanjohnson at acm.org
PGP public key available from public directory or on request
