[llvm-dev] MachineFunction Instructions Pass using Segment Registers

Tue Jun 26 12:57:31 PDT 2018

Dear Craig,

Thanks for the help so far. I have rewritten my assembly to account for
user-land code not being able to modify the segment registers %GS/%FS
directly, and I used llvm-mc with -show-inst to get the equivalent LLVM
instructions and operands. Now I am working backwards to code this
assembly into my MachineFunctionPass. The easy instructions are
implemented, but the more complicated ones still fail: I either see
0x0(%rbp) in the output instead of (%gs), or I hit errors.
The core question is: how do I properly create BuildMI statements for
assembly that uses memory operands with offsets?
-------------------------------------------------------------------------------------------------
Assembly I want to translate:
mov   (%gs), %r14                  //get value off %GS base address
mov %r15, %gs:0x0(%r14)     //store R15 at offset R14 from the %GS base  [ (%GS) + R14 ]
--------------------------------------------------------------------------------------------------
llvm-mc -show-inst gives:
movq    (%gs), %r14          # <MCInst #1810 MOV64rm
                                        #  <MCOperand Reg:117>
                                        #  <MCOperand Reg:33>
                                        #  <MCOperand Imm:1>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Imm:0>
                                        #  <MCOperand Reg:0>>
movq    %r15, %gs:(%r14)        # <MCInst #1803 MOV64mr
                                        #  <MCOperand Reg:117>
                                        #  <MCOperand Imm:1>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Imm:0>
                                        #  <MCOperand Reg:33>
                                        #  <MCOperand Reg:118>>
-------------------------------------------------------------------------------------------------------
I'll be honest and say I don't really know how to add the operands
properly to BuildMI. This is what I have figured out so far:
MachineInstrBuilder thing = BuildMI(MachineBB, position in MBB,
DebugLoc (not sure what this accomplishes), TII->get(X86 instruction I
want), register where the instruction result goes)

this has .add(MachineOperand)
            .addReg(X86::a register enum)
            .addImm(a constant like 0x8)
            and a few more I don't think apply to me.

but I am not sure whether I must follow a specific order. I am assuming
yes, and that it has something to do with the X86InstrInfo.td
definitions, but I am not sure.
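
A note on the ordering question: the -show-inst dumps in this message suggest that every X86 memory reference is a fixed group of five operands (base register, scale, index register, displacement, segment register). An "rm" opcode lists the destination register first and then the five memory operands, while an "mr" opcode lists the five memory operands first and then the source register. The sketch below is a plain C++ model of that layout, not LLVM code. Operand, MemRef, reg, imm, appendMem, operandsRM and operandsMR are hypothetical names made up for illustration, and the register numbers used with it (117, 33, 118) are simply the values printed in the dump, which are specific to that build.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one instruction operand:
// either a register number or an immediate value.
struct Operand {
  bool IsReg;
  std::int64_t Value;  // register number when IsReg, otherwise the immediate
  bool operator==(const Operand &Other) const {
    return IsReg == Other.IsReg && Value == Other.Value;
  }
};

inline Operand reg(std::int64_t R) { return Operand{true, R}; }
inline Operand imm(std::int64_t I) { return Operand{false, I}; }

// Hypothetical model of one memory reference:
// Segment:[Base + Scale*Index + Disp]. Register number 0 means "no register".
struct MemRef {
  std::int64_t Base;
  std::int64_t Scale;
  std::int64_t Index;
  std::int64_t Disp;
  std::int64_t Segment;
};

// Append the five memory operands in the order seen in the dump:
// base, scale, index, displacement, segment.
inline void appendMem(std::vector<Operand> &Ops, const MemRef &M) {
  Ops.push_back(reg(M.Base));
  Ops.push_back(imm(M.Scale));
  Ops.push_back(reg(M.Index));
  Ops.push_back(imm(M.Disp));
  Ops.push_back(reg(M.Segment));
}

// "rm" form (load): destination register first, then the memory reference.
inline std::vector<Operand> operandsRM(std::int64_t Dst, const MemRef &M) {
  std::vector<Operand> Ops;
  Ops.push_back(reg(Dst));
  appendMem(Ops, M);
  return Ops;
}

// "mr" form (store): the memory reference first, then the source register.
inline std::vector<Operand> operandsMR(const MemRef &M, std::int64_t Src) {
  std::vector<Operand> Ops;
  appendMem(Ops, M);
  Ops.push_back(reg(Src));
  return Ops;
}
```

Calling operandsMR(MemRef{117, 1, 0, 0, 33}, 118) reproduces the six operands printed for the MOV64mr line. With MachineInstrBuilder, the same order would presumably be spelled as a chain of .addReg(base).addImm(scale).addReg(index).addImm(disp).addReg(segment), passing 0 where no register is wanted.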
--------------------------------------------------------------------------------------------------------
LLVM C++ code I tried to translate this to:
/* 1 mov   (%gs), %r14 */
    MachineInstrBuilder e1 =
BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
       .addReg(X86::GS);
/* 2 mov %r15, %gs:0x0(%r14) */
    MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
    MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
    MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false);
    MachineOperand disp = MachineOperand::CreateImm(0x0);

    BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
      .add(baseReg)
      .add(scaleAmt)
      .add(indexReg);

Both instructions give the following error:

clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::operator[](llvm::SmallVectorTemplateCommon<T,
<template-parameter-1-2> >::size_type) const [with T =
llvm::MCOperand; <template-parameter-1-2> = void;
llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::const_reference = const llvm::MCOperand&;
llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::size_type = long unsigned int]: Assertion `idx < size()' failed.

I saw this function in the code base but am not sure what it does:
"addDirectMem(MachineInstructionBuilder_thing, register you want to
use);"
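
On addDirectMem: as far as can be told from X86InstrBuilder.h, it appends a complete memory reference that dereferences one register, that is, the register as base, scale 1, no index, displacement 0 and no segment. It therefore always contributes five operands, and because it leaves the segment slot empty it presumably cannot express a %gs: override by itself. The sketch below models that expansion in plain C++ and also counts operands, which is one plausible reading of the "idx < size()" assertion: the first BuildMI above ends up with two operands and the second with three, where the dumps show six. MemOperands, directMem, directMemWithSegment, kRegPlusMemOperands and hasAllOperands are hypothetical names for illustration only, not LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: the five memory operands as plain integers, in the
// order base, scale, index, displacement, segment. 0 means "no register".
using MemOperands = std::array<std::int64_t, 5>;

// What a direct memory reference [Reg] expands to in this model:
// base = Reg, scale = 1, no index, displacement 0, no segment.
inline MemOperands directMem(std::int64_t Reg) {
  return MemOperands{{Reg, 1, 0, 0, 0}};
}

// The same reference with the segment slot filled in, e.g. %gs:(%r14).
inline MemOperands directMemWithSegment(std::int64_t Reg, std::int64_t Seg) {
  MemOperands M = directMem(Reg);
  M[4] = Seg;  // the segment register is the fifth memory operand
  return M;
}

// A load or store with one register and one memory reference carries
// one register operand plus five memory operands.
constexpr std::size_t kRegPlusMemOperands = 1 + 5;

// True when at least that many operands were added to the instruction.
inline bool hasAllOperands(std::size_t Added) {
  return Added >= kRegPlusMemOperands;
}
```

In this model hasAllOperands(2) and hasAllOperands(3) are both false, which matches the two failing instructions, while a destination or source register plus a full five-operand reference passes.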

This should be the last bit of information I need to finish up
this implementation. Thanks again for your help!

Sincerely,

Chris Jelesnianski

On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <craig.topper at gmail.com> wrote:
> The size suffix thing is a weird quirk in our assembler I should look into
> fixing. Instructions in AT&T syntax usually have a size suffix that is often
> optional.
>
> For example:
>   add %ax, %bx
> and
>   addw %ax, %bx
>
> Are equivalent because the register name indicates the size.
>
> but for an instruction like this
>   addw $1, (%ax)
>
> There is nothing to infer the size from so an explicit suffix is required.
>
> So for an instruction like "add %ax, %bx" from above, we try to guess the
> size suffix from the register. In your case, you used a segment register
> which we couldn't guess the size from. And then we printed a bad error
> message.
>
> There's no quick reference as such for the meaning of the various
> X86::XXXXXX names. But the complete list of them is in
> lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are meant
> to be fairly straightforward to understand. The first part of the name
> should almost always be the instruction name from the Intel/AMD manuals. The
> lower case letters at the end sort of convey operand types, but often not
> the number of operands even though it looks that way. The most common
> letters are 'r' for register, 'm' for memory and 'i' for immediate. Numbers
> after 'i' specify the size of the immediate if it's important to distinguish
> from other sizes or different than the size of the instruction. The lower
> case letters are most useful to distinguish different instructions from each
> other. So for example, if two instructions only differ in the lower case
> letters and one says "rr" and one says "rm", the first is the register form
> and the second is the memory form of the same instruction.
>
> ~Craig
>
>
> On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <kjski at vt.edu> wrote:
>>
>> Dear Craig,
>>
>> Thank you super much for the quick reply! Yea I'm still new to working
>> on the back-end and that sounds great. I already have the raw assembly
>> of what I want to accomplish so this is perfect. I just tried it and
>> yea, I will have to break down my assembly even further into
>> simpler operations. You're right about my assembly dealing with
>> segment registers as I'm getting the following error:
>> "error: unknown use of instruction mnemonic without a size suffix"
>>
>> Just curious, what does it mean by size suffix??
>>
>> It's super cool to see the equivalent with "-show-inst"!!! Thank you
>> so much for this help!
>>
>> Last note, I know that the definitions (e.g. def SUB32ri) of the
>> various instructions can be found in the various ****.td, but is there
>> documentation where the meaning or quick reference of every
>> X86::XXXXXX llvm instruction macro can be found, so I can quickly pick
>> and choose which actual macro I need to use, to "work forwards" rather
>> than working backwards by writing the assembly first and using llvm-mc
>> -show-inst  ??
>>
>> Thanks super much again.
>>
>> Sincerely,
>>
>> Chris Jelesnianski
>> Graduate Research Assistant
>> Virginia Tech
>>
>> On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>> > More specifically, there is no instruction that can add/subtract segment
>> > registers. They can only be updated by the mov segment register
>> > instructions, opcodes 0x8c and 0x8e in x86 assembly.
>> >
>> > I suggest you write the text version of the assembly you want to generate
>> > and assemble it with llvm-mc. This will tell you if it's even valid. After
>> > that you can use -show-inst to print the names of the instructions that X86
>> > uses that you can give to BuildMI.
>> >
>> > ~Craig
>> >
>> >
>> > On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <craig.topper at gmail.com>
>> > wrote:
>> >>
>> >> The SUB32ri instruction can't operate on segment registers. It
>> >> operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded, only 3 or 4
>> >> bits of the register value make it into the binary encoding. Objdump just
>> >> extracts those 3 or 4 bits back out and prints one of the
>> >> EAX/EBX/EDX/ECX/EBP registers that those bits correspond to.
>> >>
>> >> ~Craig
>> >>
>> >>
>> >> On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>>
>> >>> Dear All,
>> >>>
>> >>> Currently I am trying to inject custom x86-64 assembly into a
>> >>> function's entry basic block. More specifically, I am trying to build
>> >>> assembly in a machine function pass from scratch.
>> >>>
>> >>> While the dumped machine function instruction info displays that %gs
>> >>> will be used, when I perform objdump -d on my executable I see that
>> >>> %gs is replaced by %ebp. Why is this happening?
>> >>>
>> >>> I know it probably has something to do with me not specifying operands
>> >>> properly, but I cannot find enough documentation on this besides
>> >>> looking through code comments such as X86BaseInfo.cpp. I feel there
>> >>> isn't enough for me to be able to connect the dots.
>> >>>
>> >>> Below I have sample code: %gs holds a base address to a memory
>> >>> location where I am trying to store information. I am trying to update
>> >>> the %gs register pointer location before saving more values, etc.
>> >>>
>> >>> LLVM C++ MachineFunctionPass code:
>> >>> MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
>> >>> TII->get(X86::SUB32ri),X86::GS)
>> >>>                     .addReg(X86::GS)
>> >>>                     .addImm(0x8);
>> >>>
>> >>> machine function pass dump:
>> >>>  %gs = SUB32ri %gs, 8, implicit-def %eflags
>> >>>
>> >>> Objdump -d assembly from executable
>> >>>   400510:   81 ed 04 00 00 00       sub    $0x8,%ebp
>> >>>
>> >>>
>> >>> TLDR: I am trying to create custom assembly via BuildMI() and manipulate
>> >>> segment registers via a MachineFunctionPass.
>> >>>
>> >>> I have looked at LLVM's safestack implementation, but it takes a
>> >>> fairly complicated hybrid approach, combining an IR function pass with
>> >>> backend support. I would like to stay with a single MachineFunction
>> >>> pass.
>> >>>
>> >>> Believe me, I would do this at the IR level if I didn't need to
>> >>> specifically use the segment registers.
>> >>>
>> >>> Thanks for the help in advance!
>> >>>
>> >>> Sincerely,
>> >>>
>> >>> Christopher Jelesnianski
>> >>> Graduate Research Assistant
>> >>> Virginia Tech
>> >>> _______________________________________________
>> >>> LLVM Developers mailing list
>> >>> llvm-dev at lists.llvm.org
>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev