[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Wed Aug 2 14:17:40 PDT 2017

Getting the instruction to actually use (%ecx) as the address requires
putting a 0x67 (address-size override) prefix on the instruction. I'm not
sure how to convince X86MCCodeEmitter.cpp to do that for you, assuming you
want to generate binary and not textual assembly.
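
To make the prefix concrete, here is a minimal sketch of the two encodings involved. It is hand-rolled Python, not code from X86MCCodeEmitter.cpp; the helper name encode_load_to_eax and the SIMPLE_BASES table are made up for illustration, and it only covers base registers that need no SIB byte or displacement.

```python
# Sketch: hand-encode "mov (%reg), %eax" in 64-bit mode, with and without
# the 0x67 address-size override prefix. All names here are hypothetical.

# ModRM r/m numbers for base registers that need no SIB byte or displacement
# when mod == 00 (r/m 100 means "SIB follows", r/m 101 means RIP-relative).
SIMPLE_BASES = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "si": 6, "di": 7}

ADDR_SIZE_PREFIX = 0x67  # selects 32-bit addressing in 64-bit mode
MOV_R32_RM32 = 0x8B      # opcode of MOV r32, r/m32 (MOV32rm in LLVM)
EAX = 0                  # destination register number in the ModRM reg field


def encode_load_to_eax(base: str) -> bytes:
    """Encode `mov (base), %eax`, where base is e.g. "rcx" or "ecx"."""
    width, name = base[0], base[1:]
    if width not in ("r", "e") or name not in SIMPLE_BASES:
        raise ValueError(f"unsupported base register: {base}")
    # mod = 00 (register-indirect, no displacement), reg = eax, r/m = base
    modrm = (0b00 << 6) | (EAX << 3) | SIMPLE_BASES[name]
    prefix = bytes([ADDR_SIZE_PREFIX]) if width == "e" else b""
    return prefix + bytes([MOV_R32_RM32, modrm])


print(encode_load_to_eax("rcx").hex())  # 8b01
print(encode_load_to_eax("ecx").hex())  # 678b01
```

The only difference between the two forms is the leading 0x67 byte, so the 32-bit-address load costs one extra byte of code size but no extra instruction.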

~Craig

On Wed, Aug 2, 2017 at 2:03 PM, Taddeus Kroes via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi Eli,
>
> Thanks, I’ll look into that then!
>
>
>
> Cheers,
>
> Taddeüs
>
>
>
> *From: *Friedman, Eli <efriedma at codeaurora.org>
> *Sent: *Wednesday, 2 August 2017 19:48
> *To: *Taddeus <t.kroes at vu.nl>; llvm-dev at lists.llvm.org
> *Subject: *Re: [llvm-dev] Efficiently ignoring upper 32 pointer bits
> when dereferencing
>
>
>
> On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
>
> > Hi all,
>
> >
>
> > I am experiencing a problem with the representation of addresses in
>
> > the x86_64 TableGen backend and was hoping someone can tell me if it
>
> > is fixable. Any comments or hints to send me in the right direction
>
> > would be greatly appreciated. I am using LLVM version 3.8, commit 251286.
>
> >
>
> >
>
> > I have an IR pass that stores metadata in the upper 32 bits of 64-bit
>
> > pointers in order to implement memory safety. The pass instruments
>
> > loads and stores to do an AND of the address with 0xffffffff to mask
>
> > out that metadata. E.g., when loading a 4-byte value from memory
>
> > pointed to by %rcx, this translates to the following asm:
>
> >     mov    %ecx,%ecx   ; zeroes the upper bits, removing the metadata
>
> >     mov    (%rcx),%eax
>
> >
>
> > This leads to quite some overhead (12% on SPEC CPU2006) so I am
>
> > looking into possibilities for backend modifications to optimize this.
>
> > The first mov introduces unnecessary extra cycles and the second mov
>
> > has to wait for its results, potentially stalling the pipeline. On top
>
> > of that, it increases register pressure when the original pointer must
>
> > be preserved for later use (e.g. the mask would be "mov %esi,%ecx"
>
> > after which %rcx is dereferenced, instead of just dereferencing %esi).
>
> >
>
> > So, what I would like to generate instead is the following:
>
> >     mov    (%ecx),%eax
>
> > I.e., don't do the masking in a separate mov, but by using a
>
> > subregister for the address (which is zero-extended, effectively
>
> > ignoring the metadata bits). As a side note, GCC does emit the second
>
> > snippet as expected.
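
The equivalence relied on above can be modelled in a few lines. This is only a sketch of the address arithmetic, assuming a plain (%reg) operand with no displacement or index; tag_pointer and effective_address are hypothetical helpers, not part of the pass or of LLVM.

```python
# Sketch: model why a 32-bit address register makes the explicit mask
# redundant. Function names and example values are hypothetical.

MASK32 = 0xFFFFFFFF


def tag_pointer(addr: int, meta: int) -> int:
    """Place 32 bits of metadata in the upper half of a 64-bit pointer."""
    return ((meta & MASK32) << 32) | (addr & MASK32)


def effective_address(reg_value: int, addr_bits: int) -> int:
    """Address formed by a plain (%reg) operand.

    With a 32-bit address size in 64-bit mode, the address is truncated
    to 32 bits and zero-extended, which is what the model does here.
    """
    return reg_value & ((1 << addr_bits) - 1)


tagged = tag_pointer(0x00403000, 0xDEADBEEF)

# Current codegen: "mov %ecx,%ecx" masks first, then "mov (%rcx),%eax".
masked = tagged & MASK32
two_instr = effective_address(masked, 64)

# Desired codegen: "mov (%ecx),%eax" on the still-tagged register.
one_instr = effective_address(tagged, 32)

assert two_instr == one_instr
print(hex(one_instr))  # 0x403000
```

Both sequences read from the same address; the single-instruction form simply leaves the tagged value in the register, which is also why the original pointer need not be copied first.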
>
> >
>
> >
>
> > Looking at the TableGen files I found two problems:
>
> >
>
> > 1. The AND of the address with 0xffffffff is replaced with
>
> > SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in
>
> > lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an
>
> > explicit mov instruction later. I think I need to replace this with
>
> > (i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a
>
> > 32-bit value, which leads me to the next, more general problem:
>
> >
>
> > 2. The x86 backend currently does not support dereferencing 32-bit
>
> > addresses in 64-bit mode. Specifically, addresses are defined as an
>
> > iPTR type in X86InstrInfo.td, which I assume expands to 4 or 8
>
> > bytes depending on whether 32-bit or 64-bit mode is active:
>
> >     def addr : ComplexPattern<iPTR, 5, "selectAddr", [],
>
> > [SDNPWantParent]>;
>
> > The dereferencing mov instruction looks like this:
>
> >    def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
>
> >         "mov{l}\t{$src, $dst|$dst, $src}",
>
> >         [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
>
> > So it expects a source address of type 'addr' which is 8 bytes. This
>
> > leads to the following code being emitted when I apply my solution to
>
> > problem 1:
>
> >      mov    (%rcx),%eax
>
> > In other words, the upper bits are not ignored.
>
> >
>
> >
>
> > I am currently not sure what is the best place to solve this problem.
>
> > The best would be to give the 'addr' type a dynamic size but I don't
>
> > know how to do this. Any ideas on this?
>
>
>
> A TableGen pattern can only match one specific type; you'll need a
>
> separate pattern to match a 32-bit address.  Yes, this means you'll need
>
> to write your own separate pattern for every load/store instruction, but
>
> there isn't really any way around that.
>
>
>
> There are some existing patterns involving MOV32rm, if you want
>
> inspiration; for example, the following pattern is from
> X86InstrCompiler.td:
>
>
>
> def : Pat<(extloadi64i32 addr:$src),
>
>            (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;
>
>
>
> -Eli
>
>
>
> --
>
> Employee of Qualcomm Innovation Center, Inc.
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>