[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Wed Aug 2 15:38:18 PDT 2017

Good point. Maybe the prefix can be specified next to the opcode in the pattern in X86InstrInfo.td?

Cheers,
Taddeüs

From: Craig Topper
Sent: Wednesday, 2 August 2017 23:22
To: Taddeus Kroes
Cc: Friedman, Eli; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Maybe the code emitter will just work: it has to detect the register size anyway, since we need to support hand-written assembly.

~Craig

On Wed, Aug 2, 2017 at 2:17 PM, Craig Topper <craig.topper at gmail.com> wrote:
Getting the instruction to actually use (%ecx) as the address requires putting a 0x67 prefix on the instruction. I'm not sure how to convince X86MCCodeEmitter.cpp to do that for you. That is assuming you want to generate binary rather than textual assembly.

~Craig
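
For concreteness, here is a minimal sketch of what the 0x67 address-size prefix changes at the byte level. `encodeLoad32` is a hypothetical helper, not an LLVM or X86MCCodeEmitter API; it assumes a plain register-indirect 32-bit load in 64-bit mode with no displacement, no REX prefix, and a base register numbered 0-3, 6 or 7 (rax, rcx, rdx, rbx, rsi, rdi).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LLVM): encodes "mov (base), dst" as a
// 32-bit load in 64-bit mode. Register numbers follow the x86 encoding
// (eax/rax = 0, ecx/rcx = 1, ...).
std::vector<std::uint8_t> encodeLoad32(unsigned dstReg, unsigned baseReg,
                                       bool use32BitAddress) {
    // Bases 4 and 5 need a SIB byte or mean RIP-relative with mod=00,
    // and registers 8-15 need a REX prefix; this sketch handles neither.
    assert(dstReg < 8 && baseReg < 8 && baseReg != 4 && baseReg != 5);

    std::vector<std::uint8_t> bytes;
    if (use32BitAddress)
        bytes.push_back(0x67);  // address-size override: use (%ecx), not (%rcx)
    bytes.push_back(0x8B);      // opcode of MOV r32, r/m32 (MOV32rm)
    // ModRM byte: mod = 00 (register-indirect), reg = dst, rm = base.
    bytes.push_back(static_cast<std::uint8_t>((dstReg << 3) | baseReg));
    return bytes;
}
```

With eax = 0 and rcx = 1, `encodeLoad32(0, 1, false)` yields the bytes 8B 01 (mov (%rcx),%eax), and `encodeLoad32(0, 1, true)` yields 67 8B 01 (mov (%ecx),%eax).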

On Wed, Aug 2, 2017 at 2:03 PM, Taddeus Kroes via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Hi Eli,
Thanks, I’ll look into that then!

Cheers,
Taddeüs

From: Friedman, Eli
Sent: Wednesday, 2 August 2017 19:48
To: Taddeus; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
> Hi all,
> 
> I am experiencing a problem with the representation of addresses in 
> the x86_64 TableGen backend and was hoping someone can tell me if it 
> is fixable. Any comments or hints to send me in the right direction 
> would be greatly appreciated. I am using LLVM version 3.8, commit 251286.
> 
> 
> I have an IR pass that stores metadata in the upper 32 bits of 64-bit 
> pointers in order to implement memory safety. The pass instruments 
> loads and stores to do an AND of the address with 0xffffffff to mask 
> out that metadata. E.g., when loading a 4-byte value from memory 
> pointed to by %rcx, this translates to the following asm:
>     mov    %ecx,%ecx   ; zeroes the upper bits, removing the metadata
>     mov    (%rcx),%eax
> 
> This leads to quite some overhead (12% on SPEC CPU2006) so I am 
> looking into possibilities for backend modifications to optimize this. 
> The first mov introduces unnecessary extra cycles and the second mov 
> has to wait for its results, potentially stalling the pipeline. On top 
> of that, it increases register pressure when the original pointer must 
> be preserved for later use (e.g. the mask would be "mov %esi,%ecx" 
> after which %rsi is dereferenced, instead of just dereferencing %esi).
> 
> So, what I would like to generate instead is the following:
>     mov    (%ecx),%eax
> I.e., don't do the masking in a separate mov, but by using a 
> subregister for the address (which is zero-extended, effectively 
> ignoring the metadata bits). As a side note, GCC does emit the second 
> snippet as expected.
> 
> 
> Looking at the TableGen files I found two problems:
> 
> 1. The AND of the address with 0xffffffff is replaced with 
> SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in 
> lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an 
> explicit mov instruction later. I think I need to replace this with 
> (i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a 
> 32-bit value, which leads me to the next, more general problem:
> 
> 2. The x86 backend currently does not support dereferencing 32-bit 
> addresses in 64-bit mode. Specifically, addresses are defined as an 
> iPTR type in X86InstrInfo.td which I assume is expanded to 4 or 8 
> bytes depending on if 32/64 bit mode is active:
>     def addr : ComplexPattern<iPTR, 5, "selectAddr", [], 
> [SDNPWantParent]>;
> The dereferencing mov instruction looks like this:
>    def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
>         "mov{l}\t{$src, $dst|$dst, $src}",
>         [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
> So it expects a source address of type 'addr' which is 8 bytes. This 
> leads to the following code being emitted when I apply my solution to 
> problem 1:
>      mov    (%rcx),%eax
> In other words, the upper bits are not ignored.
> 
> 
> I am currently not sure what is the best place to solve this problem. 
> The best would be to give the 'addr' type a dynamic size but I don't 
> know how to do this. Any ideas on this?
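
The equivalence the question above relies on can be sketched in C++: ANDing a tagged pointer with 0xffffffff gives the same value as truncating it to 32 bits and zero-extending, which is what writing a 32-bit register (mov %ecx,%ecx) or addressing through (%ecx) does. The function names and the tag layout are hypothetical and only for illustration; they are not taken from the pass described above.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kAddressMask = 0xffffffffULL;

// Hypothetical tag layout: metadata in the upper 32 bits, address in the lower 32.
inline std::uint64_t tagPointer(std::uint64_t address, std::uint32_t metadata) {
    assert(address <= kAddressMask);  // the address must fit in 32 bits
    return (static_cast<std::uint64_t>(metadata) << 32) | address;
}

// What the instrumented AND with 0xffffffff computes.
inline std::uint64_t maskWithAnd(std::uint64_t tagged) {
    return tagged & kAddressMask;
}

// What a 32-bit subregister access computes: truncate, then zero-extend.
inline std::uint64_t maskWithTruncate(std::uint64_t tagged) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(tagged));
}
```

Both masking functions return the original address for any tagged value, which is why the explicit mov is redundant if the load itself can use the 32-bit register as its base.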

A TableGen pattern can only match one specific type; you'll need a 
separate pattern to match a 32-bit address.  Yes, this means you'll need 
to write your own separate pattern for every load/store instruction, but 
there isn't really any way around that.

There are some existing patterns involving MOV32rm, if you want 
inspiration; for example, the following pattern is from X86InstrCompiler.td:

def : Pat<(extloadi64i32 addr:$src),
           (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
