[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Wed Aug 2 10:47:53 PDT 2017


On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
> Hi all,
>
> I am experiencing a problem with the representation of addresses in 
> the x86_64 TableGen backend and was hoping someone can tell me if it 
> is fixable. Any comments or hints to send me in the right direction 
> would be greatly appreciated. I am using LLVM version 3.8, commit 251286.
>
>
> I have an IR pass that stores metadata in the upper 32 bits of 64-bit 
> pointers in order to implement memory safety. The pass instruments 
> loads and stores to do an AND of the address with 0xffffffff to mask 
> out that metadata. E.g., when loading a 4-byte value from memory 
> pointed to by %rcx, this translates to the following asm:
>     mov    %ecx,%ecx   ; zeroes the upper bits, removing the metadata
>     mov    (%rcx),%eax
>
> This leads to considerable overhead (12% on SPEC CPU2006), so I am 
> looking into possible backend modifications to optimize this. 
> The first mov introduces unnecessary extra cycles and the second mov 
> has to wait for its result, potentially stalling the pipeline. On top 
> of that, it increases register pressure when the original pointer must 
> be preserved for later use (e.g. the mask would be "mov %esi,%ecx" 
> after which %rcx is dereferenced, instead of just dereferencing %esi).
>
> So, what I would like to generate instead is the following:
>     mov    (%ecx),%eax
> I.e., do the masking not in a separate mov, but by using a 
> subregister for the address (which is zero-extended, effectively 
> ignoring the metadata bits). As a side note, GCC does emit the second 
> snippet as expected.
>
>
> Looking at the TableGen files I found two problems:
>
> 1. The AND of the address with 0xffffffff is replaced with 
> SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in 
> lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an 
> explicit mov instruction later. I think I need to replace this with 
> (i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a 
> 32-bit value, which leads me to the next, more general problem:
>
> 2. The x86 backend currently does not support dereferencing 32-bit 
> addresses in 64-bit mode. Specifically, addresses are defined as an 
> iPTR type in X86InstrInfo.td, which I assume is expanded to 4 or 8 
> bytes depending on whether 32-bit or 64-bit mode is active:
>     def addr : ComplexPattern<iPTR, 5, "selectAddr", [], [SDNPWantParent]>;
> The dereferencing mov instruction looks like this:
>    def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
>         "mov{l}\t{$src, $dst|$dst, $src}",
>         [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
> So it expects a source address of type 'addr' which is 8 bytes. This 
> leads to the following code being emitted when I apply my solution to 
> problem 1:
>      mov    (%rcx),%eax
> In other words, the upper bits are not ignored.
>
>
> I am currently not sure where this problem is best solved. 
> Ideally the 'addr' type would have a dynamic size, but I don't 
> know how to do this. Any ideas on this?

A TableGen pattern can only match one specific type; you'll need a 
separate pattern to match a 32-bit address.  Yes, this means you'll need 
to write your own separate pattern for every load/store instruction, but 
there isn't really any way around that.

There are some existing patterns involving MOV32rm, if you want 
inspiration; for example, the following pattern is from X86InstrCompiler.td:

def : Pat<(extloadi64i32 addr:$src),
           (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;
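
As an aside on the semantics any such pattern has to preserve, here is a
minimal C++ sketch. It is only a model of the arithmetic, not LLVM code. The
helper names (tag_pointer, mask_with_and, mask_with_subreg) and the example
values are hypothetical and chosen purely for illustration. It shows that
the AND with 0xffffffff is the same operation as truncating to the 32-bit
subregister and zero-extending, which is why a 32-bit address operand can
stand in for the explicit mask:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// Pack 32 bits of metadata above a 32-bit address, as the quoted pass does.
constexpr std::uint64_t tag_pointer(std::uint32_t addr, std::uint32_t meta) {
    return (static_cast<std::uint64_t>(meta) << 32) | addr;
}

// What the instrumentation's explicit AND computes.
constexpr std::uint64_t mask_with_and(std::uint64_t tagged) {
    return tagged & 0xffffffffULL;
}

// What a 32-bit register write such as "mov %ecx,%ecx" computes:
// truncate to the low 32 bits, then zero-extend back to 64 bits.
constexpr std::uint64_t mask_with_subreg(std::uint64_t tagged) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(tagged));
}

// Compile-time checks on one made-up tagged pointer value.
static_assert(tag_pointer(0x00402010u, 0xdeadbeefu) == 0xdeadbeef00402010ULL,
              "metadata lives in the upper 32 bits");
static_assert(mask_with_and(0xdeadbeef00402010ULL) == 0x00402010ULL,
              "AND strips the metadata");
static_assert(mask_with_subreg(0xdeadbeef00402010ULL) ==
                  mask_with_and(0xdeadbeef00402010ULL),
              "subregister zero-extension equals the AND mask");
```

Because the two forms agree for every 64-bit input, a pattern that selects
the 32-bit form of the address does not change which location is accessed.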

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


