[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing
Friedman, Eli via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 2 10:47:53 PDT 2017
On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
> Hi all,
>
> I am experiencing a problem with the representation of addresses in
> the x86_64 TableGen backend and was hoping someone can tell me if it
> is fixable. Any comments or hints to send me in the right direction
> would be greatly appreciated. I am using LLVM version 3.8, commit 251286.
>
>
> I have an IR pass that stores metadata in the upper 32 bits of 64-bit
> pointers in order to implement memory safety. The pass instruments
> loads and stores to do an AND of the address with 0xffffffff to mask
> out that metadata. E.g., when loading a 4-byte value from memory
> pointed to by %rcx, this translates to the following asm:
>     mov %ecx,%ecx    ; zeroes the upper bits, removing the metadata
>     mov (%rcx),%eax
>
> This leads to considerable overhead (12% on SPEC CPU2006), so I am
> looking into possible backend modifications to optimize this.
> The first mov introduces unnecessary extra cycles and the second mov
> has to wait for its result, potentially stalling the pipeline. On top
> of that, it increases register pressure when the original pointer must
> be preserved for later use (e.g. the mask would be "mov %esi,%ecx"
> after which %rcx is dereferenced, instead of just dereferencing %esi).
>
> So, what I would like to generate instead is the following:
>     mov (%ecx),%eax
> I.e., do the masking not in a separate mov, but by using a
> subregister for the address (which is zero-extended, effectively
> ignoring the metadata bits). As a side note, GCC does emit the second
> snippet as expected.
>
>
> Looking at the TableGen files I found two problems:
>
> 1. The AND of the address with 0xffffffff is replaced with
> SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in
> lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an
> explicit mov instruction later. I think I need to replace this with
> (i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a
> 32-bit value, which leads me to the next, more general problem:
>
> 2. The x86 backend currently does not support dereferencing 32-bit
> addresses in 64-bit mode. Specifically, addresses are defined as an
> iPTR type in X86InstrInfo.td, which I assume expands to 4 or 8
> bytes depending on whether 32-bit or 64-bit mode is active:
>     def addr : ComplexPattern<iPTR, 5, "selectAddr", [],
>                               [SDNPWantParent]>;
> The dereferencing mov instruction looks like this:
>     def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
>                     "mov{l}\t{$src, $dst|$dst, $src}",
>                     [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
> So it expects a source address of type 'addr', which is 8 bytes in
> 64-bit mode. This leads to the following code being emitted when I
> apply my solution to problem 1:
>     mov (%rcx),%eax
> In other words, the upper bits are not ignored.
>
>
> I am currently not sure where the best place to solve this problem is.
> Ideally the 'addr' type would have a dynamic size, but I don't
> know how to do this. Any ideas on this?
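
For readers following along, the addressing behaviour discussed in the quoted message can be modelled in a few lines of C++. This is only an illustrative sketch and not code from the pass or from LLVM: the names `tag`, `mask`, `ea64`, `ea32`, `kAddrMask` and `demo` are hypothetical, and the sample address and metadata values are made up. It shows that dereferencing through the 32-bit subregister yields the same effective address as masking first and then dereferencing the full 64-bit register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of LLVM or the original pass.
constexpr std::uint64_t kAddrMask = 0xffffffffULL;

// Pack 32 bits of metadata into the upper half of a 64-bit pointer value.
constexpr std::uint64_t tag(std::uint64_t addr, std::uint32_t meta) {
    return (static_cast<std::uint64_t>(meta) << 32) | (addr & kAddrMask);
}

// The instrumentation described above: AND the address with 0xffffffff.
// On x86-64, "mov %ecx,%ecx" computes this, because writing a 32-bit
// register zeroes the upper 32 bits of the full 64-bit register.
constexpr std::uint64_t mask(std::uint64_t tagged) {
    return tagged & kAddrMask;
}

// Model of the effective address of "(%rcx)": all 64 bits are used.
constexpr std::uint64_t ea64(std::uint64_t reg) {
    return reg;
}

// Model of the effective address of "(%ecx)" in 64-bit mode:
// the low 32 bits of the register, zero-extended to 64 bits.
constexpr std::uint64_t ea32(std::uint64_t reg) {
    return static_cast<std::uint32_t>(reg);
}

// Runtime check of the equivalence the question relies on.
inline void demo() {
    const std::uint64_t p = tag(0x00401000u, 0xdeadbeefu);
    assert(ea64(mask(p)) == 0x00401000ULL);  // mask, then 64-bit dereference
    assert(ea32(p) == ea64(mask(p)));        // 32-bit dereference, no mask needed
    assert(ea64(p) != ea64(mask(p)));        // unmasked 64-bit dereference differs
}
```

Calling `demo()` exercises the three cases: the masked 64-bit access, the unmasked 32-bit access that reaches the same address, and the unmasked 64-bit access that does not.
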
A TableGen pattern can only match one specific type; you'll need a
separate pattern to match a 32-bit address. Yes, this means you'll need
to write your own separate pattern for every load/store instruction, but
there isn't really any way around that.

There are some existing patterns involving MOV32rm, if you want
inspiration; for example, the following pattern is from X86InstrCompiler.td:

    def : Pat<(extloadi64i32 addr:$src),
              (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;

-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project