[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Craig Topper via llvm-dev llvm-dev at lists.llvm.org
Wed Aug 2 14:22:08 PDT 2017


Maybe the code emitter will just work, because it already detects the
register size in order to support hand-written assembly.

~Craig
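
A minimal sketch of the encoding difference discussed in the quoted message
below, assuming 64-bit mode: the load through %ecx uses the same opcode and
ModRM byte as the load through %rcx, preceded by the 0x67 address-size
prefix. The helper names encodeLoadToEax and checkEncodings are hypothetical
and are not part of X86MCCodeEmitter.cpp or any LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): machine-code bytes, in x86-64 long
// mode, for a 4-byte load into %eax through %rcx or, with the prefix, %ecx.
std::vector<std::uint8_t> encodeLoadToEax(bool use32BitAddress) {
    std::vector<std::uint8_t> bytes;
    if (use32BitAddress)
        bytes.push_back(0x67); // address-size override: 32-bit effective address
    bytes.push_back(0x8B);     // opcode: MOV r32, r/m32
    bytes.push_back(0x01);     // ModRM: mod=00, reg=000 (%eax), rm=001 (%rcx/%ecx)
    return bytes;
}

// Sanity check: the two encodings differ only by the leading prefix byte.
void checkEncodings() {
    const std::vector<std::uint8_t> wide = encodeLoadToEax(false);   // mov (%rcx),%eax
    const std::vector<std::uint8_t> narrow = encodeLoadToEax(true);  // mov (%ecx),%eax
    assert(narrow.size() == wide.size() + 1);
    assert(narrow.front() == 0x67);
}
```

The sketch only spells out the bytes; whether the emitter adds the prefix on
its own when given a 32-bit base register is the open question in this
thread.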

On Wed, Aug 2, 2017 at 2:17 PM, Craig Topper <craig.topper at gmail.com> wrote:

> Getting the instruction to actually use (%ecx) as the address requires
> putting a 0x67 prefix on the instruction. I'm not sure how to convince
> X86MCCodeEmitter.cpp to do that for you, assuming you want to
> generate binary and not textual assembly.
>
> ~Craig
>
> On Wed, Aug 2, 2017 at 2:03 PM, Taddeus Kroes via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Eli,
>>
>> Thanks, I’ll look into that then!
>>
>>
>>
>> Cheers,
>>
>> Taddeüs
>>
>>
>>
>> *From: *Friedman, Eli <efriedma at codeaurora.org>
>> *Sent: *Wednesday, 2 August 2017 19:48
>> *To: *Taddeus <t.kroes at vu.nl>; llvm-dev at lists.llvm.org
>> *Subject: *Re: [llvm-dev] Efficiently ignoring upper 32 pointer bits
>> when dereferencing
>>
>>
>>
>> On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
>>
>> > Hi all,
>> >
>> > I am experiencing a problem with the representation of addresses in
>> > the x86_64 TableGen backend and was hoping someone could tell me if it
>> > is fixable. Any comments or hints to send me in the right direction
>> > would be greatly appreciated. I am using LLVM version 3.8, commit
>> > 251286.
>> >
>> > I have an IR pass that stores metadata in the upper 32 bits of 64-bit
>> > pointers in order to implement memory safety. The pass instruments
>> > loads and stores to do an AND of the address with 0xffffffff to mask
>> > out that metadata. E.g., when loading a 4-byte value from memory
>> > pointed to by %rcx, this translates to the following asm:
>> >     mov    %ecx,%ecx   ; zeroes the upper bits, removing the metadata
>> >     mov    (%rcx),%eax
>> >
>> > This leads to quite some overhead (12% on SPEC CPU2006) so I am
>> > looking into possibilities for backend modifications to optimize this.
>> > The first mov introduces unnecessary extra cycles and the second mov
>> > has to wait for its result, potentially stalling the pipeline. On top
>> > of that, it increases register pressure when the original pointer must
>> > be preserved for later use (e.g. the mask would be "mov %esi,%ecx"
>> > after which %rcx is dereferenced, instead of just dereferencing %esi).
>> >
>> > So, what I would like to generate instead is the following:
>> >     mov    (%ecx),%eax
>> > I.e., don't do the masking in a separate mov, but by using a
>> > subregister for the address (which is zero-extended, effectively
>> > ignoring the metadata bits). As a side note, GCC does emit the second
>> > snippet as expected.
>> >
>> > Looking at the TableGen files I found two problems:
>> >
>> > 1. The AND of the address with 0xffffffff is replaced with
>> > SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in
>> > lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an
>> > explicit mov instruction later. I think I need to replace this with
>> > (i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a
>> > 32-bit value, which leads me to the next, more general problem:
>> >
>> > 2. The x86 backend currently does not support dereferencing 32-bit
>> > addresses in 64-bit mode. Specifically, addresses are defined as an
>> > iPTR type in X86InstrInfo.td which I assume is expanded to 4 or 8
>> > bytes depending on whether 32-bit or 64-bit mode is active:
>> >     def addr : ComplexPattern<iPTR, 5, "selectAddr", [],
>> >                               [SDNPWantParent]>;
>> > The dereferencing mov instruction looks like this:
>> >     def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
>> >         "mov{l}\t{$src, $dst|$dst, $src}",
>> >         [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
>> > So it expects a source address of type 'addr' which is 8 bytes. This
>> > leads to the following code being emitted when I apply my solution to
>> > problem 1:
>> >     mov    (%rcx),%eax
>> > In other words, the upper bits are not ignored.
>> >
>> > I am currently not sure where best to solve this problem. Ideally the
>> > 'addr' type would have a dynamic size, but I don't know how to do
>> > this. Any ideas on this?
>>
>> A TableGen pattern can only match one specific type; you'll need a
>> separate pattern to match a 32-bit address.  Yes, this means you'll need
>> to write your own separate pattern for every load/store instruction, but
>> there isn't really any way around that.
>>
>> There are some existing patterns involving MOV32rm, if you want
>> inspiration; for example, the following pattern is from
>> X86InstrCompiler.td:
>>
>> def : Pat<(extloadi64i32 addr:$src),
>>           (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;
>>
>> -Eli
>>
>>
>>
>> --
>>
>> Employee of Qualcomm Innovation Center, Inc.
>>
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
>> Linux Foundation Collaborative Project
>>
>
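
For reference, a small C++ sketch of the pointer layout described in the
original post, assuming the metadata sits in bits 32-63 and the address in
bits 0-31. It models addresses as plain 64-bit integers rather than
dereferencing anything, and the names tagPointer, stripMetadata,
zeroExtendLow32 and metadataOf are hypothetical. The static_assert checks
that the AND with 0xffffffff and the zero-extension performed by a 32-bit
register write (such as mov %ecx,%ecx) yield the same address.

```cpp
#include <cstdint>

// Assumed layout: metadata in the upper 32 bits, address in the lower 32.
constexpr std::uint64_t kAddressMask = 0xffffffffULL;

// Pack metadata into the upper half of a 64-bit pointer value.
constexpr std::uint64_t tagPointer(std::uint64_t address, std::uint32_t metadata) {
    return (static_cast<std::uint64_t>(metadata) << 32) | (address & kAddressMask);
}

// What the instrumented IR does before every load/store: AND with 0xffffffff.
constexpr std::uint64_t stripMetadata(std::uint64_t tagged) {
    return tagged & kAddressMask;
}

// What a 32-bit register write does in 64-bit mode: keep the low half and
// zero the upper half.
constexpr std::uint64_t zeroExtendLow32(std::uint64_t tagged) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(tagged));
}

// Recover the metadata stored in the upper half.
constexpr std::uint32_t metadataOf(std::uint64_t tagged) {
    return static_cast<std::uint32_t>(tagged >> 32);
}

// The explicit mask and the implicit zero-extension agree.
static_assert(stripMetadata(tagPointer(0x1000, 0xabcd)) ==
                  zeroExtendLow32(tagPointer(0x1000, 0xabcd)),
              "AND with 0xffffffff equals zero-extending the low 32 bits");
```

This equivalence is why addressing through the 32-bit subregister would make
the separate masking mov redundant.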