[LLVMdev] Using intrinsics with memory operands
Evan Cheng
evan.cheng at apple.com
Sun Aug 3 23:41:26 PDT 2008
Eli is correct. This is a deficiency in the matching code. We don't
want variants of intrinsics which take memory operands. We often have
to add code matching scalar_to_vector and/or bit_convert explicitly.
Perhaps we should have tablegen produce matching code that checks for
these nodes.
Evan
On Aug 1, 2008, at 6:20 AM, Eli Friedman wrote:
> On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net>
> wrote:
>> I was wondering how to use variations of intrinsic functions that
>> take a memory operand.
>
> Often, for intrinsics where it matters, there's a variant of the
> intrinsic that takes a pointer operand that you can use, although it
> looks like there isn't one here.
>
>> Take for example the SSE4.1 pmovsxbd instruction. One variant takes
>> two XMM registers, while another has a 32-bit memory location as
>> source operand. The latter is quite interesting if you know you're
>> reading from memory anyway, and if it's not 16-byte aligned. It looks
>> like LLVM's Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source
>> operand though. So how do I achieve using the variant taking a memory
>> operand?
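
For readers unfamiliar with the instruction, here is a minimal scalar
sketch in C of what the memory form of pmovsxbd computes: it reads four
bytes from an address with no alignment requirement and sign-extends
each byte into one 32-bit lane. The function name pmovsxbd_mem_model is
hypothetical and used only for illustration. This is a reference model
of the semantics, not how LLVM or the hardware implements it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of "pmovsxbd xmm, m32":
   read 4 bytes from src (any alignment) and sign-extend each
   byte into the corresponding 32-bit lane of out. */
static void pmovsxbd_mem_model(const void *src, int32_t out[4]) {
    int8_t bytes[4];
    memcpy(bytes, src, sizeof bytes);   /* unaligned-safe 32-bit read */
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)bytes[i];     /* sign extension, lane i */
}
```

Only the low 32 bits of the source matter, which is why the IR in the
reply quoted below builds the <16 x i8> operand from a single i32 load.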
>
> A load+insertelement+pmovsx sequence should codegen into a single
> instruction, with the pattern matching folding the load into pmovsx.
> That doesn't seem to work, though, even for a simple example like the
> following:
>
> target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
> target triple = "i386-pc-linux-gnu"
>
> define <4 x i32> @a(i32* %x) nounwind {
> entry:
>   load i32* %x, align 4 ; <i32>:0 [#uses=1]
>   insertelement <4 x i32> undef, i32 %0, i32 0 ; <<4 x i32>>:1 [#uses=1]
>   bitcast <4 x i32> %1 to <16 x i8> ; <<16 x i8>>:2 [#uses=1]
>   tail call <4 x i32> @llvm.x86.sse41.pmovsxbd( <16 x i8> %2 ) nounwind readnone ; <<4 x i32>>:3 [#uses=1]
>   ret <4 x i32> %3
> }
>
> declare <4 x i32> @llvm.x86.sse41.pmovsxbd(<16 x i8>) nounwind readnone
>
> I think the issue is that the pattern for the memory operand of
> pmovsxbd isn't flexible enough to see through the scalar_to_vector
> step.
>
> -Eli