[LLVMdev] Using intrinsics with memory operands
Eli Friedman
eli.friedman at gmail.com
Fri Aug 1 06:20:29 PDT 2008
On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net> wrote:
> I was wondering how to use variations of intrinsic functions that take a
> memory operand.
Often, for intrinsics where it matters, there's a variant of the
intrinsic that takes a pointer operand that you can use, although it
looks like there isn't one here.
> Take for example the SSE4.1 pmovsxbd instruction. One variant takes two XMM
> registers, while another has a 32-bit memory location as source operand. The
> latter is quite interesting if you know you're reading from memory anyway,
> and if it's not 16-byte aligned. It looks like LLVM's
> Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source operand though. So
> how do I achieve using the variant taking a memory operand?
A load+insertelement+pmovsx sequence should codegen into a single
instruction, with the pattern matching folding the load into the
pmovsx. That doesn't seem to be working, though, even for a simple
example like the following:
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i386-pc-linux-gnu"
define <4 x i32> @a(i32* %x) nounwind {
entry:
  load i32* %x, align 4 ; <i32>:0 [#uses=1]
  insertelement <4 x i32> undef, i32 %0, i32 0 ; <<4 x i32>>:1 [#uses=1]
  bitcast <4 x i32> %1 to <16 x i8> ; <<16 x i8>>:2 [#uses=1]
  tail call <4 x i32> @llvm.x86.sse41.pmovsxbd( <16 x i8> %2 ) nounwind readnone ; <<4 x i32>>:3 [#uses=1]
  ret <4 x i32> %3
}
declare <4 x i32> @llvm.x86.sse41.pmovsxbd(<16 x i8>) nounwind readnone
I think the issue is that the pattern for the memory operand of
pmovsxbd isn't flexible enough to see through the scalar_to_vector
step.
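
For reference, here is a minimal C sketch of what the memory-operand
form of pmovsxbd is expected to compute: read 32 bits from a possibly
unaligned address and sign-extend each of the four bytes into a 32-bit
lane. The helper name pmovsxbd_mem_model is hypothetical and only
models the semantics; it is not LLVM code and says nothing about how
the backend pattern is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of "pmovsxbd xmm, m32" (name is an
 * assumption made for this sketch, not an LLVM or Intel identifier).
 * Reads four bytes from src, which may be unaligned, and sign-extends
 * each byte into the corresponding 32-bit output lane. */
static void pmovsxbd_mem_model(const void *src, int32_t out[4])
{
    int8_t bytes[4];

    assert(src != NULL && out != NULL);

    /* memcpy avoids any alignment assumption on src. */
    memcpy(bytes, src, sizeof bytes);

    /* Lane i comes from byte i in memory (x86 is little-endian). */
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)bytes[i];
}
```

If the load were folded as intended, the IR above would be expected to
yield these same four lanes for the i32 stored at %x.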
-Eli