[LLVMdev] Using intrinsics with memory operands

Fri Aug 1 06:20:29 PDT 2008

On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net> wrote:
> I was wondering how to use variations of intrinsic functions that take a
> memory operand.

Often, for intrinsics where it matters, there's a variant of the
intrinsic that takes a pointer operand that you can use, although it
looks like there isn't one here.

> Take for example the SSE4.1 pmovsxbd instruction. One variant takes two XMM
> registers, while another has a 32-bit memory location as source operand. The
> latter is quite interesting if you know you're reading from memory anyway,
> and if it's not 16-byte aligned. It looks like LLVM's
> Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source operand though. So
> how do I achieve using the variant taking a memory operand?

A load+insertelement+pmovsx sequence should codegen into a single
instruction, with the pattern-matching magic folding the load into
pmovsxbd.  That doesn't seem to be working, though, even for a simple
example like the following:

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i386-pc-linux-gnu"

define <4 x i32> @a(i32* %x) nounwind  {
entry:
	load i32* %x, align 4		; <i32>:0 [#uses=1]
	insertelement <4 x i32> undef, i32 %0, i32 0		; <<4 x i32>>:1 [#uses=1]
	bitcast <4 x i32> %1 to <16 x i8>		; <<16 x i8>>:2 [#uses=1]
	tail call <4 x i32> @llvm.x86.sse41.pmovsxbd( <16 x i8> %2 ) nounwind readnone 		; <<4 x i32>>:3 [#uses=1]
	ret <4 x i32> %3
}

declare <4 x i32> @llvm.x86.sse41.pmovsxbd(<16 x i8>) nounwind readnone

I think the issue is that the pattern for the memory operand of
pmovsxbd isn't flexible enough to see through the scalar_to_vector
step (the node that the insertelement into undef turns into during
instruction selection).
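
For reference, here is a rough scalar sketch in C of what the
memory-operand form of pmovsxbd computes: it reads 32 bits from memory
at any alignment and sign-extends each of the four bytes into its own
32-bit lane. The function name pmovsxbd_mem_ref is made up for
illustration; it is not an LLVM or compiler API, just a model of the
result the single folded instruction should produce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an LLVM or compiler API): a scalar model of
   the memory-operand form of pmovsxbd. */
static void pmovsxbd_mem_ref(const void *src, int32_t dst[4])
{
    int8_t bytes[4];

    assert(src != NULL && dst != NULL);

    /* memcpy keeps the 32-bit read valid at any alignment. */
    memcpy(bytes, src, sizeof bytes);

    for (int i = 0; i < 4; ++i) {
        dst[i] = (int32_t)bytes[i];  /* sign-extend byte i into lane i */
    }
}
```

The IR above asks for the same thing in vector form: the load and
insertelement supply the four source bytes in the low 32 bits of the
register, and the intrinsic does the widening.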

-Eli