[LLVMdev] Using intrinsics with memory operands

Sun Aug 3 23:41:26 PDT 2008

Eli is correct. This is a deficiency in the matching code. We don't
want variants of intrinsics which take memory operands. Instead, we
often have to add code matching scalar_to_vector and / or bit_convert
explicitly. Perhaps we should have tablegen produce matching code that
checks for these nodes.
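
For reference, here is a rough scalar sketch in C of what the memory form
of pmovsxbd discussed below computes: it reads 32 bits from a possibly
unaligned address and sign-extends each byte into a 32-bit lane. The
function name pmovsxbd_mem_model is made up for illustration; it is not an
LLVM or compiler API, only a model of the instruction's semantics as
described in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the memory form of pmovsxbd:
 * read 4 bytes from src (no alignment requirement) and
 * sign-extend each byte into one 32-bit output lane. */
static void pmovsxbd_mem_model(const void *src, int32_t out[4])
{
    int8_t bytes[4];

    /* memcpy stands in for the unaligned 32-bit load. */
    memcpy(bytes, src, sizeof bytes);

    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)bytes[i];   /* int8_t -> int32_t sign-extends */
}
```

This is the single operation that the load + insertelement + bitcast +
intrinsic sequence in Eli's example should fold into once the matcher can
look through scalar_to_vector and bit_convert.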

Evan

On Aug 1, 2008, at 6:20 AM, Eli Friedman wrote:

> On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net> wrote:
>> I was wondering how to use variations of intrinsic functions that
>> take a memory operand.
>
> Often, for intrinsics where it matters, there's a variant of the
> intrinsic that takes a pointer operand that you can use, although it
> looks like there isn't one here.
>
>> Take for example the SSE4.1 pmovsxbd instruction. One variant takes
>> two XMM registers, while another has a 32-bit memory location as
>> source operand. The latter is quite interesting if you know you're
>> reading from memory anyway, and if it's not 16-byte aligned. It looks
>> like LLVM's Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source
>> operand though. So how do I achieve using the variant taking a memory
>> operand?
>
> A load+insertelement+pmovsx sequence should codegen into a single
> instruction, but it looks like that isn't working.  I guess the
> pattern-matching magic should kick in and take care of this, but that
> doesn't seem to be working for a simple example like the following:
>
> target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
> target triple = "i386-pc-linux-gnu"
>
> define <4 x i32> @a(i32* %x) nounwind  {
> entry:
> 	load i32* %x, align 4		; <i32>:0 [#uses=1]
> 	insertelement <4 x i32> undef, i32 %0, i32 0		; <<4 x i32>>:1 [#uses=1]
> 	bitcast <4 x i32> %1 to <16 x i8>		; <<16 x i8>>:2 [#uses=1]
> 	tail call <4 x i32> @llvm.x86.sse41.pmovsxbd( <16 x i8> %2 ) nounwind readnone 		; <<4 x i32>>:3 [#uses=1]
> 	ret <4 x i32> %3
> }
>
> declare <4 x i32> @llvm.x86.sse41.pmovsxbd(<16 x i8>) nounwind readnone
>
> I think the issue is that the pattern for the memory operand of
> pmovsxbd isn't flexible enough to see through the scalar_to_vector
> step.
>
> -Eli