[llvm-commits] [llvm] r77407 - in /llvm/trunk: include/llvm/IntrinsicsX86.td lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h lib/Target/X86/X86InstrSSE.td test/CodeGen/X86/sse41.ll

Chris Lattner clattner at apple.com
Tue Jul 28 21:44:50 PDT 2009


On Jul 28, 2009, at 5:28 PM, Eric Christopher wrote:

> Author: echristo
> Date: Tue Jul 28 19:28:05 2009
> New Revision: 77407
>
> URL: http://llvm.org/viewvc/llvm-project?rev=77407&view=rev
> Log:
> Add support for gcc __builtin_ia32_ptest{z,c,nzc} intrinsics. Lower
> to ptest instruction plus setcc. Revamp ptest instruction. Add test.

Nice.

> +  // ptest intrinsics. The intrinsic these come from are designed to return
> +  // a boolean value, not just an instruction so lower it to the ptest
> +  // pattern and a conditional move to the result.
> +  case Intrinsic::x86_sse41_ptestz:
> +  case Intrinsic::x86_sse41_ptestc:
> +  case Intrinsic::x86_sse41_ptestnzc:{
> +    unsigned X86CC = 0;
> +    switch (IntNo) {
> +    default: break;

Please make "default" abort with:
    llvm_unreachable("unknown condition");

or something.
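
A minimal standalone sketch of that suggestion follows. The enumerators, condCodeFor, and the unreachable helper are hypothetical stand-ins for Intrinsic::x86_sse41_ptest*, the X86 condition codes, and llvm_unreachable, so that the snippet compiles without LLVM headers. The E/B/A mapping is an assumption based on which flags sete, setb, and seta read, not a copy of the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for the intrinsic IDs and condition codes.
enum : unsigned { ptestz = 1, ptestc, ptestnzc };
enum : unsigned { COND_E = 1, COND_B, COND_A };

// Stand-in for llvm_unreachable: report the message and abort.
[[noreturn]] inline void unreachable(const char *Msg) {
  std::fprintf(stderr, "UNREACHABLE: %s\n", Msg);
  std::abort();
}

// Map an intrinsic ID to the condition code used by the setcc.
inline unsigned condCodeFor(unsigned IntNo) {
  switch (IntNo) {
  default:
    // An unexpected ID aborts instead of silently leaving the code at 0.
    unreachable("unknown condition");
  case ptestz:   return COND_E; // ZF set
  case ptestc:   return COND_B; // CF set
  case ptestnzc: return COND_A; // ZF and CF both clear
  }
}
```

With "default: break;" an unexpected ID would fall out of the switch with X86CC still 0; aborting makes that case fail loudly.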

> +// ptest instruction we'll lower to this in X86ISelLowering primarily from
> +// the intel intrinsic that corresponds to this.
> let Defs = [EFLAGS] in {
> def PTESTrr : SS48I<0x17, MRMSrcReg, (outs), (ins VR128:$src1, VR128:$src2),
> -                    "ptest \t{$src2, $src1|$src1, $src2}", []>, OpSize;
> +                    "ptest \t{$src2, $src1|$src1, $src2}",
> +                    [(X86ptest VR128:$src1, VR128:$src2),
> +                      (implicit EFLAGS)]>, OpSize;

Do we really need both "Defs = [EFLAGS]" *and* "(implicit EFLAGS)"?  It
seems like the latter should be enough, though I don't actually know
whether that's true.  Evan or Dan would know.


Something not new in this patch, but:

def X86ptest   : SDNode<"X86ISD::PTEST", SDTX86CmpPTest>;

As I understand it, ptest is commutative (though perhaps only for the
"z" form, and not the others?).  If that holds unconditionally, you can
pass [SDNPCommutative] as the third argument to SDNode, and the
instruction should then be treated as commutative.  If it only holds
in some cases, more trickery will be required.
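
The commutativity question can be checked with a small scalar model. The sketch below assumes the flag definitions for the SSE4.1 test intrinsics in their operand order: for (a, b), ZF is set when (a AND b) is all zeros, and CF is set when ((NOT a) AND b) is all zeros. Vec128 and the three helper functions are hypothetical names for illustration, not LLVM or intrinsic APIs.

```cpp
#include <cassert>
#include <cstdint>

// A 128-bit value modelled as two 64-bit halves.
struct Vec128 { std::uint64_t lo, hi; };

// "z" form: returns ZF, i.e. (a AND b) == 0.
constexpr bool ptestz(Vec128 a, Vec128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

// "c" form: returns CF, i.e. ((NOT a) AND b) == 0.
constexpr bool ptestc(Vec128 a, Vec128 b) {
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

// "nzc" form: true when both ZF and CF are clear.
constexpr bool ptestnzc(Vec128 a, Vec128 b) {
  return !ptestz(a, b) && !ptestc(a, b);
}

// Example inputs: the bits of A are a strict subset of the bits of B.
constexpr Vec128 A{0x1, 0};
constexpr Vec128 B{0x3, 0};

static_assert(ptestz(A, B) == ptestz(B, A), "z is symmetric here");
static_assert(ptestc(A, B) != ptestc(B, A), "c depends on operand order");
static_assert(ptestnzc(A, B) != ptestnzc(B, A), "nzc depends on operand order");
```

Under this model the "z" result is symmetric because AND is symmetric, while "c" and "nzc" change when the operands are swapped, so an unconditional [SDNPCommutative] would only be safe if the node produced just ZF.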

For something like this:

define i32 @test(<4 x float> %t1, <4 x float> *%t2) nounwind {
        %x = load <4 x float>* %t2
        %tmp1 = call i32 @llvm.x86.sse41.ptestz(<4 x float> %x, <4 x float> %t1) nounwind readnone
        ret i32 %tmp1
}

declare i32 @llvm.x86.sse41.ptestz(<4 x float>, <4 x float>) nounwind readnone

It should allow the instruction to fold the load instead of producing:

_test:
	movl	4(%esp), %eax
	movaps	(%eax), %xmm1
	ptest 	%xmm0, %xmm1
	sete	%al
	movzbl	%al, %eax
	ret

Does "ptest" need to be added to the load folding table, in case it is
isel'd as reg/reg but an operand then gets spilled (so that the regalloc
can fold the reload by forming the reg/mem variant)?

Overall, very nice job!

-Chris


