[llvm-commits] [llvm] r77407 - in /llvm/trunk: include/llvm/IntrinsicsX86.td lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h lib/Target/X86/X86InstrSSE.td test/CodeGen/X86/sse41.ll

Evan Cheng evan.cheng at apple.com
Tue Jul 28 22:42:35 PDT 2009


On Jul 28, 2009, at 9:44 PM, Chris Lattner wrote:
>
>> +// ptest instruction we'll lower to this in X86ISelLowering primarily from
>> +// the intel intrinsic that corresponds to this.
>> let Defs = [EFLAGS] in {
>> def PTESTrr : SS48I<0x17, MRMSrcReg, (outs), (ins VR128:$src1, VR128:$src2),
>> -                    "ptest \t{$src2, $src1|$src1, $src2}", []>, OpSize;
>> +                    "ptest \t{$src2, $src1|$src1, $src2}",
>> +                    [(X86ptest VR128:$src1, VR128:$src2),
>> +                      (implicit EFLAGS)]>, OpSize;
>
> Do we really need "Defs = EFLAGS" *and* "implicit EFLAGS"?  It seems
> like the latter should be enough, though I don't actually know if
> that's true.  Evan or Dan would know.

Unfortunately both are needed (yes, we should fix that at some point).
Defs = [EFLAGS] says that the instruction implicitly defines EFLAGS.
(implicit EFLAGS) tells tablegen and the scheduler that the instruction
produces an extra result that maps to the physical register EFLAGS.
That is the syntax we use to model physical register dependencies.

Evan

>
>
> Something not new in this patch, but:
>
> def X86ptest   : SDNode<"X86ISD::PTEST", SDTX86CmpPTest>;
>
> As I understand it, ptest is commutative (though maybe only for "z"
> and not the others?).  If that is unconditionally true, you can declare
> it by passing [SDNPCommutative] as a third argument to SDNode, and the
> instruction should then be inferred to be commutative.  If it is only
> true in some cases, more trickery will be required.
>
> For something like this:
>
> define i32 @test(<4 x float> %t1, <4 x float> *%t2) nounwind {
>         %x = load <4 x float>* %t2
>         %tmp1 = call i32 @llvm.x86.sse41.ptestz(<4 x float> %x, <4 x float> %t1) nounwind readnone
>         ret i32 %tmp1
> }
>
> declare i32 @llvm.x86.sse41.ptestz(<4 x float>, <4 x float>) nounwind readnone
>
> Marking the node commutative should allow the instruction to fold the
> load instead of producing:
>
> _test:
> 	movl	4(%esp), %eax
> 	movaps	(%eax), %xmm1
> 	ptest 	%xmm0, %xmm1
> 	sete	%al
> 	movzbl	%al, %eax
> 	ret
>
> Does "ptest" need to be added to the load folding table in case it is
> isel'd as reg/reg but an operand gets spilled (thus the regalloc
> should fold the load by forming reg/mem)?
>
> Overall, very nice job!
>
> -Chris
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


