[llvm-commits] [llvm] r77407 - in /llvm/trunk: include/llvm/IntrinsicsX86.td lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h lib/Target/X86/X86InstrSSE.td test/CodeGen/X86/sse41.ll
Evan Cheng
evan.cheng at apple.com
Tue Jul 28 22:42:35 PDT 2009
On Jul 28, 2009, at 9:44 PM, Chris Lattner wrote:
>
>> +// ptest instruction we'll lower to this in X86ISelLowering primarily from
>> +// the intel intrinsic that corresponds to this.
>> let Defs = [EFLAGS] in {
>> def PTESTrr : SS48I<0x17, MRMSrcReg, (outs), (ins VR128:$src1, VR128:$src2),
>> - "ptest \t{$src2, $src1|$src1, $src2}", []>, OpSize;
>> + "ptest \t{$src2, $src1|$src1, $src2}",
>> + [(X86ptest VR128:$src1, VR128:$src2),
>> + (implicit EFLAGS)]>, OpSize;
>
> Do we really need "Defs = EFLAGS" *and* "implicit EFLAGS"? It seems
> like the latter one should be enough, though I don't actually know if
> that's true. Evan or Dan would know.
Unfortunately both are needed (yes, we should fix that at some point).
Defs = [EFLAGS] says the instruction implicitly defines EFLAGS.
(implicit EFLAGS) tells tablegen and the scheduler that the instruction
produces an extra result that maps to the physical register EFLAGS.
That is the syntax we use to model physical register dependencies.
Evan
>
>
> Something not new in this patch, but:
>
> def X86ptest : SDNode<"X86ISD::PTEST", SDTX86CmpPTest>;
>
> From my understanding, I think that ptest is commutative (but maybe
> only for "z", but not the others?). If unconditionally true, you can
> declare this with [SDNPCommutative] as a third argument to SDNode and
> the instruction should be figured out to be commutative. If only true
> in some cases, more trickery will be required.
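
On the commutativity question, a small scalar model may help. This is only a sketch, not code from the patch: the Vec128 type and the function names are made up for illustration, and the flag definitions are the ones I believe the Intel documentation gives for PTEST (ZF is set when src1 AND src2 is all zeros, CF is set when NOT src1 AND src2 is all zeros). Under those assumptions the "z" form is symmetric in its operands, while the "c" and "nzc" forms are not.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit XMM value: two 64-bit lanes.
using Vec128 = std::array<std::uint64_t, 2>;

// ZF of PTEST: set when (a AND b) has no bits set.
// AND is symmetric, so swapping a and b cannot change the result.
bool ptestz(const Vec128 &a, const Vec128 &b) {
  return ((a[0] & b[0]) | (a[1] & b[1])) == 0;
}

// CF of PTEST: set when ((NOT a) AND b) has no bits set,
// i.e. every bit set in b is also set in a. Not symmetric.
bool ptestc(const Vec128 &a, const Vec128 &b) {
  return ((~a[0] & b[0]) | (~a[1] & b[1])) == 0;
}

// "nzc" form: true when both ZF and CF are clear.
// It inherits the asymmetry of the CF computation.
bool ptestnzc(const Vec128 &a, const Vec128 &b) {
  return !ptestz(a, b) && !ptestc(a, b);
}
```

With a = {0xFF, 0} and b = {0x0F, 0}, ptestz gives the same answer in either operand order, but ptestc(a, b) and ptestc(b, a) differ. If this model is right, marking a single PTEST node as commutative would only be safe when the consumer looks at ZF alone.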
>
> For something like this:
>
> define i32 @test(<4 x float> %t1, <4 x float> *%t2) nounwind {
>   %x = load <4 x float>* %t2
>   %tmp1 = call i32 @llvm.x86.sse41.ptestz(<4 x float> %x, <4 x float> %t1) nounwind readnone
>   ret i32 %tmp1
> }
>
> declare i32 @llvm.x86.sse41.ptestz(<4 x float>, <4 x float>) nounwind readnone
>
> It should allow the instruction to fold the load instead of producing:
>
> _test:
>         movl    4(%esp), %eax
>         movaps  (%eax), %xmm1
>         ptest   %xmm0, %xmm1
>         sete    %al
>         movzbl  %al, %eax
>         ret
>
> Does "ptest" need to be added to the load folding table in case it is
> isel'd as reg/reg but an operand gets spilled (thus the regalloc
> should fold the load by forming reg/mem)?
>
> Overall, very nice job!
>
> -Chris