[llvm-commits] [llvm] r77407 - in /llvm/trunk: include/llvm/IntrinsicsX86.td lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h lib/Target/X86/X86InstrSSE.td test/CodeGen/X86/sse41.ll
Chris Lattner
clattner at apple.com
Tue Jul 28 21:44:50 PDT 2009
On Jul 28, 2009, at 5:28 PM, Eric Christopher wrote:
> Author: echristo
> Date: Tue Jul 28 19:28:05 2009
> New Revision: 77407
>
> URL: http://llvm.org/viewvc/llvm-project?rev=77407&view=rev
> Log:
> Add support for gcc __builtin_ia32_ptest{z,c,nzc} intrinsics. Lower
> to ptest instruction plus setcc. Revamp ptest instruction. Add test.
Nice.
> + // ptest intrinsics. The intrinsic these come from are designed to return
> + // a boolean value, not just an instruction so lower it to the ptest
> + // pattern and a conditional move to the result.
> + case Intrinsic::x86_sse41_ptestz:
> + case Intrinsic::x86_sse41_ptestc:
> + case Intrinsic::x86_sse41_ptestnzc:{
> + unsigned X86CC = 0;
> + switch (IntNo) {
> + default: break;
Please make "default" abort with:
llvm_unreachable("unknown condition");
or something.
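A minimal standalone sketch of the shape being asked for. The enum names, the intrinsic-to-condition-code mapping, and the unreachableSketch helper are hypothetical stand-ins so the example compiles without LLVM headers; in the tree the default case would simply call llvm_unreachable. The mapping is an assumption based on how sete/setb/seta read ZF and CF, not something shown in the quoted hunk.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for the intrinsic IDs and X86 condition codes.
enum : unsigned { kPTestZ = 1, kPTestC, kPTestNZC };
enum CondCode { COND_E, COND_B, COND_A };

// Stand-in for llvm_unreachable: report the message and abort.
[[noreturn]] inline void unreachableSketch(const char *Msg) {
  std::fprintf(stderr, "UNREACHABLE: %s\n", Msg);
  std::abort();
}

// Pick the setcc condition for each ptest flavour (assumed mapping).
inline CondCode condCodeForPTest(unsigned IntNo) {
  switch (IntNo) {
  case kPTestZ:   return COND_E; // ZF == 1
  case kPTestC:   return COND_B; // CF == 1
  case kPTestNZC: return COND_A; // ZF == 0 and CF == 0
  default:        unreachableSketch("unknown condition");
  }
}
```

With this shape an unexpected intrinsic ID fails loudly instead of silently falling through with X86CC left at 0.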
> +// ptest instruction we'll lower to this in X86ISelLowering primarily from
> +// the intel intrinsic that corresponds to this.
> let Defs = [EFLAGS] in {
> def PTESTrr : SS48I<0x17, MRMSrcReg, (outs), (ins VR128:$src1, VR128:$src2),
> - "ptest \t{$src2, $src1|$src1, $src2}", []>, OpSize;
> + "ptest \t{$src2, $src1|$src1, $src2}",
> + [(X86ptest VR128:$src1, VR128:$src2),
> + (implicit EFLAGS)]>, OpSize;
Do we really need "Defs = EFLAGS" *and* "implicit EFLAGS"? It seems
like the latter one should be enough, though I don't actually know if
that's true. Evan or Dan would know.
Something not new in this patch, but:
def X86ptest : SDNode<"X86ISD::PTEST", SDTX86CmpPTest>;
As I understand it, ptest is commutative (though perhaps only for "z"
and not the others?). If that holds unconditionally, you can declare
this with [SDNPCommutative] as a third argument to SDNode, and the
instruction should then be inferred to be commutative. If it only holds
in some cases, more trickery will be required.
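To make the commutativity question concrete, here is a small scalar model of the ptest flag computation. It assumes the documented SSE4.1 semantics (ZF is set when the AND of the two operands is all zero; CF is set when the AND of the second operand with the complement of the first is all zero). Vec128 and the function names are hypothetical and only model the flags; this is not how the backend represents the node.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value modelled as two 64-bit halves.
struct Vec128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// ZF: set when (a AND b) is all zero. Symmetric in a and b.
inline bool ptestZF(Vec128 a, Vec128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

// CF: set when ((NOT a) AND b) is all zero. Not symmetric.
inline bool ptestCF(Vec128 a, Vec128 b) {
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

// The three intrinsic flavours expressed in terms of the flags.
inline bool ptestz(Vec128 a, Vec128 b) { return ptestZF(a, b); }
inline bool ptestc(Vec128 a, Vec128 b) { return ptestCF(a, b); }
inline bool ptestnzc(Vec128 a, Vec128 b) {
  return !ptestZF(a, b) && !ptestCF(a, b);
}
```

Under this model, with a = {0x0F, 0} and b = {0x03, 0}, ptestz gives the same answer in either operand order, while ptestc and ptestnzc do not. That suggests an unconditional [SDNPCommutative] would only be safe for the "z" case.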
For something like this:
define i32 @test(<4 x float> %t1, <4 x float> *%t2) nounwind {
  %x = load <4 x float>* %t2
  %tmp1 = call i32 @llvm.x86.sse41.ptestz(<4 x float> %x, <4 x float> %t1) nounwind readnone
  ret i32 %tmp1
}
declare i32 @llvm.x86.sse41.ptestz(<4 x float>, <4 x float>) nounwind readnone
It should allow the instruction to fold the load instead of producing:
_test:
movl 4(%esp), %eax
movaps (%eax), %xmm1
ptest %xmm0, %xmm1
sete %al
movzbl %al, %eax
ret
Does "ptest" need to be added to the load folding table in case it is
isel'd as reg/reg but an operand gets spilled (thus the regalloc
should fold the load by forming reg/mem)?
Overall, very nice job!
-Chris