[LLVMdev] branch on vector compare?

Tue Sep 4 15:24:25 PDT 2012

Roland Scheidegger <sroland <at> vmware.com> writes:
> This looks quite similar to something I filed a bug on (12312). Michael
> Liao submitted fixes for this, so I think
> if you change it to
>   %16 = fcmp ogt <4 x float> %15, %cr
>   %17 = sext <4 x i1> %16 to <4 x i32>
>   %18 = bitcast <4 x i32> %17 to i128
>   %19 = icmp ne i128 %18, 0
>   br i1 %19, label %true1, label %false2
> 
> should do the trick (one cmpps + one ptest + one br instruction).
> This, however, requires sse41, which I don't know whether you have. You
> say the extractelements go through memory, which I've never seen, but
> then again our code didn't try to extract the i1 directly. (Even without
> the ptest fixes, the above sequence results in only 2 extraction steps
> instead of 4 if you're on x64 and the CPU supports sse41; without sse41,
> and hence without pextrd/pextrq, I'd guess it also goes through memory.)
> That said, on AltiVec this sequence might not produce anything good. The
> free sext requires LLVM 2.7 or later on x86 to work at all (certainly not
> a problem nowadays, but other backends may differ), and the ptest
> sequence requires very recent svn.
> I don't think the current code can generate movmskps + test (probably
> the next best thing without sse41) instead of ptest, though, if you
> only have SSE.
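A scalar C model of what the suggested IR computes may help. The sext turns each i1 lane into all-ones or zero, the bitcast to i128 just concatenates the four lanes, so `icmp ne i128 %18, 0` is true exactly when at least one lane compared true. The second function models the movmskps + test alternative Roland mentions: gather each lane's sign bit into a 4-bit mask and branch on it being nonzero. This is a sketch of the semantics only, not of the cmpps/ptest or movmskps/test code the backend emits; the function names are hypothetical, and no 128-bit C integer type is assumed (the 16 bytes are tested directly).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of:
 *   %16 = fcmp ogt <4 x float> %15, %cr
 *   %17 = sext <4 x i1> %16 to <4 x i32>
 *   %18 = bitcast <4 x i32> %17 to i128
 *   %19 = icmp ne i128 %18, 0                                      */
bool any_ogt_i128(const float v[4], const float cr[4])
{
    int32_t lanes[4];
    for (int i = 0; i < 4; i++) {
        /* fcmp ogt is false if either operand is NaN, like C's '>' */
        bool cmp = v[i] > cr[i];
        /* sext of an i1: true -> all ones (-1), false -> 0 */
        lanes[i] = cmp ? -1 : 0;
    }
    /* bitcast to i128 + icmp ne 0: nonzero iff any byte is nonzero */
    unsigned char bytes[sizeof lanes];
    memcpy(bytes, lanes, sizeof lanes);
    for (size_t i = 0; i < sizeof bytes; i++)
        if (bytes[i] != 0)
            return true;
    return false;
}

/* Hypothetical model of movmskps + test: collect the sign bit of each
 * compare lane into a 4-bit mask, then branch on mask != 0. */
bool any_ogt_movmsk(const float v[4], const float cr[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        int32_t lane = (v[i] > cr[i]) ? -1 : 0;      /* cmpps lane result */
        mask |= (unsigned)((uint32_t)lane >> 31) << i; /* movmskps bit i */
    }
    return mask != 0;                                /* test + branch */
}
```

Both functions agree for every input, which is why either instruction sequence is a valid lowering of the same "any lane true" branch.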

Thanks, Roland; sign extending gets me part of the way, at least.
I'm on version 3.1 and, as you say in the bug report, there are a
few extraneous instructions. For the record, casting to a <4 x i8>
seems to do a better job on x86 (shuffle, movd, test, jump), whereas
<4 x i32> seems to issue a pextrd for each element. On x64 the two
seem to produce the same code. I suppose it's all academic, since the
ptest patch looks good.
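The <4 x i8> variant can be modelled the same way. Four sign-extended bytes fit in a single 32-bit integer, which is consistent with the shuffle + movd + test sequence seen on x86, whereas four i32 lanes need 128 bits and cannot be tested from one 32-bit register. The IR in the comment is an assumption about what "casting to a <4 x i8>" means here (sext to <4 x i8>, then a bitcast to i32), and `any_lane_i8` is a hypothetical name; `cmp[i]` stands for lane i of the fcmp result.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model (assumed IR, not taken from the thread):
 *   %17 = sext <4 x i1> %16 to <4 x i8>
 *   %18 = bitcast <4 x i8> %17 to i32
 *   %19 = icmp ne i32 %18, 0                                       */
bool any_lane_i8(const bool cmp[4])
{
    int8_t lanes[4];
    for (int i = 0; i < 4; i++)
        lanes[i] = cmp[i] ? -1 : 0;     /* sext i1 -> i8: 0xFF or 0x00 */
    uint32_t word;
    memcpy(&word, lanes, sizeof word);  /* bitcast <4 x i8> to i32 */
    return word != 0;                   /* icmp ne i32 %18, 0 */
}
```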

Looking at it again, I'm not sure how I saw memory spills. I certainly
can't reproduce them without using -O0. It's possible I did that
accidentally when investigating the issue.

Thanks,
Stephen.