[LLVMdev] branch on vector compare?
Roland Scheidegger
sroland at vmware.com
Mon Sep 3 18:45:02 PDT 2012
On 04.09.2012 00:08, Stephen wrote:
>>> which goes through memory. Is there some idiom I'm missing so that it would
>>> use for instance movmsk for SSE or vcmpgt & cr6 for altivec?
>>
>> I don't think you are missing anything: LLVM IR has no support for horizontal
>> operations like or'ing the elements of a vector of boolean together. The code
>> generators do try to recognize a few idioms and synthesize horizontal
>> operations from them, but I think only addition is currently recognized, and
>
> Thanks Duncan,
>
> you're right - that does compile to a mess of spills to memory not
> unlike the original.
>
> I went to have a look at this further: It seems the existing SelectInst
> is pretty close to what is needed.
> Value *IRBuilder::CreateSelect(Value *C, Value *True, Value *False,
>                                const Twine &Name)
> Currently, this asserts that the True & False are both vector types of
> the same size as "C". I was thinking of weakening this condition so that
> if True and False are both i1 types, it will be allowed and will result
> in something which can be branched on.
>
> I have quite a bit of reading ahead it seems!
This looks quite similar to something I filed a bug on (12312). Michael
Liao submitted fixes for this, so I think changing it to
%16 = fcmp ogt <4 x float> %15, %cr
%17 = sext <4 x i1> %16 to <4 x i32>
%18 = bitcast <4 x i32> %17 to i128
%19 = icmp ne i128 %18, 0
br i1 %19, label %true1, label %false2
should do the trick (one cmpps + one ptest + one br instruction).
This, however, requires SSE4.1, which I don't know if you have. You say
the extractelements go through memory, which I've never seen, but then
again our code didn't try to extract the i1 directly. Even without the
ptest fixes, the above sequence will result in only 2 extraction steps
instead of 4 if you're on x64 and the CPU supports SSE4.1; without
SSE4.1, and hence without pextrd/q, it will probably also go through
memory.
On altivec this sequence might not produce anything good, though. The
free sext requires llvm 2.7 on x86 to work at all (certainly not a
problem nowadays, but on other backends it might be different), and the
ptest sequence requires very recent svn.
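
For reference, the cmpps + ptest pattern described above can be written by
hand with intrinsics. This is only a sketch of the target instruction
sequence, not of what the LLVM backend emits; the function name
any_lane_gt_ptest is made up for illustration, and the scalar fallback is an
assumption added so the snippet still builds when SSE4.1 is not enabled.

```c
#include <stdbool.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 (also pulls in SSE/SSE2) */

/* Hypothetical helper: true if a[i] > b[i] (ordered) in any of the 4 lanes. */
bool any_lane_gt_ptest(const float a[4], const float b[4])
{
    /* cmpps: each lane becomes all-ones where a > b, all-zeros otherwise. */
    __m128 mask = _mm_cmpgt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    /* Reinterpret the 128 bits as an integer vector (no instruction emitted). */
    __m128i bits = _mm_castps_si128(mask);

    /* ptest: _mm_testz_si128 returns 1 iff (bits & bits) == 0. */
    return !_mm_testz_si128(bits, bits);
}
#else
/* Assumed fallback for builds without SSE4.1: plain scalar loop. */
bool any_lane_gt_ptest(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; ++i)
        if (a[i] > b[i])
            return true;
    return false;
}
#endif
```

With gcc or clang the SSE4.1 path is only compiled in when -msse4.1 (or a
matching -march) is given.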
I don't think the current code can generate movmskps + test (probably
the next best thing without SSE4.1) instead of ptest if you only have
SSE, though.
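
For comparison, the movmskps + test variant needs only plain SSE when
written by hand with intrinsics. Again this is just a sketch of the
instruction sequence, not something the code generator is claimed to
produce; the name any_lane_gt_movmsk is made up for illustration, and the
scalar fallback is an assumption for non-SSE targets.

```c
#include <stdbool.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_cmpgt_ps, _mm_movemask_ps */

/* Hypothetical helper: true if a[i] > b[i] (ordered) in any of the 4 lanes. */
bool any_lane_gt_movmsk(const float a[4], const float b[4])
{
    /* cmpps: each lane becomes all-ones where a > b, all-zeros otherwise. */
    __m128 mask = _mm_cmpgt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    /* movmskps: collect the 4 lane sign bits into the low bits of an int;
       the comparison against zero is the scalar test that feeds the branch. */
    return _mm_movemask_ps(mask) != 0;
}
#else
/* Assumed fallback for targets without SSE: plain scalar loop. */
bool any_lane_gt_movmsk(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; ++i)
        if (a[i] > b[i])
            return true;
    return false;
}
#endif
```

The individual bits of the movmskps result also say which lanes passed, so
an "all lanes" check would compare against 0xF instead of 0.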
Roland