[LLVMdev] AVX Status?

Fri Jun 3 14:46:52 PDT 2011

Bruno Cardoso Lopes <bruno.cardoso at gmail.com> writes:

> Hi Ralf
>
> On Wednesday, June 1, 2011, Ralf Karrenberg <Chareos at gmx.de> wrote:
>> Hi,
>>
>> The last time the AVX backend was mentioned on this list seems to be
>> from November 2010, so I would like to ask about the current status. Is
>> anybody (e.g. at Cray?) still actively working on it?
>
> I don't think so!

Yes, we are!  I am doing a lot of tuning work at the moment.  We have
been rather swamped with work for new products and I am now just getting
out from under that.  Expect to see more patches flowing in over the
next several weeks.  There's a LOT left to send up.

>> I have tried both LLVM 2.9 final and the latest trunk, and it seems like
>> some trivial stuff is already working and produces nice code for code
>> using <8 x float>.
>
> Almost everything that could be matched in tablegen files only by
> extending the 128-bit PatFrags and PatLeafs to their 256-bit
> counterparts should work, but besides that (which is where the
> interesting stuff happens) there's no support yet!

Indeed.  The bulk of the work is in shuffle generation.

We have a full implementation.  I just have to get enough time to get it
merged.  :-/
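To illustrate one likely reason 256-bit shuffles take so much work (this example is not from the thread): the AVX in-lane permutes such as VPERMILPS and VSHUFPS act on each 128-bit half of a ymm register independently, so a shuffle that moves elements between halves, like the full reverse below, needs an extra cross-lane step (for example VPERM2F128, or VEXTRACTF128 plus VINSERTF128). The function name @reverse8 is hypothetical.

```llvm
; Hypothetical example: reverse all eight floats of a 256-bit vector.
; Elements 0-3 must move into the upper 128-bit half and 4-7 into the
; lower half, which an in-lane permute alone cannot do.
define <8 x float> @reverse8(<8 x float> %v) nounwind {
entry:
  %r = shufflevector <8 x float> %v, <8 x float> undef,
                     <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
  ret <8 x float> %r
}
```

By contrast, an in-lane pattern such as <1, 0, 3, 2, 5, 4, 7, 6> fits a single VPERMILPS with an immediate.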

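>> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
>> entry:
>>    ; predicate 1 = LT_OS (ordered, signaling less-than)
>>    %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
>>    %cast = bitcast <8 x float> %cmp to <8 x i32>
>>    %mask = and <8 x i32> %cast, %m
>>    %blend_cond = bitcast <8 x i32> %mask to <8 x float>
>>    %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
>>    ret <8 x float> %res
>> }
>>
>> ; intrinsic declarations, needed for the module to parse
>> declare <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone
>> declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>) nounwind readnone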
>>
>> llc (latest trunk) bails out with:
>>
>> LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
>>    0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>>      0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>>        0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340, 0x2510f40, 0x2511140 [ORD=3] [ID=12]
>> ...
>>
>> The same holds for or and xor, where VORPS, VXORPS, etc. should be selected.
>
> Please file bug reports!

It's a problem with integer code.  There are no 256-bit integer bitwise
instructions in AVX.  There are no 256-bit integer instructions, period.
What's missing is the legalize code to handle this.  I have it in our
tree.
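A minimal sketch of one possible legalization strategy, assuming nothing about the code in the Cray tree: split the 256-bit integer operation into two 128-bit halves, which SSE2 PAND (or its VEX-encoded form VPAND) can handle, then reassemble the result. It is written here as the equivalent LLVM IR, and @and_v8i32_split is a hypothetical name. Another option, in the spirit of the xorps remark below, is to select VANDPS on the full 256-bit register, since a bitwise AND produces the same bits whatever the element type.

```llvm
; Hypothetical: <8 x i32> AND expressed as two 128-bit ANDs.
define <8 x i32> @and_v8i32_split(<8 x i32> %x, <8 x i32> %y) nounwind {
entry:
  ; Extract the low (elements 0-3) and high (elements 4-7) halves.
  %x.lo = shufflevector <8 x i32> %x, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %x.hi = shufflevector <8 x i32> %x, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %y.lo = shufflevector <8 x i32> %y, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %y.hi = shufflevector <8 x i32> %y, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  ; 128-bit integer ANDs, each of which maps onto (V)PAND.
  %r.lo = and <4 x i32> %x.lo, %y.lo
  %r.hi = and <4 x i32> %x.hi, %y.hi
  ; Concatenate the two halves back into a 256-bit vector.
  %r = shufflevector <4 x i32> %r.lo, <4 x i32> %r.hi,
                     <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ret <8 x i32> %r
}
```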

>> There seems to be some code for this because
>> xor <8 x i32> %m, %m
>> works, probably because it can get rid of all bitcasts.

And it can use xorps to implement the operation.

>> Ideally, I guess we would want code like this instead of the intrinsics
>> at some point:
>>
>> define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m) nounwind readnone {
>> entry:
>>    %cmp = fcmp ugt <8 x float> %a, %b
>>    %mask = and <8 x i1> %cmp, %m
>>    %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>>    ret <8 x float> %res
>> }
>
> That would be nice indeed.

Some lowering code would be needed to convert from i1 masks to i8 masks
(the so-called packed vs. sparse mask issue).  I don't think I've added
anything to do this as our vectorizer doesn't generate code this way.
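For illustration, a hypothetical IR-level version of test3 after such a conversion, assuming an AVX target: the packed <8 x i1> mask is sign-extended so that each true lane becomes all ones across a 32-bit lane, then bitcast to <8 x float> and fed to the same blendv intrinsic test2 uses. The name @test3_lowered is hypothetical, and this is not the vectorizer's or the backend's actual code. Executing it requires an AVX-capable host.

```llvm
; Hypothetical lowering of test3 from a packed i1 mask to a sparse lane mask.
define <8 x float> @test3_lowered(<8 x float> %a, <8 x float> %b, <8 x i1> %m) nounwind {
entry:
  ; Packed mask: one i1 per lane.
  %cmp = fcmp ugt <8 x float> %a, %b
  %mask = and <8 x i1> %cmp, %m
  ; Sparse mask: sign-extend each i1 to a full 32-bit lane (true -> all ones).
  %wide = sext <8 x i1> %mask to <8 x i32>
  %cond = bitcast <8 x i32> %wide to <8 x float>
  ; blendv takes its second operand wherever the lane's sign bit is set,
  ; so %a goes second to reproduce select(%mask, %a, %b).
  %res = call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %b, <8 x float> %a, <8 x float> %cond)
  ret <8 x float> %res
}

declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>)
```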

>> -> VCMPPS, VANDPS, VBLENDVPS
>>
>> Nadav Rotem sent around a patch a few weeks ago in which he implemented
>> codegen for select on SSE; unfortunately, I have not had time to look at
>> it in more depth so far.
>>
>> Can anybody comment on the current status of AVX?
>
> No codegen support yet (although some stuff works), but the assembler
> support is complete!

There's some codegen support, but it's very, very, very incomplete.

                            -Dave