[LLVMdev] AVX Status?

Wed Jun 1 05:52:29 PDT 2011

Hi,

The last time the AVX backend was mentioned on this list seems to date 
back to November 2010, so I would like to ask about the current status. Is 
anybody (e.g. at Cray?) still actively working on it?

I have tried both LLVM 2.9 final and the latest trunk, and it seems that 
some simple cases already work and produce nice code for <8 x float> 
operations.
Unfortunately, the backend gets confused by mask code, e.g. masks 
produced by VCMPPS combined with mask operations (which LLVM currently 
requires to operate on <8 x i32>) and the corresponding bitcasts.

Consider these two examples:

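define <8 x float> @test1(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
entry:
  %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
  %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
  ret <8 x float> %res
}

declare <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone
declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>) nounwind readnone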

This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).

On the other hand, this does not work:

define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
entry:
  %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
  %cast = bitcast <8 x float> %cmp to <8 x i32>
  %mask = and <8 x i32> %cast, %m
  %blend_cond = bitcast <8 x i32> %mask to <8 x float>
  %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
  ret <8 x float> %res
}

llc (latest trunk) bails out with:

LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
   0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340, 0x2510f40, 0x2511140 [ORD=3] [ID=12]
...

The same holds for or and xor, where VORPS, VXORPS etc. should be 
selected. There seems to be some code for this already, because

   xor <8 x i32> %m, %m

works, probably because all bitcasts can be eliminated in that case.
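For reference, the lane-wise result that @test2 asks for can be written as a scalar C model. This is only a sketch of the intended semantics, not what llc emits; the function names are hypothetical, and it assumes that predicate 1 of llvm.x86.avx.cmp.ps.256 is an ordered less-than compare (false for NaN) and that VBLENDVPS takes its second source wherever the sign bit of the mask lane is set.

```c
#include <stdint.h>

/* Hypothetical model of one lane of llvm.x86.avx.cmp.ps.256 with predicate 1
   (assumed ordered less-than): all ones if a < b, zero otherwise, including NaN. */
static uint32_t cmp_lt_lane(float a, float b) {
    return (a < b) ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical model of one lane of llvm.x86.avx.blendv.ps.256(a, b, mask):
   the sign bit of the mask lane selects b, otherwise a is kept. */
static float blendv_lane(float a, float b, uint32_t mask_bits) {
    return (mask_bits & 0x80000000u) ? b : a;
}

/* Lane-wise model of @test2. The compare result is kept as raw 32-bit lanes,
   so the two bitcasts in the IR need no code here: they only reinterpret bits. */
void test2_model(const float a[8], const float b[8], const uint32_t m[8],
                 float out[8]) {
    for (int i = 0; i < 8; ++i) {
        uint32_t mask = cmp_lt_lane(a[i], b[i]) & m[i]; /* the and on <8 x i32> */
        out[i] = blendv_lane(a[i], b[i], mask);
    }
}
```

With every lane of m set to all ones, the model reduces to @test1, which suggests the and plus bitcasts should ideally be selected as a single VANDPS between the compare and the blend.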

Ideally, I guess that at some point we would want to write code like this 
instead of using the intrinsics:

define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m) nounwind readnone {
entry:
   %cmp = fcmp ugt <8 x float> %a, %b
   %mask = and <8 x i1> %cmp, %m
   %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
   ret <8 x float> %res
}

-> VCMPPS, VANDPS, VBLENDVPS
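The VCMPPS/VANDPS/VBLENDVPS lowering for @test3 can be modelled lane by lane in the same way. Again a hypothetical scalar sketch: it assumes each <8 x i1> lane is sign-extended to a 32-bit all-ones/all-zeros mask (the form VCMPPS produces) and that fcmp ugt maps to an unordered-or-greater-than compare. Note the operand swap: select returns %a where the mask is true, whereas VBLENDVPS returns its second source, so the blend has to be applied as blendv(b, a, mask).

```c
#include <stdbool.h>
#include <stdint.h>

/* fcmp ugt: true if either operand is NaN (unordered) or a > b. */
static bool fcmp_ugt(float a, float b) {
    return !(a <= b);
}

/* Widen one i1 lane to a 32-bit mask lane by sign extension (1 -> all ones). */
static uint32_t sext_i1(bool bit) {
    return bit ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical lane-wise model of the lowered @test3. */
void test3_model(const float a[8], const float b[8], const bool m[8],
                 float out[8]) {
    for (int i = 0; i < 8; ++i) {
        /* VCMPPS: one widened compare lane. */
        uint32_t cmp = sext_i1(fcmp_ugt(a[i], b[i]));
        /* VANDPS: the and on <8 x i1> becomes a bitwise and of widened masks. */
        uint32_t mask = cmp & sext_i1(m[i]);
        /* VBLENDVPS(b, a, mask): a where the sign bit is set, else b,
           which matches select <8 x i1> %mask, %a, %b. */
        out[i] = (mask & 0x80000000u) ? a[i] : b[i];
    }
}
```

Widening the i1 lanes to full 32-bit masks is what lets the and be done with VANDPS on the same register class as the floats.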

A few weeks ago, Nadav Rotem sent around a patch that implements codegen 
for vector select on SSE; unfortunately, I have not yet had time to look 
at it in more depth.

Can anybody comment on the current status of AVX?

Best,
Ralf