[LLVMdev] AVX Status?

Thu Jun 2 09:27:13 PDT 2011

Hello Ralf,

Chris said the AVX backend is not yet mature.

http://www.mail-archive.com/llvmbugs@cs.uiuc.edu/msg12442.html

I am also interested in the AVX codegen backend and am trying to write a
patch to fix the currently unusable AVX codegen.
I have just submitted a patch to fix fpextend (VCVTSS2SD) and
sitofp (VCVTSI2SD) codegen, and it is now under review.

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110530/121689.html

AVX contributors would definitely be welcome, but at this time no one
is actively working on the AVX backend.

I am trying to write AVX patches as much as I can (at least enough to run
my own AVX code correctly), but my time is very limited, so I hope someone
else will actively work on the AVX backend...

On Wed, Jun 1, 2011 at 9:52 PM, Ralf Karrenberg <Chareos at gmx.de> wrote:
> Hi,
>
> The last time the AVX backend was mentioned on this list seems to be
> from November 2010, so I would like to ask about the current status. Is
> anybody (e.g. at Cray?) still actively working on it?
>
> I have tried both LLVM 2.9 final and the latest trunk, and it seems that
> some trivial cases already work and produce nice code when using
> <8 x float>.
> Unfortunately, the backend gets confused by mask code, e.g. as produced
> by VCMPPS, combined with mask operations (which LLVM currently requires
> to operate on <8 x i32>) and the corresponding bitcasts.
>
> Consider these two examples:
>
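> define <8 x float> @test1(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
> entry:
>   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
>   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
>   ret <8 x float> %res
> }
>
> ; Intrinsic declarations, needed for llc to parse the module
> declare <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone
> declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>) nounwind readnone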
>
> This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).
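For readers less familiar with these intrinsics, here is a minimal scalar C sketch of what test1 computes, assuming the documented lane semantics of VCMPPS with predicate 1 (LT: a lane becomes all-ones when a < b, otherwise all-zeros) and of VBLENDVPS (a lane is taken from the second source when bit 31 of the mask lane is set). The names cmp_lt_ps, blendv_ps, test1_model and LANES are hypothetical, and the float-typed mask is modeled directly as its 32-bit pattern. It models the result only, not how the backend selects instructions.

```c
#include <stdint.h>

#define LANES 8

/* Hypothetical scalar model of VCMPPS with predicate 1 (LT):
   each lane becomes all-ones if a[i] < b[i], otherwise all-zeros.
   NaN inputs compare false, as with the ordered LT predicate. */
static void cmp_lt_ps(const float *a, const float *b, uint32_t *mask) {
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical scalar model of VBLENDVPS: take b[i] when the sign bit
   (bit 31) of the mask lane is set, otherwise keep a[i]. */
static void blendv_ps(const float *a, const float *b, const uint32_t *mask,
                      float *out) {
    for (int i = 0; i < LANES; i++)
        out[i] = (mask[i] & 0x80000000u) ? b[i] : a[i];
}

/* Scalar equivalent of @test1: compare, then blend on the compare mask. */
void test1_model(const float *a, const float *b, float *out) {
    uint32_t mask[LANES];
    cmp_lt_ps(a, b, mask);
    blendv_ps(a, b, mask, out);
}
```

For ordinary (non-NaN) inputs this is effectively a lane-wise max(a, b).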
>
> On the other hand, this does not work:
>
> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
> entry:
>   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
>   %cast = bitcast <8 x float> %cmp to <8 x i32>
>   %mask = and <8 x i32> %cast, %m
>   %blend_cond = bitcast <8 x i32> %mask to <8 x float>
>   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
>   ret <8 x float> %res
> }
>
> llc (latest trunk) bails out with:
>
> LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
>   0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
> ...
>
> The same holds for or and xor, where VORPS and VXORPS should be selected.
> There seems to be some code for this already, because
> xor <8 x i32> %m, %m
> works, probably because all of its bitcasts can be eliminated.
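VANDPS, VORPS and VXORPS are natural selections here because a bitcast between <8 x float> and <8 x i32> changes no bits, and and/or/xor are pure bit operations. The following is a minimal scalar C sketch of what test2 computes, assuming the same VCMPPS (predicate 1, LT) and VBLENDVPS lane semantics as above, where only bit 31 of each condition lane is inspected. test2_model and LANES are hypothetical names; each IR value is kept in its own array so the bitcasts appear as plain byte copies. This models the result only, not the selection DAG.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical scalar model of @test2. */
void test2_model(const float *a, const float *b, const uint32_t *m,
                 float *out) {
    float cmp[LANES];        /* %cmp */
    uint32_t cast[LANES];    /* %cast */
    uint32_t mask[LANES];    /* %mask */

    /* VCMPPS, predicate 1 (LT): write an all-ones or all-zeros bit pattern
       into each float lane. All-ones reads as a NaN, which does not matter
       because no float arithmetic is done on it. */
    for (int i = 0; i < LANES; i++) {
        uint32_t bits = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
        memcpy(&cmp[i], &bits, sizeof bits);
    }

    /* bitcast <8 x float> to <8 x i32>: copy the bits, no value conversion. */
    memcpy(cast, cmp, sizeof cast);

    /* and <8 x i32>: purely bitwise, so it yields exactly the bits that
       VANDPS would produce on the original float register. */
    for (int i = 0; i < LANES; i++)
        mask[i] = cast[i] & m[i];

    /* The bitcast back to <8 x float> again changes no bits; VBLENDVPS then
       takes b[i] wherever bit 31 of the condition lane is set. */
    for (int i = 0; i < LANES; i++)
        out[i] = (mask[i] & 0x80000000u) ? b[i] : a[i];
}
```

Because the blend only looks at bit 31, a %m lane of 0x80000000 is enough to keep a true compare result, while 0x7FFFFFFF masks it out.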
>
> Ideally, I guess at some point we would want to write code like this
> instead of using the intrinsics:
>
> define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m) nounwind readnone {
> entry:
>   %cmp = fcmp ugt <8 x float> %a, %b
>   %mask = and <8 x i1> %cmp, %m
>   %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>   ret <8 x float> %res
> }
>
> -> VCMPPS, VANDPS, VBLENDVPS
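To make the hoped-for lowering concrete, here is a scalar C sketch under stated assumptions: each <8 x i1> lane is modeled as a bool that is widened to a 32-bit all-ones/all-zeros lane (the form VCMPPS produces), the AND then corresponds to VANDPS and the blend to VBLENDVPS. test3_select, test3_lowered and LANES are hypothetical names. Note that fcmp ugt is true when a > b or either operand is NaN, i.e. !(a <= b), and that select takes %a on true lanes while VBLENDVPS takes its second source when the sign bit is set, so the blend operands end up swapped.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8

/* Direct scalar reading of @test3: fcmp ugt, and with %m, select a/b. */
void test3_select(const float *a, const float *b, const bool *m, float *out) {
    for (int i = 0; i < LANES; i++) {
        bool cmp = !(a[i] <= b[i]);   /* fcmp ugt: a > b, or unordered */
        bool mask = cmp && m[i];      /* and <8 x i1> */
        out[i] = mask ? a[i] : b[i];  /* select: true lanes take %a */
    }
}

/* Hypothetical lowering: widen i1 lanes to 32-bit all-ones/zero lanes,
   AND them (VANDPS), then blend on bit 31 (VBLENDVPS) with the sources
   swapped so that set lanes take a[i]. */
void test3_lowered(const float *a, const float *b, const bool *m, float *out) {
    for (int i = 0; i < LANES; i++) {
        uint32_t cmp  = !(a[i] <= b[i]) ? 0xFFFFFFFFu : 0u;  /* VCMPPS */
        uint32_t mext = m[i] ? 0xFFFFFFFFu : 0u;  /* sign-extended i1 */
        uint32_t mask = cmp & mext;               /* VANDPS */
        /* blendv(first = b, second = a, mask) */
        out[i] = (mask & 0x80000000u) ? a[i] : b[i];
    }
}
```

Both functions produce the same lanes; once the i1 mask is widened to the compare's lane width, it needs no separate representation.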
>
> Nadav Rotem sent around a patch a few weeks ago in which he implemented
> codegen for vector select on SSE; unfortunately, I have not yet had time
> to look at it in more depth.
>
> Can anybody comment on the current status of AVX?
>
> Best,
> Ralf
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>