[PATCH] D14909: [X86][FMA] Optimize FNEG(FMUL) Patterns

Wed Nov 25 08:59:16 PST 2015

spatel added a comment.

In http://reviews.llvm.org/D14909#296372, @delena wrote:

> AVX512 always contains FMA, you don't need a special check.

I see what's going on now. This patch needs the AVX512 check in order to work for 512-bit vectors. There should be at least one more test case included with this patch to check the 512-bit vector case. Alternatively, if we want to make 512-bit optimization a separate patch, the predicate should disallow those types.

So I see 2 bugs here. Test case:

  define <16 x float> @test_v4f32_fneg_fmul_16(<16 x float> %x, <16 x float> %y) #0 {
    %m = fmul <16 x float> %x, %y
    %n = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %m
    ret <16 x float> %n
  }

Bug #1: The AVX-512 attribute in the subtarget should imply hasFMA(). Currently, this isn't true, so if you don't explicitly specify FMA, you won't get any expected FMA transforms:

  $ ./llc fma.ll -o - -mattr=avx512f
  ...
     vmulps	%zmm1, %zmm0, %zmm0
     vpxord	LCPI0_0(%rip), %zmm0, %zmm0
     retq

  $ ./llc fma.ll -o - -mattr=avx512f,fma
  ...
    vpxord	%zmm2, %zmm2, %zmm2
    vfnmadd213ps	%zmm2, %zmm1, %zmm0
    retq

Bug #2: This patch isn't checking for legal types, so it will fire for a 512-bit vector on a target that doesn't support it, and we crash:

  $ ./llc fma.ll -o - -mattr=fma
    Do not know how to custom type legalize this operation!
    UNREACHABLE executed at /Users/spatel/myllvm/llvm/lib/Target/X86/X86ISelLowering.cpp:19880!

This would also happen if the test case had an illegal type for any target; eg <17 x float>.

I think the best fix for #2 is to explicitly check for the supported types for a given subtarget rather than waiting for type legalization and then checking the scalar type. This should be a helper function, so we can have the existing code in PerformFMACombine() use it too.

Repository:
  rL LLVM

http://reviews.llvm.org/D14909