[llvm] Enable custom lowering of fabs_v16f16 with AVX and fabs_v32f16 with A… (PR #73565)

Tue Nov 28 09:45:02 PST 2023

david-xl wrote:

> > This is the last patch for fabs lowering. v32f16 works for AVX as well with the patch (with type legalization).
> 
> AVX512 v32f16 still looks very poor

Do you mean the sequence below?

; X86-AVX512VL-NEXT:    movl {{[0-9]+}}(%esp), %eax
; X86-AVX512VL-NEXT:    vpbroadcastw {{.*#+}} ymm0 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
; X86-AVX512VL-NEXT:    vinserti64x4 $1, %ymm0, %zmm0, %zmm0
; X86-AVX512VL-NEXT:    vpandq (%eax), %zmm0, %zmm0

Compared with the AVX2 version, this one materializes a 512-bit mask and does one vpandq, instead of using a 256-bit mask with two vpandd.
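For readers following along, here is a minimal sketch (not code from the patch) of why the broadcast constant prints as NaN. It assumes the mask word is 0x7FFF, i.e. every bit set except the f16 sign bit; read as a half-precision value, that bit pattern is a NaN. The helper names below are hypothetical and only model the per-lane effect of the AND in scalar Python.

```python
import struct

# Assumed mask: all bits set except the f16 sign bit (bit 15).
SIGN_CLEAR_MASK = 0x7FFF


def f16_bits(x: float) -> int:
    """Return the 16-bit IEEE half-precision encoding of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_to_f16(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fabs_f16_lanes(lanes):
    """Model fabs on a vector of f16 lanes as a bitwise AND with the mask."""
    return [bits_to_f16(f16_bits(x) & SIGN_CLEAR_MASK) for x in lanes]


if __name__ == "__main__":
    print(fabs_f16_lanes([-1.5, 2.0, -0.0]))
    # The mask itself, reinterpreted as f16, is what the check line displays.
    print(bits_to_f16(SIGN_CLEAR_MASK))
```

In this model the only difference between the two sequences is how wide the mask register is and how many AND operations cover the 32 lanes; the per-lane result is the same.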

If there are things to improve here, we should probably do that as a follow-up.

https://github.com/llvm/llvm-project/pull/73565