[llvm] Enable custom lowering of fabs_v16f16 with AVX and fabs_v32f16 with A… (PR #73565)
David Li via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 28 09:45:02 PST 2023
david-xl wrote:
> > This is the last patch for fabs lowering. v32f16 works for AVX as well with the patch (with type legalization).
>
> AVX512 v32f16 still looks very poor
Do you mean the sequence below?
; X86-AVX512VL-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-AVX512VL-NEXT: vpbroadcastw {{.*#+}} ymm0 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
; X86-AVX512VL-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
; X86-AVX512VL-NEXT: vpandq (%eax), %zmm0, %zmm0
Compared with the AVX2 version, this one materializes a 512-bit mask and does one vpandq instead of using a 256-bit mask with two vpandd.
If there are things to improve here, we should probably do it as a follow-up.
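
For readers wondering why the broadcast constant prints as NaN in every lane: a minimal sketch, assuming the per-lane mask is 0x7FFF (all bits set except the sign bit), which is the usual way to lower fabs on half-precision values. The helper names below are hypothetical and only model the bitwise AND on 16-bit lanes; they are not part of the patch.

```python
import math
import struct

# Assumed per-lane mask: clears only the sign bit of an IEEE half (f16).
SIGN_CLEAR_MASK = 0x7FFF


def f16_bits(x: float) -> int:
    """Return the 16-bit IEEE half-precision encoding of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def f16_from_bits(bits: int) -> float:
    """Decode a 16-bit IEEE half-precision bit pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fabs_f16_lanes(lanes):
    """Model fabs on a vector of f16 lanes as a bitwise AND with the mask."""
    return [f16_from_bits(f16_bits(x) & SIGN_CLEAR_MASK) for x in lanes]


# Read as an f16 value, the mask has an all-ones exponent and a nonzero
# mantissa, so a printer that shows constants as floats displays NaN.
mask_as_float = f16_from_bits(SIGN_CLEAR_MASK)
result = fabs_f16_lanes([-1.5, 2.0, -0.0])
```

Here `mask_as_float` is NaN and `result` holds the lanes with their sign bits cleared. Whether the mask is built as one 512-bit constant or as two 256-bit halves does not change the per-lane result, only the number of AND instructions.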
https://github.com/llvm/llvm-project/pull/73565