[PATCH] D135930: [X86] Add AVX-NE-CONVERT instructions.

Sat Oct 29 07:02:05 PDT 2022

pengfei added inline comments.

================
Comment at: clang/include/clang/Driver/Options.td:4595-4596
 def mno_avxvnniint8 : Flag<["-"], "mno-avxvnniint8">, Group<m_x86_Features_Group>;
+def mavxneconvert : Flag<["-"], "mavxneconvert">, Group<m_x86_Features_Group>;
+def mno_avxneconvert : Flag<["-"], "mno-avxneconvert">, Group<m_x86_Features_Group>;
 def mavxvnni : Flag<["-"], "mavxvnni">, Group<m_x86_Features_Group>;
----------------
Need to move it before `mavxvnniint8 `.

================
Comment at: clang/lib/Basic/Targets/X86.cpp:1033
       .Case("avxvnniint8", HasAVXVNNIINT8)
+      .Case("avxneconvert", HasAVXNECONVERT)
       .Case("bmi", HasBMI)
----------------
Move it ahead.

================
Comment at: clang/lib/Headers/avx512vlbf16intrin.h:164
+#define _mm_cvtneps_pbh(A) \
+  ((__m128bh)__builtin_ia32_vcvtneps2bf16128((__v4sf)(A)))

----------------
pengfei wrote:
> RKSimon wrote:
> > Is there no way for __attribute__ to allow different attribute permutations?
> > 
> > Also, can we keep the __builtin_ia32_cvtneps2bf16_128 naming convention?
> > Is there no way for attribute to allow different attribute permutations?
> 
> We have discussed this problem with GCC folks. There are two problems here:
> 1. Unlike builtins, function attributes are more generic. It may introduce a lot of checks between callers and callees. I had a research to limit it to `__always_inline__` functions only. However, Clang handles inlining in middle-end, we don't have such information in the front-end. Besides, we don't know how to merge different permutations if they are inlining to the same function.
> 2. We don't know how to put the permutations into IR's function attributes. We need to preserve all permutations for inlining reference, but the backend needs a determine feature list rather than selective.
It's better to use `__builtin_ia32_cvtneps2bf16_128`.

================
Comment at: clang/lib/Headers/avxneconvertintrin.h:106
+///
+/// This intrinsic corresponds to the \c VBCSTNEBF162PS instruction.
+///
----------------
VBCSTNESH2PS

================
Comment at: clang/lib/Headers/avxneconvertintrin.h:139
+///
+/// This intrinsic corresponds to the \c VBCSTNEBF162PS instruction.
+///
----------------
VBCSTNESH2PS

================
Comment at: clang/lib/Headers/avxneconvertintrin.h:207
+/// \param __A
+///    A pointer to a 256-bit memory location containing 8 consecutive
+///    BF16 (16-bit) floating-point values.
----------------
16

================
Comment at: clang/lib/Headers/avxneconvertintrin.h:273
+/// \param __A
+///    A pointer to a 256-bit memory location containing 8 consecutive
+///    half-precision (16-bit) floating-point values.
----------------
16

================
Comment at: clang/lib/Headers/avxneconvertintrin.h:339
+/// \param __A
+///    A pointer to a 256-bit memory location containing 8 consecutive
+///    BF16 (16-bit) floating-point values.
----------------
16

================
Comment at: clang/lib/Headers/avxneconvertintrin.h:405
+/// \param __A
+///    A pointer to a 256-bit memory location containing 8 consecutive
+///    half-precision (16-bit) floating-point values.
----------------
16

================
Comment at: clang/test/Preprocessor/x86_target_features.c:593-599
+// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mavxneconvert -x c -E -dM -o - %s | FileCheck  -check-prefix=AVXNECONVERT %s
+// AVXNECONVERT: #define __AVXNECONVERT__ 1
+
+// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mno-avxneconvert -x c -E -dM -o - %s | FileCheck  -check-prefix=NO-AVXNECONVERT %s
+// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mavxneconvert -mno-avx2 -x c -E -dM -o - %s | FileCheck  -check-prefix=NO-AVXNECONVERT %s
+
+// NO-AVXNECONVERT-NOT: #define __AVXNECONVERT__ 1
----------------
Should we check `__AVX2__` like we did for AVXVNNI?

================
Comment at: llvm/lib/Support/Host.cpp:1818

+  Features["avxneconvert"] = HasLeaf7Subleaf1 && ((EDX >> 5) & 1) && HasAVXSave;
+
----------------
Move it ahead and remove the blank line.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:2186-2203
   if (!Subtarget.useSoftFloat() && Subtarget.hasBF16()) {
     addRegisterClass(MVT::v8bf16, &X86::VR128XRegClass);
     addRegisterClass(MVT::v16bf16, &X86::VR256XRegClass);
     addRegisterClass(MVT::v32bf16, &X86::VR512RegClass);
     // We set the type action of bf16 to TypeSoftPromoteHalf, but we don't
     // provide the method to promote BUILD_VECTOR. Set the operation action
     // Custom to do the customization later.
----------------
How about merge it here?

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:8260-8261
+let Predicates = [HasAVXNECONVERT] in {
+  defm VBCSTNEBF162PS : AVX_NE_CONVERT_BASE<0xb1, "vbcstnebf162ps", i16mem,
+       i16mem>, T8XS;
+  defm VBCSTNESH2PS : AVX_NE_CONVERT_BASE<0xb1, "vbcstnesh2ps", f16mem, f16mem>,
----------------
This can be f16 mem now.

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:8264-8265
+       T8PD;
+  defm VCVTNEEBF162PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneebf162ps", i128mem,
+       i256mem>, T8XS;
+  defm VCVTNEEPH2PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneeph2ps", f128mem,
----------------
f128mem, f256mem

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:8268-8269
+       f256mem>, T8PD;
+  defm VCVTNEOBF162PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneobf162ps", i128mem,
+       i256mem>, T8XD;
+  defm VCVTNEOPH2PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneoph2ps", f128mem,
----------------
ditto.

================
Comment at: llvm/test/CodeGen/X86/avx512bf16-vl-intrinsics.ll:129-140
+declare <8 x bfloat> @llvm.x86.vcvtneps2bf16256(<8 x float> %A)
+
+define <8 x bfloat> @test_mm256_cvtneps2bf16_256_variant(<8 x float> %A) local_unnamed_addr #2 {
+; CHECK-LABEL: test_mm256_cvtneps2bf16_256_variant:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vcvtneps2bf16 %ymm0, %xmm0 # encoding: [0x62,0xf2,0x7e,0x28,0x72,0xc0]
+; CHECK-NEXT:    vzeroupper # encoding: [0xc5,0xf8,0x77]
----------------
You don't need to add them here, just another RUN in below file should be enough, e.g.,
```
; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avx512bf16,+avx512vl | FileCheck %s --check-prefix=AVX512BF16
; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avx512bf16,+avx512vl | FileCheck %s --check-prefix=AVX512BF16
```

================
Comment at: llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X64
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X86
----------------
--check-prefixes=CHECK,X64

================
Comment at: llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll:3
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X64
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X86
+
----------------
--check-prefixes=CHECK,X86

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135930/new/

https://reviews.llvm.org/D135930