[PATCH] D135930: [X86] Add AVX-NE-CONVERT instructions.
Phoebe Wang via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Sat Oct 29 07:02:05 PDT 2022
pengfei added inline comments.
================
Comment at: clang/include/clang/Driver/Options.td:4595-4596
def mno_avxvnniint8 : Flag<["-"], "mno-avxvnniint8">, Group<m_x86_Features_Group>;
+def mavxneconvert : Flag<["-"], "mavxneconvert">, Group<m_x86_Features_Group>;
+def mno_avxneconvert : Flag<["-"], "mno-avxneconvert">, Group<m_x86_Features_Group>;
def mavxvnni : Flag<["-"], "mavxvnni">, Group<m_x86_Features_Group>;
----------------
Need to move it before `mavxvnniint8 `.
================
Comment at: clang/lib/Basic/Targets/X86.cpp:1033
.Case("avxvnniint8", HasAVXVNNIINT8)
+ .Case("avxneconvert", HasAVXNECONVERT)
.Case("bmi", HasBMI)
----------------
Move it ahead.
================
Comment at: clang/lib/Headers/avx512vlbf16intrin.h:164
+#define _mm_cvtneps_pbh(A) \
+ ((__m128bh)__builtin_ia32_vcvtneps2bf16128((__v4sf)(A)))
----------------
pengfei wrote:
> RKSimon wrote:
> > Is there no way for __attribute__ to allow different attribute permutations?
> >
> > Also, can we keep the __builtin_ia32_cvtneps2bf16_128 naming convention?
> > Is there no way for attribute to allow different attribute permutations?
>
> We have discussed this problem with GCC folks. There are two problems here:
> 1. Unlike builtins, function attributes are more generic. It may introduce a lot of checks between callers and callees. I had a research to limit it to `__always_inline__` functions only. However, Clang handles inlining in middle-end, we don't have such information in the front-end. Besides, we don't know how to merge different permutations if they are inlining to the same function.
> 2. We don't know how to put the permutations into IR's function attributes. We need to preserve all permutations for inlining reference, but the backend needs a determine feature list rather than selective.
It's better to use `__builtin_ia32_cvtneps2bf16_128`.
================
Comment at: clang/lib/Headers/avxneconvertintrin.h:106
+///
+/// This intrinsic corresponds to the \c VBCSTNEBF162PS instruction.
+///
----------------
VBCSTNESH2PS
================
Comment at: clang/lib/Headers/avxneconvertintrin.h:139
+///
+/// This intrinsic corresponds to the \c VBCSTNEBF162PS instruction.
+///
----------------
VBCSTNESH2PS
================
Comment at: clang/lib/Headers/avxneconvertintrin.h:207
+/// \param __A
+/// A pointer to a 256-bit memory location containing 8 consecutive
+/// BF16 (16-bit) floating-point values.
----------------
16
================
Comment at: clang/lib/Headers/avxneconvertintrin.h:273
+/// \param __A
+/// A pointer to a 256-bit memory location containing 8 consecutive
+/// half-precision (16-bit) floating-point values.
----------------
16
================
Comment at: clang/lib/Headers/avxneconvertintrin.h:339
+/// \param __A
+/// A pointer to a 256-bit memory location containing 8 consecutive
+/// BF16 (16-bit) floating-point values.
----------------
16
================
Comment at: clang/lib/Headers/avxneconvertintrin.h:405
+/// \param __A
+/// A pointer to a 256-bit memory location containing 8 consecutive
+/// half-precision (16-bit) floating-point values.
----------------
16
================
Comment at: clang/test/Preprocessor/x86_target_features.c:593-599
+// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mavxneconvert -x c -E -dM -o - %s | FileCheck -check-prefix=AVXNECONVERT %s
+// AVXNECONVERT: #define __AVXNECONVERT__ 1
+
+// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mno-avxneconvert -x c -E -dM -o - %s | FileCheck -check-prefix=NO-AVXNECONVERT %s
+// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mavxneconvert -mno-avx2 -x c -E -dM -o - %s | FileCheck -check-prefix=NO-AVXNECONVERT %s
+
+// NO-AVXNECONVERT-NOT: #define __AVXNECONVERT__ 1
----------------
Should we check `__AVX2__` like we did for AVXVNNI?
================
Comment at: llvm/lib/Support/Host.cpp:1818
+ Features["avxneconvert"] = HasLeaf7Subleaf1 && ((EDX >> 5) & 1) && HasAVXSave;
+
----------------
Move it ahead and remove the blank line.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:2186-2203
if (!Subtarget.useSoftFloat() && Subtarget.hasBF16()) {
addRegisterClass(MVT::v8bf16, &X86::VR128XRegClass);
addRegisterClass(MVT::v16bf16, &X86::VR256XRegClass);
addRegisterClass(MVT::v32bf16, &X86::VR512RegClass);
// We set the type action of bf16 to TypeSoftPromoteHalf, but we don't
// provide the method to promote BUILD_VECTOR. Set the operation action
// Custom to do the customization later.
----------------
How about merge it here?
================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:8260-8261
+let Predicates = [HasAVXNECONVERT] in {
+ defm VBCSTNEBF162PS : AVX_NE_CONVERT_BASE<0xb1, "vbcstnebf162ps", i16mem,
+ i16mem>, T8XS;
+ defm VBCSTNESH2PS : AVX_NE_CONVERT_BASE<0xb1, "vbcstnesh2ps", f16mem, f16mem>,
----------------
This can be f16 mem now.
================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:8264-8265
+ T8PD;
+ defm VCVTNEEBF162PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneebf162ps", i128mem,
+ i256mem>, T8XS;
+ defm VCVTNEEPH2PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneeph2ps", f128mem,
----------------
f128mem, f256mem
================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:8268-8269
+ f256mem>, T8PD;
+ defm VCVTNEOBF162PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneobf162ps", i128mem,
+ i256mem>, T8XD;
+ defm VCVTNEOPH2PS : AVX_NE_CONVERT_BASE<0xb0, "vcvtneoph2ps", f128mem,
----------------
ditto.
================
Comment at: llvm/test/CodeGen/X86/avx512bf16-vl-intrinsics.ll:129-140
+declare <8 x bfloat> @llvm.x86.vcvtneps2bf16256(<8 x float> %A)
+
+define <8 x bfloat> @test_mm256_cvtneps2bf16_256_variant(<8 x float> %A) local_unnamed_addr #2 {
+; CHECK-LABEL: test_mm256_cvtneps2bf16_256_variant:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vcvtneps2bf16 %ymm0, %xmm0 # encoding: [0x62,0xf2,0x7e,0x28,0x72,0xc0]
+; CHECK-NEXT: vzeroupper # encoding: [0xc5,0xf8,0x77]
----------------
You don't need to add them here, just another RUN in below file should be enough, e.g.,
```
; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avx512bf16,+avx512vl | FileCheck %s --check-prefix=AVX512BF16
; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avx512bf16,+avx512vl | FileCheck %s --check-prefix=AVX512BF16
```
================
Comment at: llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X64
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X86
----------------
--check-prefixes=CHECK,X64
================
Comment at: llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll:3
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X64
+; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avxneconvert | FileCheck %s --check-prefixes=X86
+
----------------
--check-prefixes=CHECK,X86
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135930/new/
https://reviews.llvm.org/D135930
More information about the cfe-commits
mailing list