[PATCH] D105264: [X86] AVX512FP16 instructions enabling 2/6
Pengfei Wang via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Aug 11 07:01:46 PDT 2021
pengfei added inline comments.
================
Comment at: clang/include/clang/Basic/BuiltinsX86.def:1860
+TARGET_BUILTIN(__builtin_ia32_minph512, "V32xV32xV32xIi", "ncV:512:", "avx512fp16")
+
+TARGET_BUILTIN(__builtin_ia32_minph256, "V16xV16xV16x", "ncV:256:", "avx512fp16,avx512vl")
----------------
LuoYuanke wrote:
> Why there is no 256 and 128 version for addph, subph, mulph, divph?
Because they don't support rounding control, thus we can directly use instructions like fadd, fsub etc.
================
Comment at: clang/lib/Headers/avx512fp16intrin.h:312
+ __m128h B) {
+ return __builtin_ia32_vcomish((__v8hf)A, (__v8hf)B, _CMP_NEQ_US,
+ _MM_FROUND_CUR_DIRECTION);
----------------
LuoYuanke wrote:
> _CMP_NEQ_OS?
`_CMP_NEQ_US` is correct, see the operation in intrinsic guide (we should update for sh):
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_comineq_ss&expand=1365,1366
================
Comment at: clang/lib/Headers/avx512fp16intrin.h:318
+ __m128h B) {
+ return __builtin_ia32_vcomish((__v8hf)A, (__v8hf)B, _CMP_EQ_OQ,
+ _MM_FROUND_CUR_DIRECTION);
----------------
LuoYuanke wrote:
> Why it is OQ not UQ? Ditto for all other ucomi intrinsics.
OQ requires both inputs are non-nan. This is the same as the intrinsic's behavior:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_ucomieq_ss&expand=1354,7232
The "u" in intrinsic name corresponds to not "U" but "Q" in the macro name.
================
Comment at: clang/lib/Headers/avx512fp16intrin.h:516
+ return (__m512h)__builtin_ia32_maxph512((__v32hf)__A, (__v32hf)__B,
+ _MM_FROUND_CUR_DIRECTION);
+}
----------------
LuoYuanke wrote:
> Why there is rounding control for max/min operation?
They still need to control the exception.
================
Comment at: clang/lib/Headers/avx512fp16intrin.h:669
+ __A = _mm_div_sh(__A, __B);
+ return __builtin_ia32_selectsh_128(__U, __A, __W);
+}
----------------
LuoYuanke wrote:
> Will it be combined to one instruction? If __B[0] is 0, and mask[0] is 0, there is no exception?
We should consider this case, but only under strict FP mode. We have ss/sd defined in this way too. We should fix them in future.
================
Comment at: clang/lib/Headers/avx512fp16intrin.h:698
+ (__v8hf)__A, (__v8hf)__B, (__v8hf)_mm_setzero_ph(), (__mmask8)-1,
+ _MM_FROUND_CUR_DIRECTION);
+}
----------------
LuoYuanke wrote:
> Do we have rounding control for min?
No. But we need control exception.
================
Comment at: clang/lib/Headers/avx512fp16intrin.h:757
+
+#define _mm_max_round_sh(A, B, R) \
+ (__m128h) __builtin_ia32_maxsh_round_mask( \
----------------
LuoYuanke wrote:
> This name may be misleading, it means suppress exception. Right?
Yes. But we are widely using it for SAE in existing code :)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D105264/new/
https://reviews.llvm.org/D105264
More information about the cfe-commits
mailing list