[llvm] [LLVM][AArch64][tblgen]: Match clamp pattern (PR #75529)
Momchil Velikov via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 19 07:17:35 PST 2023
================
@@ -316,6 +316,32 @@ def AArch64ssra : PatFrags<(ops node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_ssra node:$op1, node:$op2, node:$op3),
(add node:$op1, (AArch64asr_p (SVEAnyPredicate), node:$op2, (SVEShiftSplatImmR (i32 node:$op3))))]>;
+// Replace pattern min(max(v1,v2),v3) by clamp
+def AArch64sclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+ [(int_aarch64_sve_sclamp node:$Zd, node:$Zn, node:$Zm),
+ (AArch64smin_p (SVEAllActive),
+ (AArch64smax_p (SVEAllActive), node:$Zd, node:$Zn),
+ node:$Zm)
+ ]>;
+def AArch64uclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+ [(int_aarch64_sve_uclamp node:$Zd, node:$Zn, node:$Zm),
+ (AArch64umin_p (SVEAllActive),
+ (AArch64umax_p (SVEAllActive), node:$Zd, node:$Zn),
+ node:$Zm)
+ ]>;
+def AArch64fclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+ [(int_aarch64_sve_fclamp node:$Zd, node:$Zn, node:$Zm),
+ (AArch64fmin_p (SVEAllActive),
+ (AArch64fmax_p (SVEAllActive), node:$Zd, node:$Zn),
+ node:$Zm)
+ ]>;
+def AArch64bfclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+ [(int_aarch64_sve_fclamp node:$Zd, node:$Zn, node:$Zm),
+ (int_aarch64_sve_fmin (nxv8i1 (SVEAllActive)),
----------------
momchil-velikov wrote:
But they handle it in a different way. Compare `BFMAX` (https://developer.arm.com/documentation/ddi0602/2023-09/SVE-Instructions/BFMAX--BFloat16-floating-point-maximum--predicated--) and `BFMAXNM` (https://developer.arm.com/documentation/ddi0602/2023-09/SVE-Instructions/BFMAXNM--BFloat16-floating-point-maximum-number--predicated--):
* BFMAX
> When FPCR.AH is 0, the behavior is as follows:
>
> Negative zero compares less than positive zero.
> When FPCR.DN is 0, if either element is a NaN, the result is a quiet NaN.
> When FPCR.DN is 1, if either element is a NaN, the result is Default NaN.
> When FPCR.AH is 1, the behavior is as follows:
>
> If both elements are zeros, regardless of the sign of either zero, the result is the second element.
> If either element is a NaN, regardless of the value of FPCR.DN, the result is the second element.
>
* BFMAXNM
> Regardless of the value of FPCR.AH, the behavior is as follows:
>
> Negative zero compares less than positive zero.
> If one element is numeric and the other is a quiet NaN, the result is the numeric value.
> When FPCR.DN is 0, if either element is a signaling NaN or if both elements are NaNs, the result is a quiet NaN.
> When FPCR.DN is 1, if either element is a signaling NaN or if both elements are NaNs, the result is Default NaN.
The same distinction applies to `BFMIN` and `BFMINNM`.
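
To make the difference concrete, here is a rough scalar model of the two quoted behaviours. It is only a sketch: it uses Python floats rather than BFloat16, assumes FPCR.AH is 0, treats every NaN as quiet, and ignores FPCR.DN. The function names `fmax_model` and `fmaxnm_model` are made up for illustration.

```python
import math

def fmax_model(a: float, b: float) -> float:
    """BFMAX-like behaviour with FPCR.AH == 0: any NaN operand gives NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        # Negative zero compares less than positive zero.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b

def fmaxnm_model(a: float, b: float) -> float:
    """BFMAXNM-like behaviour: a single quiet NaN operand is ignored."""
    if math.isnan(a):
        return b  # still NaN when both operands are NaN
    if math.isnan(b):
        return a
    return fmax_model(a, b)

print(fmax_model(1.0, math.nan))    # NaN propagates
print(fmaxnm_model(1.0, math.nan))  # the numeric operand wins
```

The two models agree on all-numeric inputs and differ only when exactly one operand is a NaN.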
`BFCLAMP` (https://developer.arm.com/documentation/ddi0602/2023-09/SVE-Instructions/BFCLAMP--BFloat16-floating-point-clamp-to-minimum-maximum-number-) behaves like `BFMINNM` and `BFMAXNM`:
> Regardless of the value of FPCR.AH, the behavior is as follows for each minimum number and maximum number operation:
>
> Negative zero compares less than positive zero.
> If one value is numeric and the other is a quiet NaN, the result is the numeric value.
> When FPCR.DN is 0, if either value is a signaling NaN or if both values are NaNs, the result is a quiet NaN.
> When FPCR.DN is 1, if either value is a signaling NaN or if both values are NaNs, the result is Default NaN.
>
And, of course, the pseudo-code is in terms of `BFMinNum`/`BFMaxNum`:
`Elem[result, e, 16] = BFMinNum(BFMaxNum(element1, element3, FPCR), element2, FPCR);`
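
As a rough illustration of why this matters for the pattern, the sketch below models the pseudo-code with Python floats and compares it against the same composition built from NaN-propagating min/max. It assumes that `element1` and `element2` are the lower and upper bounds and `element3` is the value being clamped, treats every NaN as quiet, and does not model signed zeros or FPCR.DN. All function names are hypothetical.

```python
import math

def max_num(a: float, b: float) -> float:
    # "maximum number": a single NaN operand is ignored
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def min_num(a: float, b: float) -> float:
    # "minimum number": a single NaN operand is ignored
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def max_nan(a: float, b: float) -> float:
    # NaN-propagating maximum
    return math.nan if math.isnan(a) or math.isnan(b) else max(a, b)

def min_nan(a: float, b: float) -> float:
    # NaN-propagating minimum
    return math.nan if math.isnan(a) or math.isnan(b) else min(a, b)

def clamp_num(element1: float, element2: float, element3: float) -> float:
    # Mirrors BFMinNum(BFMaxNum(element1, element3), element2)
    return min_num(max_num(element1, element3), element2)

def clamp_nan(element1: float, element2: float, element3: float) -> float:
    # Same shape, but built from NaN-propagating min/max
    return min_nan(max_nan(element1, element3), element2)

print(clamp_num(0.0, 1.0, math.nan))  # the lower bound
print(clamp_nan(0.0, 1.0, math.nan))  # NaN
```

Under these assumptions the two compositions agree on numeric inputs but diverge when the clamped value is a NaN, so a min(max(...)) pattern built from NaN-propagating operations is not interchangeable with the clamp.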
https://github.com/llvm/llvm-project/pull/75529