[llvm] [LLVM][AArch64][tblgen]: Match clamp pattern (PR #75529)

Momchil Velikov via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 19 07:17:35 PST 2023


================
@@ -316,6 +316,32 @@ def AArch64ssra : PatFrags<(ops node:$op1, node:$op2, node:$op3),
                            [(int_aarch64_sve_ssra node:$op1, node:$op2, node:$op3),
                             (add node:$op1, (AArch64asr_p (SVEAnyPredicate), node:$op2, (SVEShiftSplatImmR (i32 node:$op3))))]>;
 
+// Replace pattern min(max(v1,v2),v3) by clamp
+def AArch64sclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+                              [(int_aarch64_sve_sclamp node:$Zd, node:$Zn, node:$Zm),
+                              (AArch64smin_p (SVEAllActive),
+                                  (AArch64smax_p (SVEAllActive), node:$Zd, node:$Zn),
+                                  node:$Zm)
+                               ]>;
+def AArch64uclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+                              [(int_aarch64_sve_uclamp node:$Zd, node:$Zn, node:$Zm),
+                               (AArch64umin_p (SVEAllActive),
+                                  (AArch64umax_p (SVEAllActive), node:$Zd, node:$Zn),
+                                  node:$Zm)
+                              ]>;
+def AArch64fclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+                              [(int_aarch64_sve_fclamp node:$Zd, node:$Zn, node:$Zm),
+                              (AArch64fmin_p (SVEAllActive),
+                                  (AArch64fmax_p (SVEAllActive), node:$Zd, node:$Zn),
+                                  node:$Zm)
+                               ]>;
+def AArch64bfclamp : PatFrags<(ops node:$Zd, node:$Zn, node:$Zm),
+                              [(int_aarch64_sve_fclamp node:$Zd, node:$Zn, node:$Zm),
+                              (int_aarch64_sve_fmin (nxv8i1 (SVEAllActive)),
----------------
momchil-velikov wrote:

But they handle it in a different way. Take `BFMAX` (https://developer.arm.com/documentation/ddi0602/2023-09/SVE-Instructions/BFMAX--BFloat16-floating-point-maximum--predicated--) and `BFMAXNM` (https://developer.arm.com/documentation/ddi0602/2023-09/SVE-Instructions/BFMAXNM--BFloat16-floating-point-maximum-number--predicated--):


* BFMAX

> When FPCR.AH is 0, the behavior is as follows:
>
> * Negative zero compares less than positive zero.
> * When FPCR.DN is 0, if either element is a NaN, the result is a quiet NaN.
> * When FPCR.DN is 1, if either element is a NaN, the result is Default NaN.
>
> When FPCR.AH is 1, the behavior is as follows:
>
> * If both elements are zeros, regardless of the sign of either zero, the result is the second element.
> * If either element is a NaN, regardless of the value of FPCR.DN, the result is the second element.

* BFMAXNM

> Regardless of the value of FPCR.AH, the behavior is as follows:
>
> * Negative zero compares less than positive zero.
> * If one element is numeric and the other is a quiet NaN, the result is the numeric value.
> * When FPCR.DN is 0, if either element is a signaling NaN or if both elements are NaNs, the result is a quiet NaN.
> * When FPCR.DN is 1, if either element is a signaling NaN or if both elements are NaNs, the result is Default NaN.

The same holds for `BFMIN` and `BFMINNM`.
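
To make the difference concrete, here is a minimal Python sketch of the two behaviors quoted above. It is only a model: Python floats stand in for BFloat16 elements, the function names `fmax_model` and `fmaxnm_model` are hypothetical, and signaling NaNs, FPCR.DN and the FPCR.AH = 1 mode are not modeled.

```python
import math

def _max_ordered(a, b):
    """Maximum of two non-NaN values, treating -0.0 as less than +0.0."""
    if a == b == 0.0:
        # Both are zeros: prefer the one with a positive sign.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b

def fmax_model(a, b):
    """BFMAX-like (FPCR.AH = 0): any NaN operand gives a NaN result."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return _max_ordered(a, b)

def fmaxnm_model(a, b):
    """BFMAXNM-like: a single quiet NaN loses to the numeric operand."""
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return _max_ordered(a, b)

if __name__ == "__main__":
    print(fmax_model(math.nan, 1.0))    # NaN propagates
    print(fmaxnm_model(math.nan, 1.0))  # the number wins
```

The two models agree on all numeric inputs and differ only when exactly one operand is a NaN.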

`BFCLAMP` (https://developer.arm.com/documentation/ddi0602/2023-09/SVE-Instructions/BFCLAMP--BFloat16-floating-point-clamp-to-minimum-maximum-number-) behaves like `BFMINNM` and `BFMAXNM`:

> Regardless of the value of FPCR.AH, the behavior is as follows for each minimum number and maximum number operation:
>
> * Negative zero compares less than positive zero.
> * If one value is numeric and the other is a quiet NaN, the result is the numeric value.
> * When FPCR.DN is 0, if either value is a signaling NaN or if both values are NaNs, the result is a quiet NaN.
> * When FPCR.DN is 1, if either value is a signaling NaN or if both values are NaNs, the result is Default NaN.

And, of course, the pseudo-code is in terms of `BFMinNum`/`BFMaxNum`:

`Elem[result, e, 16] = BFMinNum(BFMaxNum(element1, element3, FPCR), element2, FPCR);`
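
As a rough illustration of why this matters for pattern matching, the Python sketch below compares a clamp built from "number" min/max, as in the pseudo-code, with one built from NaN-propagating min/max. All function names are hypothetical, Python floats stand in for BFloat16, and signed zeros, signaling NaNs and FPCR settings are not modeled.

```python
import math

def maxnum(a, b):
    """MaxNum-like: a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def minnum(a, b):
    """MinNum-like: a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def max_nan(a, b):
    """NaN-propagating maximum (BFMAX-like with FPCR.AH = 0)."""
    return math.nan if math.isnan(a) or math.isnan(b) else max(a, b)

def min_nan(a, b):
    """NaN-propagating minimum (BFMIN-like with FPCR.AH = 0)."""
    return math.nan if math.isnan(a) or math.isnan(b) else min(a, b)

def clamp_minnum_maxnum(x, lo, hi):
    """Clamp composed as MinNum(MaxNum(x, lo), hi), as in the pseudo-code."""
    return minnum(maxnum(x, lo), hi)

def clamp_min_max(x, lo, hi):
    """Clamp composed from NaN-propagating min/max."""
    return min_nan(max_nan(x, lo), hi)

if __name__ == "__main__":
    print(clamp_minnum_maxnum(math.nan, 0.0, 1.0))  # NaN input is replaced by a bound
    print(clamp_min_max(math.nan, 0.0, 1.0))        # NaN input propagates
```

Under these assumptions the two compositions agree for numeric inputs but diverge when an input is a NaN, so a min(max(...)) pattern is only interchangeable with the clamp when its min and max follow the "number" semantics.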


https://github.com/llvm/llvm-project/pull/75529

