<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/128006>128006</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[ARM][AArch64] Vector intrinsics do not match hardware behavior for NaN, subnormals
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
ostannard
</td>
</tr>
</table>
<pre>
The ARM/AArch64 vector intrinsics are defined as having the exact same behaviour as the hardware instructions.
For MVE:
> The behavior of an intrinsic is specified to be equivalent to the MVE instruction it is mapped to in [[MVE]](https://arm-software.github.io/acle/main/acle.html#MVE). Intrinsics are specified as a mapping between their name, arguments and return values and the MVE instruction and assembler operands which they are equivalent to.
>
> A compiler may make use of the as-if rule from C [[C99]](https://arm-software.github.io/acle/main/acle.html#C99) (5.1.2.3) to perform optimizations which preserve the instruction semantics.
For AdvSIMD:
> The behavior of an intrinsic is specified to be equivalent to the AArch64 instruction it is mapped to in [[Neon]](https://arm-software.github.io/acle/main/acle.html#Neon). Intrinsics are specified as a mapping between their name, arguments and return values and the AArch64 instruction and assembler operands which they are equivalent to.
>
> A compiler may make use of the as-if rule from C [[C99]](https://arm-software.github.io/acle/main/acle.html#C99) (5.1.2.3) to perform optimizations which preserve the instruction semantics.
However, Clang performs constant folding that does not always match the hardware's exact behaviour for inputs such as NaNs or subnormals.
For example, the MVE instructions always return a "default NaN" of 0x7fc00000 (for single-precision) when the result of the instruction is any NaN, but we constant-fold this code down to return the input NaN value of 0xffffff42:
```c
#include <arm_mve.h>
uint32x4_t foo() {
float32x4_t nan = vreinterpretq_f32_u32(vdupq_n_u32(0xffffff42));
float32x4_t nan_plus_nan = vaddq_f32(nan, nan);
return vreinterpretq_u32_f32(nan_plus_nan);
}
```
```
$ /work/llvm/build/bin/clang --target=arm-none-eabi -march=armv8.1-m.main+mve.fp -S nan.c -o - -O1 -mfloat-abi=hard
...
foo:
.fnstart
@ %bb.0: @ %entry
vmvn.i32 q0, #0xbd
bx lr
...
```
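As a rough illustration of the expected hardware result, here is a minimal host-side sketch of default-NaN handling on raw single-precision bit patterns. The helper names (`is_nan_bits`, `apply_default_nan`, `check_default_nan_model`) are hypothetical and are not part of ACLE or LLVM; the sketch only models the "any NaN result becomes the default NaN" rule described above, not the full instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Single-precision default NaN: sign 0, exponent all ones,
   only the top fraction bit set. */
#define DEFAULT_NAN_F32 0x7fc00000u

/* Hypothetical helper: true if the bits encode any NaN
   (exponent all ones and a non-zero fraction). */
int is_nan_bits(uint32_t bits) {
    return (bits & 0x7f800000u) == 0x7f800000u &&
           (bits & 0x007fffffu) != 0;
}

/* Hypothetical model of default-NaN mode: any NaN result is
   replaced by the default NaN, everything else passes through. */
uint32_t apply_default_nan(uint32_t result_bits) {
    return is_nan_bits(result_bits) ? DEFAULT_NAN_F32 : result_bits;
}

/* The constant-folded value from the report (0xffffff42) is a NaN,
   so under this model the instruction should yield the default NaN. */
void check_default_nan_model(void) {
    assert(apply_default_nan(0xffffff42u) == DEFAULT_NAN_F32);
}
```

Under this model, `vmvn.i32 q0, #0xbd` (which materialises 0xffffff42 in each lane) differs from what the `vadd.f32` instruction itself would produce.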
For subnormals, MVE instructions always flush subnormal input values to zero, so the hardware result here would be 0.0f, but we optimise this code as if that were not the case, so the result gets rounded up to 1.0f:
```c
#include <arm_mve.h>
float32x4_t bar() {
float32x4_t smallest_subnormal = vreinterpretq_f32_u32(vdupq_n_u32(1));
float32x4_t round_up = vrndpq_f32(smallest_subnormal);
return round_up;
}
```
```
$ /work/llvm/build/bin/clang --target=arm-none-eabi -march=armv8.1-m.main+mve.fp -S subnormal.c -o - -O1 -mfloat-abi=hard
...
bar:
.fnstart
@ %bb.0: @ %entry
mov.w r0, #1065353216
vdup.32 q0, r0
bx lr
...
```
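The difference between the two results can be sketched on the host with plain bit manipulation. This is only a model under stated assumptions: `flush_subnormal_to_zero` and `round_up_below_one` are hypothetical helpers, and the rounding helper only handles inputs whose magnitude is below 1.0, which is enough for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of input flush-to-zero: a subnormal
   (zero exponent, non-zero fraction) becomes a zero of the same sign. */
uint32_t flush_subnormal_to_zero(uint32_t bits) {
    uint32_t exponent = bits & 0x7f800000u;
    uint32_t fraction = bits & 0x007fffffu;
    if (exponent == 0 && fraction != 0)
        return bits & 0x80000000u; /* keep only the sign bit */
    return bits;
}

/* Hypothetical model of rounding toward +infinity.
   Precondition: |x| < 1.0, i.e. (bits & 0x7fffffff) < 0x3f800000. */
uint32_t round_up_below_one(uint32_t bits) {
    if ((bits & 0x7fffffffu) == 0)
        return bits;            /* +0.0 and -0.0 are unchanged */
    if (bits & 0x80000000u)
        return 0x80000000u;     /* values in (-1, 0) round up to -0.0 */
    return 0x3f800000u;         /* values in (0, 1) round up to 1.0f */
}

/* Compare the folded result with the flush-then-round model. */
void check_subnormal_model(void) {
    uint32_t smallest_subnormal = 1u;
    /* Without flushing: matches the folded constant 1065353216. */
    assert(round_up_below_one(smallest_subnormal) == 1065353216u);
    /* With input flushing first: the result is +0.0. */
    assert(round_up_below_one(flush_subnormal_to_zero(smallest_subnormal)) == 0u);
}
```

The constant 1065353216 in the generated `mov.w` is 0x3f800000, the bit pattern of 1.0f, which corresponds to the first path rather than the flush-then-round path.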
For AArch64 AdvSIMD, the rounding mode and subnormal flushing behaviour are configurable with the `FPCR` register, but we also constant-fold these operations at compile time, regardless of the `FPCR` settings in effect at run time.
I think it would be reasonable to deviate from the ACLE here, and allow these optimisations depending on the floating-point options (e.g. `-ffp-model=`), but none of these options seem to have any effect on vector intrinsics.
</pre>