<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/169034">169034</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64][SVE] Non-temporal load/store instructions fail to be generated from intrinsics and builtins
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            missed-optimization,
            llvm:SelectionDAG
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ytmukai
      </td>
    </tr>
</table>

<pre>
Since LLVM 21, the ACLE intrinsics for SVE non-temporal loads and stores (`svldnt1`, `svstnt1`) no longer generate the expected non-temporal instructions when called with an all-true predicate.

Code to reproduce:

```c
#include <arm_sve.h>

void f(double* a) {
    svbool_t allone = svptrue_b64();
    svstnt1(allone, a + 1,
            svldnt1(allone, a));
}
```

https://godbolt.org/z/za6Mb1Een


In LLVM 21, an all-true predicate is now represented as a constant `splat_vector` in the SelectionDAG. This enables an optimization in `DAGCombiner` that converts a `masked_load`/`masked_store` node into a regular `load`/`store` node. However, the instruction selection patterns for SVE non-temporal instructions are only defined for the masked nodes, so the resulting regular `load`/`store` nodes no longer match them, even though their memory operands are still marked non-temporal.

LLVM 20:

```
Initial selection DAG: %bb.0 'f:entry'
SelectionDAG has 13 nodes:
  t0: ch,glue = EntryToken
 t2: i64,ch = CopyFromReg t0, Register:i64 %0
  *** t5: nxv2i1 = llvm.aarch64.sve.ptrue TargetConstant:i64<1553>, TargetConstant:i32<31>
 t9: nxv2f64,ch = llvm.aarch64.sve.ldnt1<(non-temporal load (<vscale x 1 x s128>) from %ir.a, align 8, !tbaa !6)> t0, TargetConstant:i64<1481>, t5, t2
 t7: i64 = add nuw t2, Constant:i64<8>
    t11: ch = llvm.aarch64.sve.stnt1<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !6)> t9:1, TargetConstant:i64<1792>, t9, t5, t7
 t12: ch = AArch64ISD::RET_GLUE t11
```

LLVM 21:

```
Initial selection DAG: %bb.0 'f:entry'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t9: nxv2i1 = insert_vector_elt poison:nxv2i1, Constant:i1<-1>, Constant:i64<0>
  ***t10: nxv2i1 = splat_vector Constant:i1<-1>***
  t11: nxv2f64,ch = llvm.aarch64.sve.ldnt1<(non-temporal load (<vscale x 1 x s128>) from %ir.a, align 8, !tbaa !10)> t0, TargetConstant:i64<1598>, t10, t2
      t4: i64 = add nuw t2, Constant:i64<8>
    t13: ch = llvm.aarch64.sve.stnt1<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, TargetConstant:i64<1909>, t11, t10, t4
  t14: ch = AArch64ISD::RET_GLUE t13

Combining: t13: ch = llvm.aarch64.sve.stnt1<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, TargetConstant:i64<1909>, t11, t10, t4
 ... into: t17: ch = masked_store<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, t15, t4, undef:i64, t10

Combining: t17: ch = masked_store<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, t15, t4, undef:i64, t10
 ... into: t18: ch = store<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, t15, t4, undef:i64
```


The transformation from `masked_load/store` to `load/store` is performed here:
https://github.com/llvm/llvm-project/blob/622f72f4bef8b177e1e4f318465260fbdb7711ef/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L12782

The existing patterns are defined here:
https://github.com/llvm/llvm-project/blob/622f72f4bef8b177e1e4f318465260fbdb7711ef/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td#L3052


A similar issue already exists with `__builtin_nontemporal_load` and `__builtin_nontemporal_store`: these builtins also fail to generate non-temporal instructions. This appears to have the same root cause.

https://godbolt.org/z/rhzYaxjj5
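
The exact reproducer is behind the link above. As a rough illustration only, the sketch below shows the general shape of code that uses these builtins. The function name `copy_nt` and the scalar loop are assumptions made for illustration, not the code from the linked example. The `__has_builtin` guard is there only so that the snippet still compiles with compilers that lack the Clang builtins.

```c
#include <stddef.h>

/* Older compilers may not provide __has_builtin at all. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical helper: copies n doubles from src to dst, requesting
 * non-temporal accesses where the Clang builtins are available. */
void copy_nt(double *restrict dst, double *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i) {
#if __has_builtin(__builtin_nontemporal_load) && \
    __has_builtin(__builtin_nontemporal_store)
        /* Load with a non-temporal hint, then store with the same hint. */
        double v = __builtin_nontemporal_load(&src[i]);
        __builtin_nontemporal_store(v, &dst[i]);
#else
        /* Fallback: plain copy without any non-temporal hint. */
        dst[i] = src[i];
#endif
    }
}
```

To check the behaviour, one would compile this with Clang for an SVE target and look for `ldnt1d`/`stnt1d` in the output. Whether this particular loop reproduces the problem has not been confirmed here.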

To resolve both of these issues, should we add isel patterns for non-temporal instructions that match regular `load` and `store` nodes?

</pre>