<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/146484">146484</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Efficient memcpy when src or dest can be nullptr
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
QrczakMK
</td>
</tr>
</table>
<pre>
How to obtain efficient `memcpy()` when src or dest can be `nullptr` (when the size is 0)?
`memcpy(_, nullptr, 0)` and `memcpy(nullptr, _, 0)` are UB according to the C and C++ standards, except that https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3322.pdf has been accepted for the next C standard; I could not find an equivalent for C++.
GCC makes use of that UB, e.g. by skipping explicit `nullptr` checks after `memcpy()`. Clang does not. Sanitizers are consistent with that.
In GCC, `__builtin_memcpy()` behaves like `memcpy()` in this respect, i.e. it does not allow `nullptr`. This suggests that in portable code (portable at least to GCC) one should not blindly call `memcpy()` or `__builtin_memcpy()` if null pointers are possible.
According to https://github.com/llvm/llvm-project/issues/49459, `llvm.memcpy` is safe to call with null pointers when the size is zero, but I could not find documentation that `__builtin_memcpy()` has this property too. The issue concludes with 'I think we can safely close this, with the answer being "yes, you can pass null pointer arguments"', but I feel uneasy relying on that comment without more official documentation.
Neither Clang nor GCC optimizes the portable `if (len != 0) memcpy(dest, src, len)` into a plain call to `memcpy()`, even though Clang could if it indeed assumes that the actual `memcpy()` in the runtime library is valid with null pointers and zero size.
Explicit zero checks slow down my implementation of joining a bunch of string_views by 14%.
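A reduced sketch of the guarded pattern in a join loop, for illustration only. `Piece` and `join_guarded` are hypothetical names standing in for `string_view` and the real join, and the sketch assumes GCC or Clang because it uses `__builtin_memcpy()` and takes the size type from `sizeof` instead of a header:

```cpp
// Illustrative sketch, GCC/Clang only: the copy is a compiler builtin and the
// size type comes from sizeof, so no headers are required.
using size_type = decltype(sizeof(0));

// Hypothetical stand-in for a string_view; an empty piece may be {nullptr, 0}.
struct Piece {
  const char* data;
  size_type size;
};

// Copies all pieces back to back into dest and returns the end of the output.
// The caller must provide room for the sum of all sizes.
inline char* join_guarded(char* dest, const Piece* pieces, size_type count) {
  for (size_type i = 0; i != count; ++i) {
    // Explicit zero check: a null data pointer never reaches the copy.
    if (pieces[i].size != 0) {
      __builtin_memcpy(dest, pieces[i].data, pieces[i].size);
    }
    dest += pieces[i].size;
  }
  return dest;
}
```

The branch inside the loop is the explicit zero check referred to above.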
What should I do to make this efficient, at least on Clang?
In practice, using `__builtin_memcpy()` or even `memcpy()` inside `#ifdef __clang__` should work, but I would like a more official guarantee that it must work. It may also require a version check if the behavior has changed in Clang in the past.
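A minimal sketch of that `#ifdef __clang__` workaround. `copy_maybe_null` is a hypothetical name, and the Clang branch rests on the assumption questioned in this report, namely that `__builtin_memcpy()` inherits the zero-size tolerance described for `llvm.memcpy`; it is not a documented guarantee:

```cpp
// Illustrative sketch, GCC/Clang only; size type taken from sizeof, no headers.
using size_type = decltype(sizeof(0));

// Hypothetical helper: copies len bytes; dest or src may be null when len == 0.
inline void copy_maybe_null(char* dest, const char* src, size_type len) {
#if defined(__clang__)
  // ASSUMPTION, not an official guarantee: Clang accepts null pointers here
  // when len is zero. No version check is attempted.
  __builtin_memcpy(dest, src, len);
#else
  // GCC treats null arguments as UB even for zero size, so keep the guard.
  if (len != 0) {
    __builtin_memcpy(dest, src, len);
  }
#endif
}
```

On compilers other than Clang this keeps the explicit check, so only the Clang branch depends on the unconfirmed behavior.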
The same applies to `memset()`.
</pre>