[PATCH] D12785: Document __builtin_nontemporal_load and __builtin_nontemporal_store.
Michael Zolotukhin via cfe-commits
cfe-commits at lists.llvm.org
Thu Sep 10 17:57:12 PDT 2015
mzolotukhin added inline comments.
================
Comment at: docs/LanguageExtensions.rst:1802-1807
@@ +1801,8 @@
+
+For example, on AArch64 in the following code::
+
+ LDR X1, [X2]
+ LDNP X3, X4, [X1]
+
+the ``LDNP`` might be executed before the ``LDR``. In this case the load would
+be performed from a wrong address (see 6.3.8 in `Programmer's Guide for ARMv8-A
----------------
rsmith wrote:
> This seems to make the feature essentially useless, since you cannot guarantee that the address register is set up sufficiently far before the non-temporal load. Should the compiler not be required to insert the necessary barrier itself in this case?
Yes, we can require targets to use the corresponding NT instructions only when it's safe, and then remove this remark from the documentation. For ARM64 that would mean either not emitting LDNP at all, or conservatively emitting a barrier before each LDNP (which probably removes all the performance benefit of using it). So yes, non-temporal loads would be useless on this target.
But I think we want to keep the NT-load builtin, as it's a generic feature, not an ARM64-specific one. It can be used on other targets: for example, we can use it to implement the x86 stream builtins and hopefully simplify their current implementation. I don't know about non-temporal operations on other targets, but if any exist, those targets can use the builtin right out of the box.
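
To make the target-independent use concrete, here is a minimal sketch of a streaming copy written against the two builtins. The helper name `stream_copy` and the macro `HAVE_NT_BUILTINS` are hypothetical and exist only for this illustration. The `__has_builtin` guard is an assumption about how one might keep the code building on compilers that lack the builtins, where it falls back to ordinary loads and stores.

```c
#include <assert.h>
#include <stddef.h>

/* Detect the Clang non-temporal builtins. The check is nested so that
 * compilers without __has_builtin never see the function-like use. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_load) && \
      __has_builtin(__builtin_nontemporal_store)
#    define HAVE_NT_BUILTINS 1
#  endif
#endif
#ifndef HAVE_NT_BUILTINS
#  define HAVE_NT_BUILTINS 0
#endif

/* Hypothetical helper (not part of the patch): copy n ints, hinting that
 * neither the source nor the destination data will be reused soon. */
void stream_copy(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
#if HAVE_NT_BUILTINS
        /* The target decides whether a real NT instruction is emitted. */
        int v = __builtin_nontemporal_load(&src[i]);
        __builtin_nontemporal_store(v, &dst[i]);
#else
        /* Fallback: ordinary load and store with the same semantics. */
        dst[i] = src[i];
#endif
    }
}
```

The non-temporal attribute is only a hint, so the result of the copy is the same on either path; only the cache behavior may differ, and whether it does is up to the backend.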
http://reviews.llvm.org/D12785