[PATCH] D12785: Document __builtin_nontemporal_load and __builtin_nontemporal_store.

Mikhail Zolotukhin via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 10 18:48:24 PDT 2015


> On Sep 10, 2015, at 6:29 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> 
> On Thu, Sep 10, 2015 at 5:57 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
> mzolotukhin added inline comments.
> 
> ================
> Comment at: docs/LanguageExtensions.rst:1802-1807
> @@ +1801,8 @@
> +
> +For example, on AArch64 in the following code::
> +
> +  LDR X1, [X2]
> +  LDNP X3, X4, [X1]
> +
> +the ``LDNP`` might be executed before the ``LDR``. In this case the load would
> +be performed from a wrong address (see 6.3.8 in `Programmer's Guide for ARMv8-A
> ----------------
> rsmith wrote:
> > This seems to make the feature essentially useless, since you cannot guarantee that the address register is set up sufficiently far before the non-temporal load. Should the compiler not be required to insert the necessary barrier itself in this case?
> Yes, we can require targets to use the corresponding NT instructions only when it's safe, and then remove this remark from the documentation. For ARM64 that would mean either not emitting LDNP at all, or conservatively emitting a barrier before each LDNP (which would probably remove all the performance benefit of using it) - that is, yes, non-temporal loads would be useless on this target.
> 
> I think this should already be the case -- according to the definition of !nontemporal in the LangRef (http://llvm.org/docs/LangRef.html#load-instruction), using an LDNP without an accompanying barrier would not be correct on AArch64, as it does not have the right semantics.
I removed the paragraph in the updated patch.
>  
> But I think we want to keep the builtin for NT-load, as it's a generic feature, not ARM64 specific. It can be used on other targets - e.g. we can use this in x86 stream builtins, and hopefully simplify their current implementation. I don't know about non-temporal operations on other targets, but if there are others, they can use it too right out of the box.
> 
> Yes, I'm not arguing for removing the builtin, just that the AArch64 backend needs to be very careful when mapping it to LDNP, because that will frequently not be correct.
Yes, that's true, and that is why we have only implemented STNP for now.
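
For anyone following along, here is a rough sketch of how the two builtins might be used from C. The helper names (nt_store_i32, nt_load_i32, fill_streaming, sum_streaming) and the HAVE_NT_BUILTINS macro are made up for illustration and are not part of the patch. The fallback to plain accesses is only there so the snippet builds with compilers that lack the builtins. Whether a target actually emits a non-temporal instruction for these (for example STNP on AArch64) is up to the backend.

```c
#include <stddef.h>

/* Detect the Clang builtins; HAVE_NT_BUILTINS is a hypothetical macro
   used only in this sketch. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_store) && \
      __has_builtin(__builtin_nontemporal_load)
#    define HAVE_NT_BUILTINS 1
#  endif
#endif
#ifndef HAVE_NT_BUILTINS
#  define HAVE_NT_BUILTINS 0
#endif

/* Store `value` to `addr`, hinting that the data will not be reused soon. */
static inline void nt_store_i32(int value, int *addr) {
#if HAVE_NT_BUILTINS
  __builtin_nontemporal_store(value, addr);
#else
  *addr = value; /* plain store when the builtin is unavailable */
#endif
}

/* Load from `addr`, hinting that the data will not be reused soon. */
static inline int nt_load_i32(int *addr) {
#if HAVE_NT_BUILTINS
  return __builtin_nontemporal_load(addr);
#else
  return *addr; /* plain load when the builtin is unavailable */
#endif
}

/* Fill a buffer that will not be read again soon. */
void fill_streaming(int *dst, size_t n, int value) {
  for (size_t i = 0; i < n; ++i)
    nt_store_i32(value, &dst[i]);
}

/* Sum a buffer that is only read once. */
long sum_streaming(int *src, size_t n) {
  long total = 0;
  for (size_t i = 0; i < n; ++i)
    total += nt_load_i32(&src[i]);
  return total;
}
```

The results are the same as with ordinary loads and stores; the builtins only carry a hint to the code generator.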

Michael
