<div dir="ltr">Agreed with the other replies on this thread. I'd also suggest looking at my RFC:<blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><a href="https://groups.google.com/d/topic/llvm-dev/ZJ8SVCJPpcc/discussion">https://groups.google.com/d/topic/llvm-dev/ZJ8SVCJPpcc/discussion</a></div></blockquote><div>I still have to implement it.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 3, 2016 at 3:40 AM, Hahnfeld, Jonas via llvm-dev <span dir="ltr">&lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello all,<br>

<br>

I've been wondering why Clang doesn't generate non-temporal stores when<br>

compiling the STREAM benchmark [1], and therefore doesn't reach the best<br>

possible memory bandwidth.<br>

<br>

It turned out that the Loop Vectorizer correctly vectorizes the arithmetic<br>

operations and also merges the loads and stores into vector operations.<br>

However, it doesn't add the '!nontemporal' metadata, which would be needed<br>

for maximal bandwidth on x86.<br>

I briefly looked into this: for non-temporal memory instructions to work,<br>

the memory address has to be aligned to the vector width, which currently<br>

isn't the case either.<br>
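For illustration, here is roughly what the desired code for STREAM's Triad kernel (a[j] = b[j] + scalar*c[j]) looks like when written by hand with SSE2 intrinsics. This is a sketch, not compiler output, and the helper names alloc_doubles16 and triad_nt are hypothetical. _mm_stream_pd is a non-temporal store (movntpd), which requires a 16-byte aligned address, so the sketch assumes a is allocated on a 16-byte boundary; the loads use unaligned _mm_loadu_pd. On targets without SSE2 it falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_pd, _mm_loadu_pd, _mm_sfence, ... */
#endif

/* Hypothetical helper: allocate n doubles on a 16-byte boundary.
 * C11 aligned_alloc wants the size to be a multiple of the alignment,
 * so round it up. */
static double *alloc_doubles16(size_t n)
{
    size_t bytes = ((n * sizeof(double) + 15) / 16) * 16;
    return aligned_alloc(16, bytes ? bytes : 16);
}

/* STREAM Triad a[i] = b[i] + q*c[i] with non-temporal stores to a.
 * Only a must be 16-byte aligned (required by the streaming store);
 * b and c are read with unaligned loads. */
static void triad_nt(double *a, const double *b, const double *c,
                     double q, size_t n)
{
    size_t i = 0;
    assert(((uintptr_t)a & 15) == 0);
#if defined(__SSE2__)
    const __m128d vq = _mm_set1_pd(q);
    for (; i + 2 <= n; i += 2) {
        __m128d vb = _mm_loadu_pd(&b[i]);
        __m128d vc = _mm_loadu_pd(&c[i]);
        /* Streaming store of a[i..i+1], hinting that the data should
         * not be kept in the cache hierarchy. */
        _mm_stream_pd(&a[i], _mm_add_pd(vb, _mm_mul_pd(vq, vc)));
    }
    _mm_sfence(); /* order the weakly ordered streaming stores */
#endif
    for (; i < n; i++) /* scalar tail (the whole loop without SSE2) */
        a[i] = b[i] + q * c[i];
}
```

In LLVM IR the same hint would be the '!nontemporal' metadata on the vector store; the alignment assertion is the requirement that the vectorizer currently can't guarantee.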

<br>

To summarize, the following would be needed to emit non-temporal<br>

hints:<br>

1) Ensure correct alignment of merged vector memory instructions<br>

This could be implemented by executing the first loop iterations as scalar<br>

code until the addresses of the loads and stores are aligned, similar to<br>

what already happens for the remainder of the loop. The larger alignment<br>

would also allow aligned vector instructions to be used instead of the<br>

current unaligned ones.<br>
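Step 1 boils down to computing how many scalar iterations to peel off before the vector loop. A minimal sketch in C, assuming a power-of-two vector width in bytes and a single access stream; peel_count, split_loop and struct loop_split are hypothetical names for illustration, not LLVM APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading scalar iterations needed before p + k*elem_size
 * lands on a vec_bytes boundary. Returns SIZE_MAX if p is not even
 * elem_size-aligned, in which case peeling can never align it. */
static size_t peel_count(const void *p, size_t elem_size, size_t vec_bytes)
{
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
    assert(elem_size != 0 && elem_size <= vec_bytes);
    uintptr_t mis = (uintptr_t)p & (vec_bytes - 1); /* bytes past boundary */
    if (mis == 0)
        return 0;
    size_t gap = vec_bytes - (size_t)mis;           /* bytes to next boundary */
    if (gap % elem_size != 0)
        return SIZE_MAX;
    return gap / elem_size;
}

/* Trip counts for: scalar prologue, vector body, scalar remainder. */
struct loop_split {
    size_t prologue, body, remainder;
};

static struct loop_split split_loop(const void *p, size_t n,
                                    size_t elem_size, size_t vec_bytes)
{
    struct loop_split s;
    size_t vf = vec_bytes / elem_size;      /* elements per vector */
    size_t pro = peel_count(p, elem_size, vec_bytes);
    if (pro > n)   /* alignment unreachable, or loop too short: */
        pro = n;   /* run everything as scalar prologue */
    s.prologue = pro;
    s.body = ((n - pro) / vf) * vf;         /* whole vectors only */
    s.remainder = n - pro - s.body;
    return s;
}
```

For example, with 32-byte vectors of doubles and a pointer 8 bytes past a boundary, 3 scalar iterations are peeled first. When the loads and stores in a loop have different misalignments, peeling can in general align only one of them; for non-temporal stores the store is the natural choice, since the loads can stay unaligned.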

<br>

2) Give non-temporal hints when array elements are used only once per loop<br>

iteration<br>

We probably need to analyze the individual loads and stores in each loop<br>

iteration for this.<br>
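One very simplified way to approach step 2, sketched in C over a toy description of the loop body: treat a store as a non-temporal candidate only if no other access in the loop touches the same array, so every element is written exactly once and never read back inside the loop. The struct access model and nontemporal_candidate are hypothetical and not anything in LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one memory access in the loop body. */
struct access {
    int array;      /* hypothetical id of the underlying array */
    bool is_store;
};

/* acc[k] is a candidate if it is a store and no other access in the
 * loop body refers to the same array. */
static bool nontemporal_candidate(const struct access *acc, size_t n, size_t k)
{
    assert(k < n);
    if (!acc[k].is_store)
        return false;
    for (size_t j = 0; j < n; j++)
        if (j != k && acc[j].array == acc[k].array)
            return false;
    return true;
}
```

For the Triad kernel this marks the store to a and neither of the loads, while an in-place kernel such as a[i] = q*a[i] is rejected. A real analysis would presumably also need alias information to know that two pointers really name different arrays, and some estimate of the working-set size, since streaming data that still fits in cache, or that the next loop reads again, can make things slower.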

<br>

Any thoughts? Is there any ongoing work that I'm missing?<br>

<br>

Thanks,<br>

Jonas<br>

<br>

<br>

[1] <a href="https://www.cs.virginia.edu/stream/" rel="noreferrer" target="_blank">https://www.cs.virginia.edu/stream/</a><br>

<br>

--<br>

Jonas Hahnfeld, MATSE apprentice<br>

<br>

IT Center<br>

Group: High Performance Computing<br>

Division: Computational Science and Engineering<br>

RWTH Aachen University<br>

Seffenter Weg 23<br>

D 52074  Aachen (Germany)<br>

<a href="mailto:Hahnfeld@itc.rwth-aachen.de">Hahnfeld@itc.rwth-aachen.de</a><br>

<a href="http://www.itc.rwth-aachen.de" rel="noreferrer" target="_blank">www.itc.rwth-aachen.de</a><br>

<br>

<br>

<br>_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

<br></blockquote></div><br></div>