[PATCH] D61764: [LV] Suppress vectorization in some nontemporal cases

Tue May 28 16:49:57 PDT 2019

wristow marked an inline comment as done.
wristow added inline comments.

================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:3080
+  if (DataType->isFloatTy() || DataType->isDoubleTy())
+    return ST->hasSSE4A();
+
----------------
wristow wrote:
> RKSimon wrote:
> > SSE4A nt-stores can happen with any alignment, and AFAICT without any perf penalty.
> I didn't realize that.  I'll update the patch, and include a test for it.
Looking into this, I'm confused...  Are you saying (for example) that with SSE4A, `vmovntps` can do an nt-store with a misaligned address?  Looking through the docs, I'm seeing a requirement for the address to be aligned.

Or are you saying (for example) that the SSE4A instruction `movntss` (which takes a vector-register operand containing the value to be stored) can take an arbitrary alignment for the memory address?  If that's your point, then yes, I should change the above to allow misaligned `float` and `double` nt-stores.  But `movntss` stores only one `float` element of the vector register (ignoring the other elements), so it doesn't allow us to vectorize this case.  In short, yes, I should change that for `float` and `double` nt-stores on SSE4A, but since it doesn't allow us to vectorize, I wonder if I'm misunderstanding your point.

Or are you saying something else?  (Like I said, I'm confused.)
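
To make the second reading concrete, here is a rough sketch of the distinction as I understand it.  It is a standalone model, not code from the patch: `NTStore`, `isScalarFPNTStoreOK`, `isVectorNTStoreOK` and `isNTStoreOK` are hypothetical names, and the "any alignment" rule for `movntss`/`movntsd` is the assumption under discussion rather than something I've confirmed.  It also ignores which subtarget feature is needed for each vector width.

```cpp
#include <cstdint>

// Hypothetical model, not LLVM code: every name here is made up for
// illustration only.
enum class ElemKind { Float, Double, Other };

struct NTStore {
  ElemKind Kind;         // element type of the stored value
  unsigned NumElems;     // 1 = scalar store, >1 = vector store
  uint64_t SizeInBytes;  // total bytes written by the store
  uint64_t AlignInBytes; // known alignment of the address
};

// Assumption: with SSE4A, a scalar float/double nt-store can use
// movntss/movntsd, and those accept any address alignment.
bool isScalarFPNTStoreOK(const NTStore &S, bool HasSSE4A) {
  bool IsFP = S.Kind == ElemKind::Float || S.Kind == ElemKind::Double;
  return HasSSE4A && S.NumElems == 1 && IsFP;
}

// Assumption: a vector nt-store (movntps-style) writes 16 or 32 bytes and
// needs the address aligned to the full vector size.
bool isVectorNTStoreOK(const NTStore &S) {
  bool IsVecSize = S.SizeInBytes == 16 || S.SizeInBytes == 32;
  return S.NumElems > 1 && IsVecSize && S.AlignInBytes >= S.SizeInBytes;
}

// A store is acceptable if either path applies.
bool isNTStoreOK(const NTStore &S, bool HasSSE4A) {
  return isScalarFPNTStoreOK(S, HasSSE4A) || isVectorNTStoreOK(S);
}
```

Under this model a misaligned scalar `float` store is accepted only with SSE4A, while a misaligned `<4 x float>` store is rejected either way, which is why the scalar relaxation by itself wouldn't enable vectorization.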

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61764/new/

https://reviews.llvm.org/D61764