[PATCH] D61764: [LV] Suppress vectorization in some nontemporal cases
Warren Ristow via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 31 01:05:54 PDT 2019
wristow marked an inline comment as done.
wristow added a comment.
In D61764#1519660 <https://reviews.llvm.org/D61764#1519660>, @wristow wrote:
> In D61764#1517571 <https://reviews.llvm.org/D61764#1517571>, @RKSimon wrote:
> > Would it be possible to add tests where non-temporal load/stores successfully vectorize?
> Glad to see your comment about SSE4A supporting nt-stores at any alignment. With that, I can make an X86 test-case that does vectorize.
Actually, if I understand your SSE4A point correctly, then that doesn't allow vector nt-stores at arbitrary alignment. So AFAIK, there aren't any vector nt mem-ops on X86, and so for X86, I cannot make a test that successfully vectorizes.
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:3080
+ if (DataType->isFloatTy() || DataType->isDoubleTy())
+ return ST->hasSSE4A();
> wristow wrote:
> > RKSimon wrote:
> > > SSE4A nt-stores can happen with any alignment, and AFAICT without any perf penalty.
> > I didn't realize that. I'll update the patch, and include a test for it.
> Looking into this, I'm confused... Are you saying (for example) that with SSE4A, `vmovntps` can do an nt-store with a misaligned address? Looking through the docs, I'm seeing a requirement for the address to be aligned.
> Or are you saying (for example) the SSE4A instruction `movntss` (which takes a vector-register operand containing the value to be stored) can take an arbitrary alignment for the memory address? If that's what your point is, then yes I should change the above to allow misaligned `float` and `double` nt-stores. But `movntss` is only storing one `float` element of the vector register (ignoring the other elems), and so it doesn't allow us to vectorize the case. In short, yes I should change that for `float` and `double` nt-stores on SSE4A, but since it doesn't allow us to vectorize, I wonder if I'm misunderstanding your point.
> Or are you saying something else? (Like I said, I'm confused.)
I'm thinking your point must be my second guess above (that `movntss` and `movntsd` can store `float`/`double` non-temporally at an arbitrary boundary). So I've updated the patch to do that.
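To make the rule discussed above concrete, here is a minimal, self-contained sketch of the decision as I understand it from this thread. It is not the code in the patch: `SubtargetSketch`, `ElemKind` and `isLegalNTStoreSketch` are hypothetical names, and the fallback rule (the address must be aligned to at least the full store size) is a simplifying assumption for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget feature query (not the LLVM API).
struct SubtargetSketch {
  bool HasSSE4A;
};

enum class ElemKind { Float, Double, Other };

// Simplified model: is a nontemporal store of SizeBytes at AlignBytes legal?
bool isLegalNTStoreSketch(const SubtargetSketch &ST, ElemKind Kind,
                          bool IsVector, unsigned SizeBytes,
                          unsigned AlignBytes) {
  assert(SizeBytes > 0 && AlignBytes > 0 && "sizes must be nonzero");

  // Scalar float/double: per the discussion, SSE4A's movntss/movntsd
  // can store non-temporally at an arbitrary address.
  bool IsFPScalar =
      !IsVector && (Kind == ElemKind::Float || Kind == ElemKind::Double);
  if (IsFPScalar && ST.HasSSE4A)
    return true;

  // Assumption for this sketch: every other nt-store, including all
  // vector nt-stores, needs alignment of at least the store size.
  return AlignBytes >= SizeBytes;
}
```

Under this model, SSE4A only helps the scalar case: a misaligned 16-byte vector nt-store is still rejected, which is why the SSE4A exception does not give a test that vectorizes.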