[PATCH] D61764: [LV] Suppress vectorization in some nontemporal cases

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 4 09:50:21 PDT 2019


andreadb accepted this revision.
andreadb added a comment.
This revision is now accepted and ready to land.

I think this patch looks good.
The new TTI hooks looks good, and the change seems conservative enough. But more importantly it fixes the perf issue reported as PR40759.

I'll leave the final decision to Simon. However, from my point of view this patch looks good.



================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:3080
+  if (DataType->isFloatTy() || DataType->isDoubleTy())
+    return ST->hasSSE4A();
+
----------------
wristow wrote:
> RKSimon wrote:
> > wristow wrote:
> > > wristow wrote:
> > > > wristow wrote:
> > > > > RKSimon wrote:
> > > > > > SSE4A nt-stores can happen with any alignment, and AFAICT without any perf penalty.
> > > > > I didn't realize that.  I'll update the patch, and include a test for it.
> > > > Looking into this, I'm confused...  Are you saying (for example) that with SSE4A, `vmovntps` can do an nt-store with a misaligned address?  Looking through the docs, I'm seeing a requirement for the address to be aligned.
> > > > 
> > > > Or are you saying (for example) the SSE4A instruction `movntss` (which takes a vector-register operand containing the value to be stored) can take an arbitrary alignment for the memory address?  If that's what your point is, then yes I should change the above to allow misaligned `float` and `double` nt-stores.  But `movntss` is only storing one `float` element of the vector register (ignoring the other elems), and so it doesn't allow us to vectorize the case.  In short, yes I should change that for `float` and `double` nt-stores on SSE4A, but since it doesn't allow us to vectorize, I wonder if I'm misunderstanding your point.
> > > > 
> > > > Or are you saying something else?  (Like I said, I'm confused.)
> > > I'm thinking your point must be my second guess above (that `movntss` and `movntsd` can store  `float`/`double` non-temporally at an arbitrary boundary).  So I've updated the patch to do that.
> > Sorry @wristow I missed your previous question - yes I was referring to SSE4A allowing unaligned scalar float/double nt-stores. Regular (v)movntps still has natural alignment requirements.
> > 
> > I also raised PR42026 about using movntss/movntsd/movnti to scalarize unaligned vectors, which I think with suitable costs would still allow us to vectorize everything else - vectorizer would create a unaligned vector ntstore ir instruction and we'd scalarize the store in the backend.
> Thanks for that explanation, @RKSimon .  I understand much better now.
> 
> With that, is there still an additional test (for X86) where non-temporal loads/stores are vectorized that is possible?  I think PR42026 is essentially attacking that problem from the other end.  Thinking about it, would a fix for PR42026 obviate the change here?
My understanding is that we still want this change regardless of PR42026.
I was chatting with Simon about this issue. A fix for PR42026 can be seen as some sort of "last resort" if badly aligned NT instructions reach ISel.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61764/new/

https://reviews.llvm.org/D61764





More information about the llvm-commits mailing list