[PATCH] D61764: [LV] Suppress vectorization in some nontemporal cases

Tue Jun 4 10:53:05 PDT 2019

wristow marked 3 inline comments as done.
wristow added inline comments.

================
Comment at: include/llvm/Analysis/TargetTransformInfoImpl.h:227
+
+  bool isLegalNTLoad(Type *DataType, unsigned Alignment) { return true; }
+
----------------
RKSimon wrote:
> I realise this is the current default but its almost certainly better to get this return false - @fhahn any comments?
FWIW, I have a similar view.  I was thinking of rather than returning simply true (or false), returning true only if the specified `DataType` is minimally aligned at `Alignment`.  In practice for now (since this is currently only called when checking for vectorization), this will effectively mean it defaults to returning false.  But if these routines were ever used more generally, checking the alignment of `DataType` seems like a sensible default.  I'll wait a bit to see whether @fhahn has a preference.

================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:3080
+  if (DataType->isFloatTy() || DataType->isDoubleTy())
+    return ST->hasSSE4A();
+
----------------
RKSimon wrote:
> andreadb wrote:
> > wristow wrote:
> > > RKSimon wrote:
> > > > wristow wrote:
> > > > > wristow wrote:
> > > > > > wristow wrote:
> > > > > > > RKSimon wrote:
> > > > > > > > SSE4A nt-stores can happen with any alignment, and AFAICT without any perf penalty.
> > > > > > > I didn't realize that.  I'll update the patch, and include a test for it.
> > > > > > Looking into this, I'm confused...  Are you saying (for example) that with SSE4A, `vmovntps` can do an nt-store with a misaligned address?  Looking through the docs, I'm seeing a requirement for the address to be aligned.
> > > > > > 
> > > > > > Or are you saying (for example) the SSE4A instruction `movntss` (which takes a vector-register operand containing the value to be stored) can take an arbitrary alignment for the memory address?  If that's what your point is, then yes I should change the above to allow misaligned `float` and `double` nt-stores.  But `movntss` is only storing one `float` element of the vector register (ignoring the other elems), and so it doesn't allow us to vectorize the case.  In short, yes I should change that for `float` and `double` nt-stores on SSE4A, but since it doesn't allow us to vectorize, I wonder if I'm misunderstanding your point.
> > > > > > 
> > > > > > Or are you saying something else?  (Like I said, I'm confused.)
> > > > > I'm thinking your point must be my second guess above (that `movntss` and `movntsd` can store  `float`/`double` non-temporally at an arbitrary boundary).  So I've updated the patch to do that.
> > > > Sorry @wristow I missed your previous question - yes I was referring to SSE4A allowing unaligned scalar float/double nt-stores. Regular (v)movntps still has natural alignment requirements.
> > > > 
> > > > I also raised PR42026 about using movntss/movntsd/movnti to scalarize unaligned vectors, which I think with suitable costs would still allow us to vectorize everything else - vectorizer would create a unaligned vector ntstore ir instruction and we'd scalarize the store in the backend.
> > > Thanks for that explanation, @RKSimon .  I understand much better now.
> > > 
> > > With that, is there still an additional test (for X86) where non-temporal loads/stores are vectorized that is possible?  I think PR42026 is essentially attacking that problem from the other end.  Thinking about it, would a fix for PR42026 obviate the change here?
> > My understanding is that we still want this change regardless of PR42026.
> > I was chatting with Simon about this issue. A fix for PR42026 can be seen as some sort of "last resort" if badly aligned NT instructions reach ISel.
> I think getting this patch in first makes sense.
> 
> I see PR42026 more as a failsafe if anything has managed to create unaligned nt-stores - it doesn't help with nt-loads either.
OK.  Sounds good.

================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:3157
+
+  // SSE4A supports nontermporal stores of float and double at arbitrary
+  // alignment.
----------------
RKSimon wrote:
> typo
Thanks.  Will fix before committing.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61764/new/

https://reviews.llvm.org/D61764