[PATCH] D61764: [LV] Suppress vectorization in some nontemporal cases
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 4 09:50:20 PDT 2019
RKSimon added inline comments.
================
Comment at: include/llvm/Analysis/TargetTransformInfoImpl.h:227
+
+ bool isLegalNTLoad(Type *DataType, unsigned Alignment) { return true; }
+
----------------
I realise this is the current default but its almost certainly better to get this return false - @fhahn any comments?
================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:3080
+ if (DataType->isFloatTy() || DataType->isDoubleTy())
+ return ST->hasSSE4A();
+
----------------
andreadb wrote:
> wristow wrote:
> > RKSimon wrote:
> > > wristow wrote:
> > > > wristow wrote:
> > > > > wristow wrote:
> > > > > > RKSimon wrote:
> > > > > > > SSE4A nt-stores can happen with any alignment, and AFAICT without any perf penalty.
> > > > > > I didn't realize that. I'll update the patch, and include a test for it.
> > > > > Looking into this, I'm confused... Are you saying (for example) that with SSE4A, `vmovntps` can do an nt-store with a misaligned address? Looking through the docs, I'm seeing a requirement for the address to be aligned.
> > > > >
> > > > > Or are you saying (for example) the SSE4A instruction `movntss` (which takes a vector-register operand containing the value to be stored) can take an arbitrary alignment for the memory address? If that's what your point is, then yes I should change the above to allow misaligned `float` and `double` nt-stores. But `movntss` is only storing one `float` element of the vector register (ignoring the other elems), and so it doesn't allow us to vectorize the case. In short, yes I should change that for `float` and `double` nt-stores on SSE4A, but since it doesn't allow us to vectorize, I wonder if I'm misunderstanding your point.
> > > > >
> > > > > Or are you saying something else? (Like I said, I'm confused.)
> > > > I'm thinking your point must be my second guess above (that `movntss` and `movntsd` can store `float`/`double` non-temporally at an arbitrary boundary). So I've updated the patch to do that.
> > > Sorry @wristow I missed your previous question - yes I was referring to SSE4A allowing unaligned scalar float/double nt-stores. Regular (v)movntps still has natural alignment requirements.
> > >
> > > I also raised PR42026 about using movntss/movntsd/movnti to scalarize unaligned vectors, which I think with suitable costs would still allow us to vectorize everything else - vectorizer would create a unaligned vector ntstore ir instruction and we'd scalarize the store in the backend.
> > Thanks for that explanation, @RKSimon . I understand much better now.
> >
> > With that, is there still an additional test (for X86) where non-temporal loads/stores are vectorized that is possible? I think PR42026 is essentially attacking that problem from the other end. Thinking about it, would a fix for PR42026 obviate the change here?
> My understanding is that we still want this change regardless of PR42026.
> I was chatting with Simon about this issue. A fix for PR42026 can be seen as some sort of "last resort" if badly aligned NT instructions reach ISel.
I think getting this patch in first makes sense.
I see PR42026 more as a failsafe if anything has managed to create unaligned nt-stores - it doesn't help with nt-loads either.
================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:3157
+
+ // SSE4A supports nontermporal stores of float and double at arbitrary
+ // alignment.
----------------
typo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D61764/new/
https://reviews.llvm.org/D61764
More information about the llvm-commits
mailing list