[PATCH] D46130: [NVPTX] Turn on Loop/SLP vectorization

Benjamin Kramer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 27 01:54:16 PDT 2018


bkramer added inline comments.


================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.h:53
+  // NVPTX has infinite registers of all kinds.
+  unsigned getNumberOfRegisters(bool Vector) const { return 1; }
+
----------------
hfinkel wrote:
> jlebar wrote:
> > bkramer wrote:
> > > jlebar wrote:
> > > > bkramer wrote:
> > > > > jlebar wrote:
> > > > > > Does 1 have a specific meaning?  I don't see this in any of the comments, and it would be a pretty weird API...  (Like, did you mean -1?)
> > > > > 1 is the default of the generic implementation; I just copied that and removed the check for vector. I'm not even sure whether anyone actually uses the value or just compares it against zero.
> > > > I see a few places using the actual value returned: LoopStrengthReduce, LoopVectorize...
> > > Right, LoopStrengthReduce calls it with (false), which was returning '1' before and after :)
> > > 
> > > I'm not sure about LoopVectorize, but conservatively limiting its vectorization factor is probably a good thing.
> > OK, I think I see what you're after: You want to make the minimal/safest change that gets vectorization on your testcases?
> > 
> > That sounds good to me, but can we make the comment on "return 1" express this motivation?
> LoopVectorize uses this number to control the amount of interleaving/unrolling it does. The idea is that interleaving is beneficial until it creates too much register pressure (i.e., until you start spilling).
> 
> I don't think that setting this to 1 is a good idea. Perfectly reasonable changes to the loop vectorizer in the future could cause this to completely disable the vectorizer. You should set this, I suspect, to the maximum number of registers you can have per thread at full occupancy.
> 
This is intentional. I want to keep the LoopVectorizer as conservative as possible for now. I have benchmarks where SLP gives a clear improvement; for LV the benefit is much less clear, and it has the potential to cause huge regressions, especially since NVPTX TTI is not yet tuned for it.

Happy to reword the comment further if that's still unclear, and to put in a note about revisiting a more aggressive LV setting. WDYT?
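
For concreteness, one way the reworded hook could read -- only a sketch; the exact wording, the TODO, and the occupancy figure are suggestions/assumptions on my part, not text from the revision:

  // Conservatively report a single register of every kind so the loop
  // vectorizer keeps its interleaving/unrolling to a minimum; NVPTX TTI has
  // not been tuned for LV yet. TODO: revisit with a value closer to the
  // per-thread register budget at full occupancy (roughly 32 on recent
  // GPUs -- an assumed figure, not measured here) once LV behaviour on
  // NVPTX has been benchmarked.
  unsigned getNumberOfRegisters(bool Vector) const { return 1; }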


Repository:
  rL LLVM

https://reviews.llvm.org/D46130




