[PATCH] D113003: [LoopVectorize] Add support for tail folding using scalable vectors

Tue Nov 2 06:58:29 PDT 2021

david-arm added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5881
-  // low, in which case it makes sense not to use scalable vectors.
-  if (MaxFactors.ScalableVF.isVector())
     MaxFactors.ScalableVF = ElementCount::getScalable(0);
----------------
SjoerdMeijer wrote:
> Hi @david-arm , thanks for clarifying so far, that has helped. Moving the question about profitability to here, the place where it is probably relevant. Also note that I am asking a bunch of question just to reload to memory how things work here (and I am new to the SVE/scalable angle here). 
> 
> Before we discuss the actual change, I don't think I understand this original comment + code here to be honest. The first sentence:
> 
> > For scalable vectors, don't use tail folding as this is currently not yet supported. 
> 
> suggest to me that we want to return `CM_ScalarEpilogueAllowed` in `getScalarEpilogueLowering`. And I don't know what the affect is of setting `MaxFactors.ScalableVF` to 0 below, but I guess that has somehow the affect?
> 
> To me it feels like that we do make a profitability call here "in the middle of something", which should be moved to a different place. I can also imagine that this is highly target dependent/specific, further supporting this. 
> 
> I do remember though that there is a bit of an ordering problem: the decision to tail-fold or not is taken quite early, while not all information that we ideally would like to have are not yet available. Not sure that is the case here. But I guess that for now the decision "if scalable vector, then don't tail-fold" could be taken quite early. But yeah, I might be missing something here, so please enlighten me. :)
> 
So when this code was originally added tail-folding just crashed for scalable vectors because a lot of the code related to tail-folding assumed fixed-width vectors. We've been trying to fix a lot of these issues and now we've got to the point where it is at least possible to fold the tail for most loops. However, it is still experimental for scalable vectors - scalable vectorisation has not yet been enabled by default for the vectoriser. Even after this patch lands there is still at least one known issue that we intend to fix related to tail-folding + scalable vectors. This is the reason why tail-folding is still hidden behind a flag for scalable vectors. If we start to allow tail-folding to be automatically enabled for low trip counts or when optimising for code size then we will be knowingly exposing users to these issues.

I do take your point about profitability and realise the original comment also suggested something related to profitability, but the aim of this patch is simply to add functionality that is not enabled by default. Perhaps I can change the comment to explain that we want to make this feature stable first for scalable vectors, then move on to questions of profitability?

To answer your other question about `MaxFactors.ScalableVF` it basically disables scalable vectorisation entirely if `computeMaxVF` has decided that we should use tail-folding for low trip counts or reducing code size. However, we are still be able to use fixed-width vectorisation.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113003/new/

https://reviews.llvm.org/D113003