[PATCH] D96021: [LoopVectorize] NFC: Move UserVF feasibility checks to separate function.

Thu Mar 4 14:19:17 PST 2021

fhahn added a comment.

In D96021#2579034 <https://reviews.llvm.org/D96021#2579034>, @sdesmalen wrote:

> Hi @fhahn, thanks for looking into these two patches.

Hi, sorry for the delay.

> Today the code in computeFeasibleMaxVF is roughly structured as:
>
>   if (Need to care about UserVF) {
>     // Validate, clamp or fall back to fixed-width or scalars.
>     // Meanwhile, giving various UserVF-specific remarks/debug output.
>   } else
>     // Determine MaxFeasibleVF some other way.
>
> Because the function has gotten so big and have practically non-overlapping code paths, I don't see any downside in splitting this code into separate functions to handle their respective cases.

Yes that's indeed a problem, but I think that was mostly a consequence of adding scalable support for UserVFs only. Before it was much simpler IIRC and structured similar to outlined earlier (compute max bound, apply to UserVF if present otherwise apply bound when computing max VF).

> I find the code in D96997 <https://reviews.llvm.org/D96997> quite difficult to follow. Not necessarily because of the changes you made, I already find the code difficult to follow for the fixed-width case alone, but adding the scalable case just adds to the complication. Things like:
>
>   if (some condition specific to scalable VFs) {
>     :
>     ScalableMaxVF = ElementCount::getScalable(0)
>     UserVF = ElementCount::getScalable(0)
>   }
>   
>   :
>   
>   if (some other condition specific to scalable VFs) {
>     :
>     ScalableMaxVF = ElementCount::getScalable(0)
>   }
>
> make me weary for example. I think having the code split up makes the code a lot easier to follow than having the combination {UserVF, auto-select} x {Fixed, Scalable} in one function. After D96021 <https://reviews.llvm.org/D96021> and D96022 <https://reviews.llvm.org/D96022>, practically all code-paths are the same for both the fixed- and scalable case.

Agreed, that was not ideal, but I think not necessarily due to the new structure, but because it inlined the upper bound computation for scalable VFs, which is much more verbose than for fixed vectors (BTW I think the extra debug messages are great!). I did a quick pass over the patch to move the scalable VF upper bound computation to a separate function, so those separate checks should be gone now.

In D96021#2601389 <https://reviews.llvm.org/D96021#2601389>, @sdesmalen wrote:

> Hi @fhahn, following your feedback, I've refactored the patches in this series a bit to move out some common calculations from `getFeasibleUserVF` and `computeFeasibleMaxVF`. I've also tried to implement your idea to start off with a very wide vector, that gets subsequently limited by: loop dependence-distance, vector-register-width, iteration count, etc. You'll see in D96023 <https://reviews.llvm.org/D96023> that the code is now relatively straight-forward in how it limits the MaxVF, and how it works for both fixed- and scalable vectors. You'll also notice that `computeFeasibleMaxVF` no longer takes an explicit `bool ComputeScalableMaxVF` flag.
>
> Hopefully this is more along the lines that you had in mind!

Thanks for the update, I just had a look. One thing I am not sure about is whether we want to emit the same debug messages & remarks for scalable vectors when we just compute the max scalable VF, without UserVF? Personally I think it would be helpful to emit them independently of whether UserVF is used or not, if scalable vectorization is enabled. For example, if there's a dependence that prevents scalable vectorization due to the MaxVScale computation, I think it would make sense to display the remark/debug output saying this is causing scalable vectorization to be disabled.

As alluded to earlier, I am not sure if splitting up the logic in `getFeasibleUserVF` and `computeFeasibleMaxVF` is ideal. The way I see it, we have to apply the same 'rules'/'restrictions' to both the UserVF and/or the max VF we compute. With that in mind, the UserVF handling should be really simple: we just have to check if it is within the upper bound. For scalable vectors, we also have the fallback. But this may just be temporary, because once we also pick a max scalable VF, we can instead proceed with ignoring the UserVF, same as we do for fixed vectors.

I think the main advantages of the approach outlined earlier are:

- There's one place we apply the legality restrictions to the upper bound, which is then used for both the computed max VF and UserVF. I think the advantage here is that it is clearer from the code that the same rules apply in both cases (max VF & UserVF) and if we need to add a new constraint, there's a single place to add it.
- The 'clamping' is done by a simple minimum function, whereas `clampFeasibleMaxVF` contains scalable vector specific code and also seems slightly more complicated (e.g. needs an extra optional out parameter)
- The same debug messages &Remarks are generated when scalable vectorization is enabled.
- Maybe a bit less indirection/complex function calls.

In the end, the code is not vastly different and I think there are no absolute answers here. I hope the above explains the direction from which I am looking at this a bit better than I did initially.

I updated D96997 <https://reviews.llvm.org/D96997> a bit to better sketch out how it could integrate with computing scalable max VFs. It's still not too polished, it should mainly illustrate the overall structure and there's definitely lots of small things that could be improved further.

Another thing I am not entirely sure is the reduction checks. For now it seems like for AArch64 the VF is not really used, so in the patch it just passes any scalable VF. not sure if that might change in the future, but I think I saw in one of the comments mentioning that we might to remove VFs after computing the max VF anyways.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96021/new/

https://reviews.llvm.org/D96021