[PATCH] D102437: [LV] NFCI: Move implementation of isScalarWithPredication to LoopVectorizationLegality

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 1 01:50:18 PDT 2021


fhahn added a comment.

In D102437#2777338 <https://reviews.llvm.org/D102437#2777338>, @sdesmalen wrote:

> In D102437#2776569 <https://reviews.llvm.org/D102437#2776569>, @fhahn wrote:
>
>> I am not sure if that was very clear from my earlier comment; I was referring to scalarization plus predicated-block generation done directly by LV.
>>
>> For regular vectors, LV does not rely on the backend for scalarization; instead it creates a loop that iterates over each lane of the VF and, inside the loop, conditionally executes the scalar version of the requested instruction and packs the results into a vector. For fixed-width VFs the loop is not explicit in the generated IR, because it is simply unrolled for the VF. Is there anything fundamental preventing LV from creating a similar explicit loop over the scalable VF at runtime? Figuring out an accurate cost would of course be expensive, but I am curious whether such scalarization would be possible.
>>
>> Not sure if I am missing something, but after a quick glance it seems to me that we should have all the required pieces in LLVM IR to write such a loop and lower it: https://llvm.godbolt.org/z/rhqh9fz8b
>
> Thanks for clarifying. We have prototyped that approach before and our experiments showed it was never beneficial to use scalarization within a vectorized loop with scalable vectors, so we decided to keep the vectorizer simple and disallow scalable auto-vec when scalarization is required. In the case of operations like SDIV, our plan is to emit a `llvm.vp` intrinsic which can then be ISel'ed natively (using a native instruction w/ predication). It seemed better to disallow it first to avoid any vectorization failures, and then bring up the implementation to support this case.
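
For illustration, the `llvm.vp` plan mentioned above might look roughly like the following sketch (hypothetical names and types; a masked `sdiv` is expressed as a VP intrinsic so that inactive lanes are never divided and can be selected natively with predication):

```llvm
; Sketch only: a predicated sdiv emitted as a vector-predication intrinsic
; instead of being scalarized. Lanes where %mask is false (or beyond %evl)
; do not execute the division, so no spurious traps occur.
declare <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32(
    <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)

define <vscale x 4 x i32> @masked_sdiv(<vscale x 4 x i32> %a,
                                       <vscale x 4 x i32> %b,
                                       <vscale x 4 x i1> %mask, i32 %evl) {
  %r = call <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32(
      <vscale x 4 x i32> %a, <vscale x 4 x i32> %b,
      <vscale x 4 x i1> %mask, i32 %evl)
  ret <vscale x 4 x i32> %r
}
```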

I see, thanks for elaborating! I am not sure how much extra work generating such loops would be, but it seems like disallowing scalarization requires quite a bit of extra complexity, as a number of recent patches introducing additional restrictions show.
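
To make "generating such loops" concrete, here is a rough, hypothetical IR sketch of per-lane scalarization of a predicated sdiv over a scalable vector, with an explicit loop over the runtime VF (lanes where the mask is false keep their undefined initial value, matching masked semantics):

```llvm
; Sketch only: explicit per-lane scalarization loop for a scalable VF.
declare i32 @llvm.vscale.i32()

define <vscale x 4 x i32> @scalarized_sdiv(<vscale x 4 x i32> %a,
                                           <vscale x 4 x i32> %b,
                                           <vscale x 4 x i1> %mask) {
entry:
  %vscale = call i32 @llvm.vscale.i32()
  %vf = mul i32 %vscale, 4            ; runtime VF = vscale * 4
  br label %loop

loop:
  %idx = phi i32 [ 0, %entry ], [ %idx.next, %latch ]
  %acc = phi <vscale x 4 x i32> [ poison, %entry ], [ %acc.next, %latch ]
  %m = extractelement <vscale x 4 x i1> %mask, i32 %idx
  br i1 %m, label %do.div, label %latch

do.div:                               ; only active lanes execute the sdiv
  %lhs = extractelement <vscale x 4 x i32> %a, i32 %idx
  %rhs = extractelement <vscale x 4 x i32> %b, i32 %idx
  %div = sdiv i32 %lhs, %rhs
  %ins = insertelement <vscale x 4 x i32> %acc, i32 %div, i32 %idx
  br label %latch

latch:
  %acc.next = phi <vscale x 4 x i32> [ %acc, %loop ], [ %ins, %do.div ]
  %idx.next = add i32 %idx, 1
  %cmp = icmp ult i32 %idx.next, %vf
  br i1 %cmp, label %loop, label %exit

exit:
  ret <vscale x 4 x i32> %acc.next
}
```

For fixed-width VFs, LV effectively emits this loop fully unrolled; for scalable VFs the trip count is only known at runtime, which is what makes costing it accurately hard.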

These extra checks also seem to introduce various TTI hooks that mirror the cost-computation interfaces. Do you know how this will impact other targets with scalable vectors? Is scalarization similarly expensive on other scalable targets? Naive predication/scalarization is also very expensive on most fixed-vector targets and almost never profitable, but it simplifies the rest of the code, because LV can generate vector IR for almost all instructions mechanically.

>> My comment is only referring to the 'moving to LVL' part of the patch. Not sure if any part of the untangling relies on moving the code?
>
> I guess it could equally live in CM, but my reasoning to move it was more that it has nothing to do with the cost-model, so moving it to LVL avoids it getting polluted with other CM-specific code in the future.

From the discussion above, it seems to me that this is still more of a cost question than a legality question, as IIUC the operations could be vectorized by scalarizing via a loop. Also, it looks like all uses of the functions moved to LVL are in the cost model. The replacement of `isLegalMaskedScatter` also seems unrelated and could be submitted separately.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102437/new/

https://reviews.llvm.org/D102437


