[PATCH] D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization

Simon Moll via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 27 04:47:15 PDT 2022


simoll added a comment.

In D126363#3538646 <https://reviews.llvm.org/D126363#3538646>, @ABataev wrote:

> In D126363#3538612 <https://reviews.llvm.org/D126363#3538612>, @fhahn wrote:
>
>> In D126363#3537438 <https://reviews.llvm.org/D126363#3537438>, @simoll wrote:
>>
>>> In D126363#3537416 <https://reviews.llvm.org/D126363#3537416>, @ABataev wrote:
>>>
>>>> We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
>>>> In this case we can treat EVL similarly to canonical iv, which is not an invariant.
>>>
>>> Ok. If that works then having one global EVL per State defined this way should be fine for us for now.
>>
>> The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using `State` to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.
>
> Better to treat EVL as CanonicalIV. Yes, I agree that a recipe is the better choice here (similar to CanonicalIV), but it requires some extra work because VPWidenIntOrFpInductionRecipe should depend on EVL. Probably we need to split VPWidenIntOrFpInductionRecipe into a PHI recipe and something like CanonicalIVIncrement; otherwise this dependency prevents it from being vectorized effectively.
>
>> If we need to generate code to compute the EVL, it should be modeled as a recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be an additional opcode in VPInstruction, instead of a new recipe.
>>
>>>>> Here is my suggestion:
>>>>>
>>>>> 1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
>>>>> 2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.
>>>>
>>>> I'm afraid that it will require a HUGE(!!!) amount of changes in the VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.
>>>
>>> How? I think there is some kind of misunderstanding here.
>>> There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
>>> The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.
>>>
>>> What part of my suggestion makes you think that there would be huge changes? Is it the adding EVL to existing recipes?
>>
>> I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between `VPPredicatedWidenRecipe` and `VPWidenRecipe` modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

I suppose so. These are the differences between `VPPredicatedWidenRecipe` and `VPWidenRecipe` that I see:

- Mask & EVL operands. The Mask operand could be optional (which implies a constant all-ones mask); EVL is mandatory.
- EVL has significant performance implications on RVV and VE. If you count costing as a VPlan-based analysis, then that would be one analysis in favor of having two recipe kinds.
- `VPPredicatedWiden*` recipes always widen to VP intrinsics. `VPWiden*` don't.
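For concreteness, this is roughly the IR-level difference between the two recipe kinds for a widened add (a sketch; value names are illustrative):

```llvm
; VPWidenRecipe: plain widened add, all lanes active
%r0 = add <vscale x 4 x i32> %a, %b

; VPPredicatedWidenRecipe: the VP equivalent, carrying mask and EVL operands;
; masked-off lanes and lanes at or beyond %evl yield poison
%r1 = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %m, i32 %evl)
```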

>> We already have some precedent here with the `VPWidenMemoryInstructionRecipe`, which has an optional mask. IIUC a `VPWidenRecipe` could have either an additional `Mask` operand or `Mask` & `EVL`, so it should not be too difficult to distinguish between the versions when printing the recipe and during codegen. This was something I mentioned in the earlier reviews IIRC.
>
> I rather doubt we need to handle EVL same way as Mask.

Juggling multiple EVLs in one piece of code is not unheard of in VE vector code. I suspect RVV programmers will eventually pull similar tricks (hardware permitting). You only need the EVL operands for that. The first VPlan-VP vector plan builder can pass in the same EVL for all of them, but having EVL operands will give us the flexibility to go further.
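For example, nothing in the intrinsic definitions forbids two VP operations in the same block from carrying distinct EVLs (a hypothetical fragment; value names are illustrative):

```llvm
; each VP call carries its own EVL operand
%x = call <vscale x 2 x double> @llvm.vp.fadd.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x i1> %m, i32 %evl.short)
%y = call <vscale x 2 x double> @llvm.vp.fmul.nxv2f64(<vscale x 2 x double> %c, <vscale x 2 x double> %d, <vscale x 2 x i1> %m, i32 %evl.long)
```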

>> I think it should also be possible to use predicated instructions without needing EVL, right? The modeling should be flexible enough to not force us to add redundant EVL operands when they are not needed, IMO.
>
> Looks like all vp intrinsics require EVL. Some of them require the previous EVL (which, I assume, can simply be the VF?).

Yes, you can effectively disable the EVL in VP Intrinsics by passing in the VF. The current VP infrastructure recognizes that.
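Concretely, an EVL equal to the runtime vector length leaves only the mask in effect, and `VPIntrinsic::canIgnoreVectorLengthParam()` recognizes that pattern (a sketch; value names are illustrative):

```llvm
; EVL = vscale * 4 = the full vector length of <vscale x 4 x i32>
%vscale = call i32 @llvm.vscale.i32()
%vf     = mul i32 %vscale, 4
; with the full VF as EVL, this behaves like a plain masked add
%r = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %m, i32 %vf)
```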


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126363/new/

https://reviews.llvm.org/D126363
