[PATCH] D88819: [LV] Support for Remainder loop vectorization

Thu Oct 15 03:45:32 PDT 2020

fhahn added a comment.

In D88819#2314896 <https://reviews.llvm.org/D88819#2314896>, @mivnay wrote:

> In D88819#2314545 <https://reviews.llvm.org/D88819#2314545>, @fhahn wrote:
>
>> In D88819#2311917 <https://reviews.llvm.org/D88819#2311917>, @mivnay wrote:
>>
>>> In D88819#2311678 <https://reviews.llvm.org/D88819#2311678>, @fhahn wrote:
>>>
>>>> Did you consider supporting this naturally by just having LV re-visit the newly created remainder loops, i.e. remember the created remainder loops and add them to the top-level worklist (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L8587)? We would need to make sure we do not visit them repeatedly, but overall we should be able to achieve the same goal without adding extra complexity to the vectorizer.
>>>
>>> Blindly calling the vectorizer for the loop again is not optimal. The current change does that, but at a lower abstraction level. The majority of the changes are about setting up the right overall CFG structure. For example, it is unnecessary to execute the runtime checks twice. "struct EpilogVectorLoopHelper" is just the carrier of information from the original vector loop generation to the epilog vector loop generation. Also, InnerLoopVectorizer doesn't expose the vector loop CFG structure to its users. Fixing the CFG structure at the higher abstraction level would expose this class completely.
>
>
>
>> Is the main motivation avoiding re-doing the runtime checks? I think we might be able to annotate the remainder loop with `noalias` metadata if we emit memory runtime checks, which should avoid generating them again for the remainder (and might be beneficial even if we do not vectorize the remainder).
>
> Yes, the code changes inside the InnerLoopVectorizer are done to get the various Values (like ResumeValue) and Blocks (like MiddleBlock) easily. If we do the vectorizations independently, we would need a separate analysis to identify the loops, CFG, metadata, llvm::Value, etc.

I am not sure I follow here. LoopVectorize preserves LoopInfo, so I think after `LoopVectorizePass::processLoop` it should be easy to get the `Loop *` pointer for the remainder loop, and that should be all that is needed to process it again. We might also need a way to instruct ILV to choose a smaller VF for the remainder, but we might just be able to use the vectorization metadata to do so. It should also be relatively straightforward to skip runtime check generation in the epilogue case.
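
To make the worklist idea concrete, here is a rough, self-contained sketch. It does not use the real LLVM classes: `ToyLoop`, `Step`, `chooseVF` and `runToyVectorizer` are all hypothetical names invented for illustration, and halving the VF for the remainder is just an assumed policy, not what ILV does. It only models the control flow: process a loop, queue its remainder exactly once, cap the remainder's VF, and skip runtime checks for the remainder.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Loop; NOT the real LLVM class.
struct ToyLoop {
  std::string Name;
  unsigned TripCount; // known trip count, to keep the toy model simple
  unsigned MaxVF;     // largest vectorization factor allowed for this loop
  bool IsEpilogue;    // true for remainder loops queued by the sketch
};

// One record per loop that the toy driver "vectorized".
struct Step {
  std::string Name;
  unsigned VF;
  bool EmittedRuntimeChecks;
};

// Halve MaxVF until it no longer exceeds the trip count (or reaches 1).
unsigned chooseVF(unsigned TripCount, unsigned MaxVF) {
  assert(MaxVF >= 1 && "MaxVF must be positive");
  unsigned VF = MaxVF;
  while (VF > 1 && VF > TripCount)
    VF /= 2;
  return VF;
}

// Toy top-level driver: visits each loop once and re-queues the remainder
// of a main loop a single time, so remainders are never visited repeatedly.
std::vector<Step> runToyVectorizer(const std::vector<ToyLoop> &Initial) {
  std::deque<ToyLoop> Worklist(Initial.begin(), Initial.end());
  std::vector<Step> Log;
  while (!Worklist.empty()) {
    ToyLoop L = Worklist.front();
    Worklist.pop_front();

    unsigned VF = chooseVF(L.TripCount, L.MaxVF);
    if (VF < 2)
      continue; // not worth vectorizing in this toy model

    // Runtime checks are only emitted for the main loop; the remainder
    // is assumed to be covered by the checks that already ran.
    Log.push_back({L.Name, VF, /*EmittedRuntimeChecks=*/!L.IsEpilogue});

    // Only main loops spawn a remainder; the IsEpilogue flag prevents
    // re-queuing the remainder of a remainder.
    unsigned Rem = L.TripCount % VF;
    if (!L.IsEpilogue && Rem > 0)
      Worklist.push_back({L.Name + ".epilogue", Rem, VF / 2, true});
  }
  return Log;
}
```

For a loop with trip count 23 and a maximum VF of 8, this records the main loop at VF 8 with runtime checks, then the 7-iteration remainder at VF 4 without them, and stops there. In the real pass the `IsEpilogue` flag and VF cap would presumably be carried by loop metadata rather than a struct field.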

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88819/new/

https://reviews.llvm.org/D88819