[llvm] [Vectorize] Vectorization for __builtin_prefetch (PR #66160)

Mon Jan 22 02:15:01 PST 2024

m-saito-fj wrote:

@davemgreen Thank you for your comment.

> Is the main idea to make sure that we do vectorize the loop, or are the prefetches important for performance?

My main goal is to vectorize loops that contain __builtin_prefetch. There are certainly several ways to vectorize such a loop. I implemented vectorization using SVE instructions, but I think it would be better to also support other vectorization methods and to provide an option for choosing between them.
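
For context, a loop of the kind under discussion might look like the minimal sketch below. The function name, the prefetch distance, and the padding requirement on the source buffer are illustrative assumptions and are not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed prefetch distance in elements; a tuning choice, not from the PR. */
#define PREFETCH_DISTANCE 16

/*
 * Hypothetical kernel: dst[i] = src[i] * k for i in [0, n).
 * Assumption: src holds at least n + PREFETCH_DISTANCE elements, so the
 * prefetch address expression always stays inside the array.
 */
void scale_with_prefetch(double *dst, const double *src, size_t n, double k) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
        /* Read prefetch (rw = 0) with high temporal locality (locality = 3). */
        __builtin_prefetch(&src[i + PREFETCH_DISTANCE], 0, 3);
        dst[i] = src[i] * k;
    }
}
```

The loop body is otherwise a plain element-wise multiply; the prefetch call is the only part that needs the vector prefetch support discussed here.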

> I think it might make sense to have an initial patch that introduces the intrinsics+langref and adds the codegen support for them - both for generic targets and for SVE. Once that is in it would make adding tests for vectorization easier.

[Implementation](https://github.com/llvm/llvm-project/compare/main...m-saito-fj:llvm-project:vectorization-builtin-prefetch-aarch64)
The implementation above consists of three commits. The commits and the features each one adds are as follows:

1. [Vectorize] Vectorization for __builtin_prefetch
    1.1. Add new vector prefetch intrinsic + langref
    1.2. Add vectorization of the prefetch intrinsic to LoopVectorize
2. [CodeGen] CodeGen for masked_prefetch and masked_gather_prefetch
    2.1. Codegen support for vector prefetch intrinsic
3. [AArch64][Vectorize] Vectorization for __builtin_prefetch for AArch64
    3.1. Allow vectorization by LoopVectorize on AArch64
    3.2. Add lowering in CodeGen for AArch64

Does this mean it would be better to aim to merge 1.1, 2.1, and 3.2 together in a single patch first?

https://github.com/llvm/llvm-project/pull/66160