[PATCH] D23646: Generalize strided store pattern in interleave access pass

Alina Sbirlea via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 19 13:59:25 PDT 2016


asbirlea added a comment.

Hi Matt,

Thanks for reviewing this. Please find my answers below.

In https://reviews.llvm.org/D23646#521091, @mssimpso wrote:

> Hi Alina,
>
> I think I understand this, but I just want to be sure I get how this differs from what we currently have before going further. Currently, we only match [x, y, ..., z, x+1, y+1, ..., z+1, ...] where each y-x and each z-y equals the number of sub elements for the given factor. Or said another way, if I create a list of all the x's followed by all the y's and then all the z's, the entire list would be consecutive. With your patch, the only requirement is that each sub-list be consecutive. Is this right?


That's right. Also, from my understanding, x is always 0, so in the current matching all the elements together form one consecutive list that always starts at 0.
My first approach was a smaller generalization: add a prefix to remove the "starts with 0" restriction, and allow a more general stride with gaps. But this still didn't cover all the test cases I came across, such as the example I added in "store_general_mask_factor4".
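
For anyone following along, here is a minimal Python sketch of the two matching rules as they are described in this exchange. It is not the code in the pass: the function names are made up for illustration, undef mask elements are ignored, and the second mask is a hypothetical example, not the one from "store_general_mask_factor4".

```python
def sub_lists(mask, factor):
    """Split a mask into one sub-list per field: all the x's, all the y's, ..."""
    return [mask[j::factor] for j in range(factor)]


def is_consecutive(seq):
    """True if every element is exactly one more than the previous one."""
    return all(b == a + 1 for a, b in zip(seq, seq[1:]))


def matches_current(mask, factor):
    """Rule as described for the existing pass (assumption, simplified):
    the x's followed by the y's followed by the z's form one consecutive
    list, and that list starts at 0."""
    if factor < 2 or len(mask) % factor != 0:
        return False
    flat = [i for sub in sub_lists(mask, factor) for i in sub]
    return flat == list(range(len(flat)))


def matches_generalized(mask, factor):
    """Rule as described for the patch (assumption, simplified):
    each sub-list only has to be consecutive on its own."""
    if factor < 2 or len(mask) % factor != 0:
        return False
    return all(is_consecutive(sub) for sub in sub_lists(mask, factor))


# Factor 3 with 4 sub elements: the shape the loop vectorizer produces.
current_mask = [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

# Hypothetical mask: each sub-list is consecutive, but they start at
# arbitrary offsets and have gaps between them.
general_mask = [4, 16, 32, 5, 17, 33, 6, 18, 34, 7, 19, 35]
```

Under these assumptions both rules accept `current_mask`, while only the generalized rule accepts `general_mask`.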

To answer your question below, the use cases I'm looking at are generated by Halide (https://github.com/halide/Halide).
Halide generates LLVM IR and relies on LLVM's optimization pipeline and lowering, but it has to generate explicit intrinsics (including strided loads and stores) for ARM and AArch64, because its patterns are not lowered to those intrinsics by LLVM.
This approach was taken before the interleaved-access pass was added, so it's quite understandable. But LLVM is more powerful now, and I'm trying to make use of that and, in the process, cover the cases LLVM is still missing.
For example, for strided loads the interleaved-access pass does cover the code patterns generated by Halide, so the "custom" intrinsic code generation in Halide will soon be removed. My goal is to improve the pass to make this happen for the stores as well.
The tests I will add are actually simplified versions of what Halide is generating.
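
As a rough illustration of why these shuffle masks correspond to strided stores, here is a small Python model. It assumes a factor-3 store of three 4-element vectors; `shuffle` and `interleaved_store` are illustrative helpers, not LLVM or Halide APIs, and `None` stands in for undef padding.

```python
def shuffle(a, b, mask):
    """Model of a shufflevector: each mask entry indexes into a followed by b."""
    src = a + b
    return [src[i] for i in mask]


def interleaved_store(fields):
    """Model of the memory layout written by a factor-N interleaved store:
    element 0 of every field, then element 1 of every field, and so on."""
    return [f[i] for i in range(len(fields[0])) for f in fields]


r = ["r0", "r1", "r2", "r3"]
g = ["g0", "g1", "g2", "g3"]
b = ["b0", "b1", "b2", "b3"]

# The three fields are first concatenated into two 8-wide operands; the
# second operand is padded with undef (None here) to match the width.
lhs = r + g
rhs = b + [None] * 4

# Re-interleave mask for factor 3 with 4 sub elements.
mask = [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

shuffled = shuffle(lhs, rhs, mask)
```

In this model, storing `shuffled` to contiguous memory gives the same layout as `interleaved_store([r, g, b])`, which is the equivalence a single strided store instruction relies on.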

> The current approach was designed to match the shuffle patterns produced by the loop vectorizer. I'm curious to know where we are generating these more general patterns. Have you run across some code examples?
>
> Also, another high level comment before I start looking at the details: you'll want to include some IR test cases as well (to be run with opt instead of llc).


Agreed, the plan is to add more tests, including IR tests.

> Matt.





https://reviews.llvm.org/D23646




