[llvm] [IA] Add support for [de]interleave{3,5,7} (PR #139373)

Luke Lau via llvm-commits llvm-commits at lists.llvm.org
Mon May 12 17:04:37 PDT 2025


lukel97 wrote:

> Personally I prefer adding (de)interleave6, but I believe there was already a discussion on this a while back; I don't remember why people decided against it then.

Yeah, I've been rereading some of the old threads in https://github.com/llvm/llvm-project/pull/89018#issuecomment-2370814217

It does seem like there was some discussion about revisiting this after the initial patch. Now that we've run into some complexity trying to extend it to handle RISC-V's factors, I think it's a good time to re-evaluate.

I have a patch that adds [de]interleave6 as well as [de]interleave{4,8}, which I hope to post soon to get some feedback.

The gist is that supporting arbitrary interleave factors seems premature: I'm not aware of any hardware that can take advantage of a factor beyond 8. So the simplest setup would be a dedicated intrinsic for each factor, because:

- Unlike shufflevectors on fixed-length vectors, detecting an interleave pattern built from nested [de]interleave2 intrinsics isn't trivial
- Unlike shufflevectors, there's no optimisation that happens on [de]interleave2 intrinsics
- We already have intrinsics for factors 2, 3, 5 and 7, so by avoiding 4, 6 and 8 we're not really saving much
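For reference, the element order a `[de]interleaveN` pair is meant to produce can be modelled on plain lists. This is a minimal sketch that treats vectors as fixed-length Python lists (ignoring vscale); the helper names `interleave` and `deinterleave` are hypothetical and this is not LLVM code.

```python
from typing import List, Sequence, Tuple

def interleave(*fields: Sequence[int]) -> List[int]:
    """Model of interleaveN: emit element i of every operand in turn."""
    n = len(fields[0])
    assert all(len(f) == n for f in fields), "all operands must have the same length"
    return [f[i] for i in range(n) for f in fields]

def deinterleave(vec: Sequence[int], factor: int) -> Tuple[List[int], ...]:
    """Model of deinterleaveN: field j is every factor-th element starting at j."""
    assert len(vec) % factor == 0, "length must be a multiple of the factor"
    return tuple(list(vec[j::factor]) for j in range(factor))

# A factor-3 example (e.g. RGB pixels): the round trip recovers the operands.
r, g, b = [0, 3, 6], [1, 4, 7], [2, 5, 8]
packed = interleave(r, g, b)
print(packed)                                 # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(deinterleave(packed, 3) == (r, g, b))   # True
```

With one intrinsic per factor, the factor is simply the operand (or result) count, so nothing has to be reconstructed from a larger pattern.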

The eventual plan would be to move from this state of affairs:

### Fixed-length vectors:
Loop vectorizer: emitted as a series of strided shufflevectors, one for each factor
InterleavedAccessPass: can lower any factor of shufflevectors to target intrinsics

### Scalable vectors:
Loop vectorizer: Only powers of 2 are supported, emitted as nested [de]interleave2 intrinsics
InterleavedAccessPass: Only matches power-of-2 trees of [de]interleave2 intrinsics
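To make the "nested [de]interleave2" form concrete, here is one way a power-of-2 interleave can be built from interleave2 alone. It is a sketch on plain lists with hypothetical helper names, not the loop vectorizer's actual code; it pairs operand `i` with operand `i + half` at each round, so the operands enter the tree in a permuted order.

```python
from typing import List, Sequence

def interleave2(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Model of interleave2: a0, b0, a1, b1, ..."""
    assert len(a) == len(b)
    out: List[int] = []
    for x, y in zip(a, b):
        out += [x, y]
    return out

def interleave_pow2(fields: Sequence[Sequence[int]]) -> List[int]:
    """Build a power-of-2 interleave using only interleave2 calls.

    Each round pairs operand i with operand i + half, halving the count,
    until a single fully interleaved vector remains.
    """
    factor = len(fields)
    assert factor >= 2 and factor & (factor - 1) == 0, "factor must be a power of 2"
    values = [list(f) for f in fields]
    half = factor // 2
    while half > 0:
        values = [interleave2(values[i], values[i + half]) for i in range(half)]
        half //= 2
    return values[0]

# Factor 4 becomes interleave2(interleave2(a, c), interleave2(b, d)).
a, b, c, d = [0, 4], [1, 5], [2, 6], [3, 7]
print(interleave_pow2([a, b, c, d]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

A factor-4 interleave already takes three intrinsic calls with non-obvious operand ordering, and factors like 3 or 6 cannot be expressed this way at all.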

To this:

### Fixed-length vectors:
Loop vectorizer: emitted as a series of strided shufflevectors, one for each factor
InterleavedAccessPass: can lower any factor of shufflevectors to target intrinsics

### Scalable vectors:
Loop vectorizer: All factors up to 8 are supported, emitted as a single [de]interleaveN intrinsic
InterleavedAccessPass: Matches [de]interleaveN intrinsics for all factors up to 8
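On the deinterleave side, the difference is visible in the order results come out. A single deinterleaveN yields fields 0..N-1 directly. A tree of deinterleave2 calls yields its leaves in bit-reversed field order, which any matcher has to undo. The sketch below models this on plain lists; the helper names are hypothetical.

```python
from typing import List, Sequence

def deinterleave2(vec: Sequence[int]) -> List[List[int]]:
    """Model of deinterleave2: [even-indexed elements, odd-indexed elements]."""
    return [list(vec[0::2]), list(vec[1::2])]

def deinterleave_pow2_tree(vec: Sequence[int], factor: int) -> List[List[int]]:
    """Repeatedly split every leaf with deinterleave2 until there are `factor` leaves."""
    assert factor >= 1 and factor & (factor - 1) == 0, "factor must be a power of 2"
    leaves = [list(vec)]
    while len(leaves) < factor:
        leaves = [half for leaf in leaves for half in deinterleave2(leaf)]
    return leaves

# One element per field for factor 8, so each leaf's first element is its field index.
vec = list(range(8))
print([leaf[0] for leaf in deinterleave_pow2_tree(vec, 8)])
# [0, 4, 2, 6, 1, 5, 3, 7]  (leaves come out in bit-reversed field order)
```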

And to get there, I'm imagining the following steps:

- We add intrinsics so we now have [de]interleave{2,3,4,5,6,7,8}
- Move the "interleave of interleaves" expansion from the loop vectorizer into AArch64's VECTOR_[DE]INTERLEAVE lowering, to handle the case where an interleave somehow doesn't get lowered into an ld4 intrinsic
- Teach InterleavedAccessPass to handle the new intrinsics (trivial)
- Teach the loop vectorizer to just emit a single intrinsic instead
- Remove `getVectorInterleaveFactor`/`getVectorDeinterleaveFactor` in InterleavedAccessPass
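To give a feel for what the last step removes, here is a toy matcher that flattens a balanced tree of interleave2 nodes back into interleaveN operands in field order. It is a hypothetical stand-in on a made-up `Leaf`/`Interleave2` expression type, meant only to show the operand reordering such a matcher has to perform; it is not how `getVectorInterleaveFactor` is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Leaf:
    name: str

@dataclass
class Interleave2:
    lhs: "Node"
    rhs: "Node"

Node = Union[Leaf, Interleave2]

def match_interleave_tree(root: Node) -> Optional[List[str]]:
    """Return interleaveN operands in field order, or None if the tree is unbalanced."""
    if isinstance(root, Leaf):
        return [root.name]
    lhs = match_interleave_tree(root.lhs)
    rhs = match_interleave_tree(root.rhs)
    if lhs is None or rhs is None or len(lhs) != len(rhs):
        return None
    # interleave2(X, Y) with X = interleaveK(X0..) and Y = interleaveK(Y0..)
    # equals interleave2K(X0, Y0, X1, Y1, ...), so the field lists alternate.
    fields: List[str] = []
    for x, y in zip(lhs, rhs):
        fields += [x, y]
    return fields

tree = Interleave2(Interleave2(Leaf("a"), Leaf("c")), Interleave2(Leaf("b"), Leaf("d")))
print(match_interleave_tree(tree))  # ['a', 'b', 'c', 'd']
```

With a dedicated interleaveN intrinsic, this whole step collapses to reading the intrinsic's operands.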

If we do end up wanting to support interleave factors above 8, e.g. https://github.com/llvm/llvm-project/pull/89018#issuecomment-2061806386, then we could return and address this later. At the moment, it doesn't look like we can lower a factor 8 interleave into ld4s + ld2 on AArch64 anyway.
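The factor-8 remark rests on an element-order identity: a factor-8 deinterleave can be factored into a factor-4 split followed by a factor-2 split of each result. The sketch below only checks that identity on plain lists (helper names are hypothetical, and the 4-then-2 ordering is one possible choice); whether the AArch64 backend can actually select an ld4/ld2-based sequence for it is a separate question.

```python
from typing import List, Sequence

def deinterleave(vec: Sequence[int], factor: int) -> List[List[int]]:
    """Model of deinterleaveN: field j is every factor-th element starting at j."""
    assert len(vec) % factor == 0
    return [list(vec[j::factor]) for j in range(factor)]

def deinterleave8_via_4_then_2(vec: Sequence[int]) -> List[List[int]]:
    """Compute deinterleave8 as one factor-4 split plus four factor-2 splits."""
    quarters = deinterleave(vec, 4)                 # quarter j alternates fields j and j + 4
    pairs = [deinterleave(q, 2) for q in quarters]  # pairs[j] == [field j, field j + 4]
    return [p[0] for p in pairs] + [p[1] for p in pairs]

vec = list(range(32))
print(deinterleave8_via_4_then_2(vec) == deinterleave(vec, 8))  # True
```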

WDYT?

https://github.com/llvm/llvm-project/pull/139373

