[llvm] [AArch64] Add @llvm.experimental.vector.match (PR #101974)

Wed Oct 30 08:25:02 PDT 2024

rj-jesus wrote:

> If only the latter is true then yes seeing representative IR will be very useful. However, if the transformation is generically useful then why not implement that first?

Sorry, maybe I wasn't very clear, but the transformation is in https://github.com/llvm/llvm-project/pull/101976. It still needs rebasing onto the latest version of the intrinsic; I was letting the intrinsic settle first to avoid too much back and forth. I'll get back to it soon.

> I ask "Is enabling vectorisation of the find loop enough to see improved performance, or does that only happen when an operation equivalent to SVE's match instruction is available?"

I haven't benchmarked with stock IR, but yes, I expect the latter. With SVE2 MATCH we can get very neat speedups (200x+) in some cases, simply because you can do up to 16x16 byte compares in a single instruction with relatively low latency (at least on Neoverse V2).
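
To make the 16x16 point concrete, here is a rough scalar sketch of what a single match over 16 byte lanes has to compute. This is only an illustrative model: the names (`match_model`, `Bytes`, `Mask`, `kLanes`) are made up for this example, it ignores the governing predicate/mask, and it should not be read as the exact semantics of either the intrinsic or the SVE2 instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 16-lane byte "match" (names are illustrative,
// not part of LLVM or the SVE ACLE). Predication is deliberately left out.
constexpr std::size_t kLanes = 16;
using Bytes = std::array<std::uint8_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// For each haystack lane, report whether it equals ANY of the needle lanes.
// The nested loops perform 16 * 16 = 256 byte compares in total, which is the
// work a single hardware match can cover for one 16-byte segment.
Mask match_model(const Bytes &haystack, const Bytes &needles) {
  Mask result{};  // value-initialised: all lanes start as false
  for (std::size_t i = 0; i < kLanes; ++i) {
    for (std::size_t j = 0; j < kLanes; ++j) {
      if (haystack[i] == needles[j]) {
        result[i] = true;
      }
    }
  }
  return result;
}
```

Without an equivalent operation, a vectorised find loop would presumably have to expand this into one broadcast-and-compare per needle element, which is why I expect most of the gain to come from the instruction itself.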

I can get the stock IR though if you'd still like to have a look at it?

https://github.com/llvm/llvm-project/pull/101974