[llvm] [AArch64] Add @llvm.experimental.vector.match (PR #101974)

Thu Nov 7 08:04:05 PST 2024

paulwalker-arm wrote:

Thanks for the IR, @rj-jesus, and thanks for your patience. From a pure "can we reliably match this IR?" point of view, I'd still answer yes. However, I see your use case requires loading a needle vector, which adds complexity because we would need to track where the needles come from in order to reuse the original vector. Given this niggle, and the significant performance loss that would result from failing to match the sequence, I'm happy to agree there's enough value in having a dedicated intrinsic. Given the potential size of the IR sequence it replaces, I guess there may also be compile-time benefits, and it will certainly be easier to cost model.
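As a point of reference for the discussion above, here is a scalar C model of the kind of operation a dedicated intrinsic would represent. It is a sketch under an assumption: each active lane of the haystack is set in the result when it equals any element of the needle vector, which is what the intrinsic's name and the SVE2 `match` instruction suggest. `match_model` and its parameters are hypothetical names, not LLVM API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a "match" operation:
   result[i] is true only if mask[i] is active AND hay[i]
   equals at least one element of needles[0..n_needles). */
void match_model(const uint8_t *hay, size_t n_hay,
                 const uint8_t *needles, size_t n_needles,
                 const bool *mask, bool *result) {
    for (size_t i = 0; i < n_hay; i++) {
        bool found = false;
        /* Compare this haystack lane against every needle. */
        for (size_t j = 0; j < n_needles && !found; j++)
            found = (hay[i] == needles[j]);
        /* Inactive lanes never report a match. */
        result[i] = mask[i] && found;
    }
}
```

Expressed in generic IR, the inner loop roughly becomes one splat-and-compare per needle element with the results OR-ed together, which hints at why the replaced IR sequence can be large and hard to recognise reliably.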

Not related to this, but looking at the IR I notice you're effectively doing a fixed-length load into a scalable vector for the needle. I wondered whether you have considered having the main loop load directly into a fixed-length vector, with a tail loop that uses a masked load. Doing this would allow you to emit `ld1rq` instructions whose result can be used directly by the `match` instruction.
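The loop structure suggested above can be sketched in scalar C. Assumptions: needles are 8-bit and processed in 16-byte blocks (one 128-bit quadword); the full-block path stands in for the fixed-length load that `ld1rq` would replicate across every 128-bit segment of a scalable register (SVE2 `match` compares each haystack element against the needles in its corresponding 128-bit segment); the tail stands in for the masked load by ignoring lanes past the last valid needle. `match_any_needle`, `match_block`, and `NEEDLE_BLOCK` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEEDLE_BLOCK 16 /* one 128-bit quadword of 8-bit needles */

/* Set result[i] if hay[i] equals any of the first `valid` entries of a
   16-byte needle block. Entries at index >= valid model the inactive
   lanes of a masked load and are ignored. */
static void match_block(const uint8_t *hay, size_t n_hay,
                        const uint8_t block[NEEDLE_BLOCK], size_t valid,
                        bool *result) {
    for (size_t i = 0; i < n_hay; i++)
        for (size_t j = 0; j < valid; j++)
            if (hay[i] == block[j]) {
                result[i] = true;
                break;
            }
}

void match_any_needle(const uint8_t *hay, size_t n_hay,
                      const uint8_t *needles, size_t n_needles,
                      bool *result) {
    memset(result, 0, n_hay * sizeof *result);
    size_t j = 0;
    /* Main loop: whole 16-byte needle blocks, no predicate needed
       (in SVE2 code this is where an ld1rq result could feed match). */
    for (; j + NEEDLE_BLOCK <= n_needles; j += NEEDLE_BLOCK)
        match_block(hay, n_hay, needles + j, NEEDLE_BLOCK, result);
    /* Tail: fewer than 16 needles remain; copy them into a block and
       treat the remaining lanes as inactive, modelling a masked load. */
    if (j < n_needles) {
        uint8_t tail[NEEDLE_BLOCK] = {0};
        memcpy(tail, needles + j, n_needles - j);
        match_block(hay, n_hay, tail, n_needles - j, result);
    }
}
```

Splitting the loop this way keeps the predicate handling out of the hot path: only the final, partial block pays for the mask.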

https://github.com/llvm/llvm-project/pull/101974