[llvm] [SLP]Reduce number of alternate instruction, where possible (PR #123360)

Wed Jan 22 04:15:02 PST 2025

alexey-bataev wrote:

> > > I'm not clear whether this patch makes it easier or harder to remove the main-alt opcode kludge in the future and allow general splitting - copyable elements, more than 2 subnodes, non-matching instruction types (e.g. binop and intrinsic).
> > 
> > 
> > 
> > 1. Main/alt opcode cannot be removed completely. X86 has the addsub instruction, which effectively supports alt/main vectorization, and alt/main combinations that are too small (<2) lead to inefficient vectorization.
> > 2. Copyable elements vectorization will require some other stuff (possibly like adding pseudo-instructions for correct scheduling). This is the work I plan for the next patch.
> > 
> > > I guess I'm asking - is there a larger plan for this or is it focused on improving existing main-alt opcode performance?
> > 
> > 
> > Currently, this is mostly all we can do about main/alternate vectorization.
> 
> Also, as the first step to improve alt/main vectorization (and copyable elements), one of our team members is working on #112181. It has some compile-time issues; we are still investigating how to improve it.

Just want to clarify two points here.
1. If the number of alternate opcodes is less than (or maybe equal to?) 3, we can use #112181 and copyable elements support, because nodes with 1 element are not vectorized, and nodes with <= (?) 3 elements might not be very effective from the perf point of view (this requires some extra investigation).
2. In other cases, split node vectorization is much better, since it reduces register pressure and replaces wide (multi-register) operations with smaller ones.
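
To make the two points concrete, here is a minimal standalone sketch of the choice described above. It is not SLPVectorizer code: `Opcode`, `Strategy`, `countAlternate`, `chooseStrategy` and `kSmallAltThreshold` are all hypothetical names, and the threshold of 3 (strict `<`) is just one reading of the still-open "< or <= 3" question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a bundle of scalar instructions: only the opcode matters here.
enum class Opcode { Add, Sub };

// The two options discussed in the comment (names are illustrative only).
enum class Strategy { CopyableElements, SplitNode };

// Assumed cutoff; whether the comparison should be < or <= is not settled.
constexpr std::size_t kSmallAltThreshold = 3;

// Count the lanes whose opcode differs from the main opcode.
std::size_t countAlternate(const std::vector<Opcode> &bundle, Opcode mainOp) {
  std::size_t alt = 0;
  for (Opcode op : bundle)
    if (op != mainOp)
      ++alt;
  return alt;
}

// Few alternate lanes: a separate alt node would be too small to vectorize
// profitably, so prefer the copyable-elements approach. Otherwise split the
// node into narrower main and alt nodes.
Strategy chooseStrategy(const std::vector<Opcode> &bundle, Opcode mainOp) {
  assert(!bundle.empty() && "bundle must contain at least one instruction");
  return countAlternate(bundle, mainOp) < kSmallAltThreshold
             ? Strategy::CopyableElements
             : Strategy::SplitNode;
}
```

For example, a 4-lane bundle `{Add, Add, Add, Sub}` with main opcode `Add` has one alternate lane and would take the copyable-elements path in this sketch, while an 8-lane bundle alternating `Add`/`Sub` has four alternate lanes and would be split.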

https://github.com/llvm/llvm-project/pull/123360