[llvm] [GlobalISel] Add multi-way splitting support for wide scalar shifts. (PR #155353)

Thu Aug 28 09:48:33 PDT 2025

aemerson wrote:

> > AMDGPU will never want to use the stack path
> 
> Interesting.
> 
> > I'd rather we do that only as a last resort. Although it's terrible either way, once we legalize through the stack there's no more optimization opportunities.
> 
> Yeah true, although it just looks quite a bit more efficient for large shifts. If we split the original type into n pieces then it is n stores + n 0/1 stores + maybe an ashr + n reloads + the address calculation to figure out where to load from. The DAG version should be more efficient than it is. AFAICT this other strategy is more instructions and grows by O(n^2) as it needs to compare each of the pieces. That is likely why it is so slow for large shifts, if I have that correct.

Yes, this multi-way approach is O(n^2) instructions for variable shift amounts, so as you say, for very large shifts we probably still want the option to go through the stack if it's more efficient (although I probably won't get around to doing that any time soon). That said, for the constant-shift cases this approach is just O(n).
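
To illustrate the complexity difference, here is a rough standalone model, not the code in this PR. It assumes a wide integer is represented as N 64-bit limbs (least significant first); the names `Wide`, `shlVariable`, and `shlKnown` are hypothetical. With a variable amount, each output limb has to select among every candidate source limb, which gives roughly n^2 compare/select operations. With a known amount, the source limbs can be indexed directly, which is linear.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a wide integer as N 64-bit limbs, least significant first.
template <std::size_t N> using Wide = std::array<std::uint64_t, N>;

// Variable-amount shift left. Straight-line legalized code cannot index limbs
// by a runtime value, so each output limb compares the limb offset against
// every candidate and selects the matching source limbs: O(N^2) selects.
template <std::size_t N>
Wide<N> shlVariable(const Wide<N> &in, unsigned amt) {
  assert(amt < 64 * N && "shift amount must be less than the bit width");
  const unsigned limbShift = amt / 64;
  const unsigned bitShift = amt % 64;
  Wide<N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    std::uint64_t hi = 0, lo = 0;
    for (std::size_t k = 0; k <= i; ++k) {
      const bool match = (limbShift == k);            // compare
      hi = match ? in[i - k] : hi;                    // select
      lo = (match && i > k) ? in[i - k - 1] : lo;     // select
    }
    // Guard the carry-in: shifting a 64-bit value by 64 is undefined.
    out[i] = (hi << bitShift) | (bitShift ? lo >> (64 - bitShift) : 0);
  }
  return out;
}

// Models the constant-amount case: the limb offset is known up front, so each
// output limb reads its two source limbs directly. O(N) operations.
template <std::size_t N>
Wide<N> shlKnown(const Wide<N> &in, unsigned amt) {
  assert(amt < 64 * N && "shift amount must be less than the bit width");
  const std::size_t limbShift = amt / 64;
  const unsigned bitShift = amt % 64;
  Wide<N> out{};
  for (std::size_t i = limbShift; i < N; ++i) {
    const std::uint64_t hi = in[i - limbShift];
    const std::uint64_t lo = (i > limbShift) ? in[i - limbShift - 1] : 0;
    out[i] = (hi << bitShift) | (bitShift ? lo >> (64 - bitShift) : 0);
  }
  return out;
}
```

Both functions compute the same result; the only difference is how many operations are needed to locate the source limbs, which is the part that grows quadratically in the variable case.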

https://github.com/llvm/llvm-project/pull/155353