[llvm] [LegalizeTypes] Handle non byte-sized elt types when splitting INSERT/EXTRACT_VECTOR_ELT (PR #93357)

Tue May 28 08:22:07 PDT 2024

bjope wrote:

> Is the nearest byte really what we want here? We can widen to any type we want, and on most targets, storing an `<N x i32>` is going to be a lot more efficient than storing an `<N x i24>`.

It is a bit hard to understand what is best for every target (unless perhaps scanning for some legal type, if such a type exists).
Some reasons for not picking a larger element type would be:
- If the vector is `<N x i24>` there is no padding, and one could just memcpy it to the stack if the source vector is in memory. It seems a tiny bit weird to promote `<N x i23>` to `<N x i32>` if we do not promote `<N x i24>` to `<N x i32>`. But nothing here says that we would promote elements that are byte-sized, so it is possible to promote to next power-of-two.
- If for example i128 isn't legal, then we would end up wasting lots of stack etc if for example extending <256 x i65> to <256 x i128>.
- Our downstream target can store things like i24, so using getRoundIntegerType would both introduce an unneeded extend, but also wasting stack space. That might ofcourse be an uncommong thing when it comes to our target, and then we should add custom lowering for our target instead of making that the default. One problem with that is that one can't say that <4 x i23> should be custom legalized since that isn't an existing MVT type. So that complicates things a bit if I would pick a power-of-two type blindly without checking for legal types to load/store.
- Picking larger types results in touching more memory, which also might be negative (memory bandwidth, dirtying caches, etc).  
- These legalization has been broken for years as far as I can tell. If performance (and correctness) is an issue I guess most targets are doing custom lowering already. My goal here was just to do something simple to avoid crashes/miscompiles in case this code for example is triggered by IR fuzzy testing etc.

To summarize. How to optimize might be target specific.
Then the question is if we want to add more logic to the common legalization in this case, or just keeping it simple.  Just picking a "rounded integer type" instead of "store sized integer type" is still a simple solution. I'm just not sure if it is better in general or not.

> 
> Instead of expanding immediately, should we just construct an EXTRACT_VECTOR_ELT with the promoted vector type, and legalize that? That might allow custom legalization to kick in in certain cases.

I don't really see the point in giving the target a second chance for custom legalization. A target could just promote the input vector elements itself if that is feasible when doing CustomLowerNode the first time. Otherwise we need to speculate about what element size that would be feasible for the current target, to match whatever types the custom lowering handles. And it will be hard to pick a type that is best for all targets (unless using a target hook just to please custom lowering...).

https://github.com/llvm/llvm-project/pull/93357