[llvm] [LV] Add initial support for vectorizing literal struct return values (PR #109833)

Mon Nov 11 14:58:03 PST 2024

fhahn wrote:

> I'm concerned that the multiple defs is not the correct approach, as it does not seem to mesh well with existing assumptions in the LV.
> 
> The current approach requires a few changes to allow the wide struct types. The largest is replacing `packScalarIntoVectorValue` with `packScalarIntoWideValue`, but other than that it mostly can reuse the existing code paths.
> 
> I've been trying to implement the multiple defs solution (and was able to get a minimal version working in #112402), but there are a few issues, and I'm not sure I see the benefits.
> 
> The two main issues I'm hitting are assumptions things are `VPSingleDefRecipe` and assumptions that instructions are scalarisable (even with an invalid cost). At least one of these needs to be fixed for a working implementation. I think the possible fixes are to make `VPReplicateRecipe` work for multiple defs, add a new a new recipe like `VPMultipleDefReplicateRecipe` (to avoid refactoring the single-def assumptions), or make vectorization when scalarization may happen illegal. That last one seemed a little awkward since in loop vectorization legality I don't have the `VF`, and later even with an `invalid` cost, if it's the only option it'll try to scalarize it.
> 

Just to make sure I am understanding correctly, this is specifically about VPReplicateRecipe and assumptions that it has a single def? Most transforms/analsyses should handle multi-defs ( as they have to deal with interleave recipes).

The main benefits IMO would be to more easily support additional variants of calls producing multiple results (e.g. some sincos implementations return only the first result in a vector and the other result in memory, or could return a vector with results interleaved) and have more accurate/consistent modeling in line with other cases that produce multiple defs (VPInterleaveRecipe)

> The reason I ask is because we've had to disable vectorization for loops with sincos and this patch helps re-enable the lost capability. If modelling this without widening extractvalue is advantageous we should of course do so, but I'd prefer to take the refactoring off the critical path, especially since @MacDue demonstrated this is non-trivial.

Oh I didn't know that, I thought we didn't support vectorizing loops with sincos previously?

I agree that the refactoring shouldn't hold things up for too long. I'll take another look at the trade-offs (thanks @ MacDue for trying!), but it will probably take until end of week

https://github.com/llvm/llvm-project/pull/109833