[PATCH] D54606: [AMDGPU] Convert insert_vector_elt into set of selects

Fri Nov 16 10:13:51 PST 2018

arsenm added a comment.

In https://reviews.llvm.org/D54606#1301250, @nhaehnle wrote:

> In https://reviews.llvm.org/D54606#1301212, @rampitec wrote:
>
> > > However, why does code with undef vectors look so bad? For example, in `float4_inselt`, the fact that the initial vector is undef should allow us to just store a splat of 1.0.
> >
> > Yes, I noticed that too. That needs to be a separate optimization. As far as I understand "insert_vector_element undef, %var, %idx" should not even come to this point. It needs to be replaced by build_vector (n x %var) regardless of the thresholds and heuristics I am using, e.g. earlier (higher in the same function I think).
>
>
> I disagree. When we end up using scratch memory for a vector, `build_vector (n x %var)` would imply n stores, while `insert_vector_element undef, %var, %idx` implies only 1 store, so doing the transform seems like a pretty terrible idea in general. I think exploiting undef is really specific to what you're doing here and should therefore be done here as well.

Does this actually happen? I thought we custom lowered all of the vector operations

https://reviews.llvm.org/D54606