[PATCH] D54606: [AMDGPU] Convert insert_vector_elt into set of selects

Fri Nov 16 10:52:24 PST 2018

rampitec added a comment.

In https://reviews.llvm.org/D54606#1301265, @rampitec wrote:

> In https://reviews.llvm.org/D54606#1301250, @nhaehnle wrote:
>
> > In https://reviews.llvm.org/D54606#1301212, @rampitec wrote:
> >
> > > > However, why does code with undef vectors look so bad? For example, in `float4_inselt`, the fact that the initial vector is undef should allow us to just store a splat of 1.0.
> > >
> > > Yes, I noticed that too. That needs to be a separate optimization. As far as I understand "insert_vector_element undef, %var, %idx" should not even come to this point. It needs to be replaced by build_vector (n x %var) regardless of the thresholds and heuristics I am using, e.g. earlier (higher in the same function I think).
> >
> >
> > I disagree. When we end up using scratch memory for a vector, `build_vector (n x %var)` would imply n stores, while `insert_vector_element undef, %var, %idx` implies only 1 store, so doing the transform seems like a pretty terrible idea in general. I think exploiting undef is really specific to what you're doing here and should therefore be done here as well.
>
>
> Hm... Ok, I will add it here.

On second thought, I still believe it should be done elsewhere, in the common DAG combiner. This is the DAG this code produces:

    t28: f32 = select_cc t19, Constant:i32<0>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
    t30: f32 = select_cc t19, Constant:i32<1>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
    t32: f32 = select_cc t19, Constant:i32<2>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
    t34: f32 = select_cc t19, Constant:i32<3>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
  t35: v4f32 = BUILD_VECTOR t28, t30, t32, t34

and then, once each select_cc has been split into a setcc plus a select:

      t40: i1 = setcc t19, Constant:i32<0>, seteq:ch
    t41: f32 = select t40, ConstantFP:f32<1.000000e+00>, undef:f32
      t42: i1 = setcc t19, Constant:i32<1>, seteq:ch
    t43: f32 = select t42, ConstantFP:f32<1.000000e+00>, undef:f32
      t44: i1 = setcc t19, Constant:i32<2>, seteq:ch
    t45: f32 = select t44, ConstantFP:f32<1.000000e+00>, undef:f32
      t46: i1 = setcc t19, Constant:i32<3>, seteq:ch
    t47: f32 = select t46, ConstantFP:f32<1.000000e+00>, undef:f32
  t35: v4f32 = BUILD_VECTOR t41, t43, t45, t47

So it looks like an appropriate optimization is:

  select cc, x, undef => x
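
To illustrate what that fold would do to the DAG above, here is a minimal sketch on a toy node model. It is not LLVM code: `Node`, `fold_select_undef` and `combine` are hypothetical names made up for this example, and the real combine would live in the SelectionDAG combiner and operate on SDNodes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a DAG node (hypothetical, not an LLVM class)."""
    op: str
    operands: Tuple["Node", ...] = ()
    value: object = None


def fold_select_undef(n: Node) -> Node:
    """Apply the proposed rule: select cc, x, undef => x."""
    if n.op == "select":
        _cond, true_val, false_val = n.operands
        if false_val.op == "undef":
            return true_val
    return n


def combine(n: Node) -> Node:
    """Rebuild the node bottom-up, folding each select on the way."""
    ops = tuple(combine(o) for o in n.operands)
    return fold_select_undef(Node(n.op, ops, n.value))


# Rebuild the second DAG from the comment: four selects on the index t19.
idx = Node("copy", value="t19")
one = Node("constfp", value=1.0)
undef = Node("undef")

elts = tuple(
    Node("select", (Node("setcc", (idx, Node("const", value=i))), one, undef))
    for i in range(4)
)
build_vector = Node("build_vector", elts)

combined = combine(build_vector)
```

In this toy model every operand of `combined` is the constant 1.0, so the BUILD_VECTOR becomes a splat of 1.0 and the setcc nodes are left without users.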

https://reviews.llvm.org/D54606