[PATCH] D54606: [AMDGPU] Convert insert_vector_elt into set of selects
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 16 10:52:24 PST 2018
rampitec added a comment.
In https://reviews.llvm.org/D54606#1301265, @rampitec wrote:
> In https://reviews.llvm.org/D54606#1301250, @nhaehnle wrote:
>
> > In https://reviews.llvm.org/D54606#1301212, @rampitec wrote:
> >
> > > > However, why does code with undef vectors look so bad? For example, in `float4_inselt`, the fact that the initial vector is undef should allow us to just store a splat of 1.0.
> > >
> > > Yes, I noticed that too. That needs to be a separate optimization. As far as I understand "insert_vector_element undef, %var, %idx" should not even come to this point. It needs to be replaced by build_vector (n x %var) regardless of the thresholds and heuristics I am using, e.g. earlier (higher in the same function I think).
> >
> >
> > I disagree. When we end up using scratch memory for a vector, `build_vector (n x %var)` would imply n stores, while `insert_vector_element undef, %var, %idx` implies only 1 store, so doing the transform seems like a pretty terrible idea in general. I think exploiting undef is really specific to what you're doing here and should therefore be done here as well.
>
>
> Hm... Ok, I will add it here.
On the second though I still believe it shall be done elsewhere, in the common DAG combiber. That is the DAG this code produces:
t28: f32 = select_cc t19, Constant:i32<0>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
t30: f32 = select_cc t19, Constant:i32<1>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
t32: f32 = select_cc t19, Constant:i32<2>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
t34: f32 = select_cc t19, Constant:i32<3>, ConstantFP:f32<1.000000e+00>, undef:f32, seteq:ch
t35: v4f32 = BUILD_VECTOR t28, t30, t32, t34
and then:
t40: i1 = setcc t19, Constant:i32<0>, seteq:ch
t41: f32 = select t40, ConstantFP:f32<1.000000e+00>, undef:f32
t42: i1 = setcc t19, Constant:i32<1>, seteq:ch
t43: f32 = select t42, ConstantFP:f32<1.000000e+00>, undef:f32
t44: i1 = setcc t19, Constant:i32<2>, seteq:ch
t45: f32 = select t44, ConstantFP:f32<1.000000e+00>, undef:f32
t46: i1 = setcc t19, Constant:i32<3>, seteq:ch
t47: f32 = select t46, ConstantFP:f32<1.000000e+00>, undef:f32
t35: v4f32 = BUILD_VECTOR t41, t43, t45, t47
So it looks like an appropriate optimization is:
select cc, x, undef => x
https://reviews.llvm.org/D54606
More information about the llvm-commits
mailing list