[PATCH] D37236: [InstCombine] add insertelement + shuffle demanded element fold
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 29 14:11:45 PDT 2017
spatel added a comment.
In https://reviews.llvm.org/D37236#855615, @craig.topper wrote:
> I agree there are definitely cases where your change is required. I was just confused how we got a multiple use that didn't exist in the original IR.
>
> My fix seems to be needed to get rid of %out012 in this test case
>
> define <4 x i32> @add_ps_002(<4 x i32> %a, <4 x i32> %b) {
> %a0 = extractelement <4 x i32> %a, i32 0
> %a1 = extractelement <4 x i32> %a, i32 1
> %a2 = extractelement <4 x i32> %a, i32 2
> %a3 = extractelement <4 x i32> %a, i32 3
> %a0_again = extractelement <4 x i32> %a, i32 0
> %a1_again = extractelement <4 x i32> %a, i32 1
> %a2_again = extractelement <4 x i32> %a, i32 2
> %a3_again = extractelement <4 x i32> %a, i32 3
> %add01 = add i32 %a0, %a1
> %add23 = add i32 %a2, %a3
> %add01_again = add i32 %a0_again, %a1_again
> %add23_again = add i32 %a2_again, %a3_again
> %out0 = insertelement <4 x i32> undef, i32 %add01, i32 0
> %out01 = insertelement <4 x i32> %out0, i32 %add23, i32 1
> %out012 = insertelement <4 x i32> %out01, i32 %add01_again, i32 2
> %foo = add <4 x i32> %out012, %b
> %out0123 = insertelement <4 x i32> %foo, i32 %add23_again, i32 3
> %shuffle = shufflevector <4 x i32> %out0123, <4 x i32> %a, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
> ret <4 x i32> %shuffle
> }
>
Yep - nice example. So we need both of these changes and might as well do the early return too.
Do you think that covers all cases?
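For anyone following along, here is a rough sketch of the demanded-element reasoning on the test case above. It is a toy model in Python, not the InstCombine code. The helper names (shuffle_demanded, insert_demanded) are made up for illustration, and it assumes every value in the chain has a single use, which the real SimplifyDemandedVectorElts() cannot assume.

```python
from typing import Optional, Sequence, Set, Tuple


def shuffle_demanded(mask: Sequence[Optional[int]], width: int,
                     demanded_out: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Map demanded result lanes of a shufflevector back to its two operands.

    mask entries are source lane numbers (0..2*width-1) or None for undef.
    Hypothetical helper for illustration only.
    """
    lhs: Set[int] = set()
    rhs: Set[int] = set()
    for lane, src in enumerate(mask):
        if lane not in demanded_out or src is None:
            continue  # an undemanded or undef lane needs nothing from either operand
        if src < width:
            lhs.add(src)
        else:
            rhs.add(src - width)
    return lhs, rhs


def insert_demanded(index: int, demanded_out: Set[int]) -> Tuple[Set[int], bool]:
    """For an insertelement, return (lanes demanded of the vector operand,
    whether the inserted scalar is demanded at all)."""
    return demanded_out - {index}, index in demanded_out


# %shuffle = shufflevector %out0123, %a, <0, 1, undef, undef>
op0_lanes, op1_lanes = shuffle_demanded([0, 1, None, None], 4, {0, 1, 2, 3})

# %out0123 = insertelement %foo, %add23_again, 3
foo_lanes, add23_again_needed = insert_demanded(3, op0_lanes)

# %foo = add %out012, %b is lane-wise, so demanded lanes pass straight through.
out012_lanes = foo_lanes

# %out012 = insertelement %out01, %add01_again, 2
out01_lanes, add01_again_needed = insert_demanded(2, out012_lanes)

# %out01 = insertelement %out0, %add23, 1
out0_lanes, add23_needed = insert_demanded(1, out01_lanes)

# %out0 = insertelement undef, %add01, 0
undef_lanes, add01_needed = insert_demanded(0, out0_lanes)

print("lanes demanded from %out0123:", sorted(op0_lanes))
print("lanes demanded from %a:", sorted(op1_lanes))
print("%add23_again demanded:", add23_again_needed)
print("%add01_again demanded:", add01_again_needed)
```

In this model only lanes 0 and 1 are ever demanded, so the inserts into lanes 2 and 3 contribute nothing and %foo could use %out01 directly. That is all the sketch shows. It does not model how the fold in this patch is actually implemented.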
Now that I've demystified for myself how to write an opt pass, I'm wondering if we should just move SimplifyDemandedVectorElts() into its own pass, similar to BDCE. I think it would run twice (once near the start and then again after the loop vectorizer and SLP vectorizer). Then instcombine would have one less task in its job description.
https://reviews.llvm.org/D37236