[PATCH] D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
Tiehu Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 9 01:14:15 PDT 2021
TiehuZhang added a comment.
In D107262#2930517 <https://reviews.llvm.org/D107262#2930517>, @dmgreen wrote:
> Thanks, looking good. But I do still worry about the order of instructions sunk.
>
> I was trying it out to see whether it would go wrong when we were sinking a lot of operands. I noticed that the add/sub sinking wasn't really working properly, though! There is https://reviews.llvm.org/D107623 to improve that and get the shuffles to sink.
>
> With that in, can you add these two tests to show partially sinking two values at the same time:
>
> define <4 x i32> @sinkadd_partial(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
> for.cond4.preheader.lr.ph:
>   %cmp = icmp slt i8 %f, 0
>   %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
>   %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
>   br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader
>
> for.cond4.preheader.us.preheader: ; preds = %for.cond4.preheader.lr.ph
>   %e1 = sext <4 x i16> %s1 to <4 x i32>
>   %e2 = sext <4 x i16> %s2 to <4 x i32>
>   %0 = add <4 x i32> %e1, %e2
>   ret <4 x i32> %0
>
> for.cond4.preheader.preheader: ; preds = %for.cond4.preheader.lr.ph
>   ret <4 x i32> zeroinitializer
> }
>
> define <4 x i32> @sinkadd_partial_rev(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
> for.cond4.preheader.lr.ph:
>   %cmp = icmp slt i8 %f, 0
>   %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
>   %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
>   br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader
>
> for.cond4.preheader.us.preheader: ; preds = %for.cond4.preheader.lr.ph
>   %e2 = sext <4 x i16> %s2 to <4 x i32>
>   %e1 = sext <4 x i16> %s1 to <4 x i32>
>   %0 = add <4 x i32> %e1, %e2
>   ret <4 x i32> %0
>
> for.cond4.preheader.preheader: ; preds = %for.cond4.preheader.lr.ph
>   ret <4 x i32> zeroinitializer
> }
>
> The order of the extends in the target block becomes important.
OK, I will, thanks!
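
To make the ordering concern concrete, here is a rough toy model in Python. It is only a sketch and not the CodeGenPrepare implementation: the `Inst` tuple, `sink_before_user`, and `defs_precede_uses` are hypothetical names invented for illustration, and `sum` stands in for the unnamed `%0`. It models the target block of `@sinkadd_partial_rev` and compares placing each sunk shuffle directly before its own user against placing both at a single point before the final `add`.

```python
from typing import List, Set, Tuple

# Hypothetical toy representation: (result name, operand names).
Inst = Tuple[str, List[str]]


def sink_before_user(block: List[Inst], inst: Inst) -> List[Inst]:
    """Return a copy of `block` with `inst` placed right before its first user."""
    name = inst[0]
    for idx, (_, operands) in enumerate(block):
        if name in operands:
            return block[:idx] + [inst] + block[idx:]
    raise ValueError(f"{name} has no user in the block")


def defs_precede_uses(block: List[Inst], live_in: Set[str]) -> bool:
    """Check that every operand is defined before it is used within the block."""
    defined = set(live_in)
    for result, operands in block:
        if any(op not in defined for op in operands):
            return False
        defined.add(result)
    return True


# Values assumed to be available on entry to the target block.
live_in = {"a1", "a2"}

# The two shuffles we want to sink out of the entry block.
s1: Inst = ("s1", ["a1"])
s2: Inst = ("s2", ["a2"])

# Target block of @sinkadd_partial_rev: e2 comes before e1.
target_rev: List[Inst] = [
    ("e2", ["s2"]),
    ("e1", ["s1"]),
    ("sum", ["e1", "e2"]),
]

# Each shuffle goes immediately before the extend that uses it.
sunk = sink_before_user(sink_before_user(target_rev, s2), s1)

# Naive alternative: drop both shuffles at one point, just before the add.
naive = target_rev[:2] + [s1, s2] + target_rev[2:]

print([name for name, _ in sunk])
print(defs_precede_uses(sunk, live_in), defs_precede_uses(naive, live_in))
```

In this model the per-user placement keeps every definition ahead of its use regardless of whether `e1` or `e2` comes first, while the single-point placement leaves the extends using shuffles that are defined after them.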
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D107262/new/
https://reviews.llvm.org/D107262