[PATCH] D107262: [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 6 00:56:31 PDT 2021


dmgreen added a comment.

Thanks, looking good. But I do still worry about the order of instructions sunk.

I was trying it out, seeing if it would go wrong when we were sinking a lot of operands. I noticed that the add/sub sinking wasn't really working properly though! There is https://reviews.llvm.org/D107623 to improve that and get the shuffles to sink.

With that in, can you add these two tests to show partially sinking two values at the same time:

  define <4 x i32> @sinkadd_partial(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
  for.cond4.preheader.lr.ph:
    %cmp = icmp slt i8 %f, 0
    %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader
  
  for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
    %e1 = sext <4 x i16> %s1 to <4 x i32>
    %e2 = sext <4 x i16> %s2 to <4 x i32>
    %0 = add <4 x i32> %e1, %e2
    ret <4 x i32> %0
  
  for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
    ret <4 x i32> zeroinitializer
  }
  
  define <4 x i32> @sinkadd_partial_rev(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
  for.cond4.preheader.lr.ph:
    %cmp = icmp slt i8 %f, 0
    %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader
  
  for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
    %e2 = sext <4 x i16> %s2 to <4 x i32>
    %e1 = sext <4 x i16> %s1 to <4 x i32>
    %0 = add <4 x i32> %e1, %e2
    ret <4 x i32> %0
  
  for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
    ret <4 x i32> zeroinitializer
  }

The order of extends in the target block becomes important.
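
To illustrate the concern, here is a hand-written sketch (not actual opt output) of what the target block of @sinkadd_partial might look like once both shuffles are sunk, assuming each one lands directly in front of its sext user. The %s1.sunk and %s2.sunk names are made up for illustration:

  for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
    ; hypothetical placement: each sunk shuffle sits right before its user
    %s1.sunk = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    %e1 = sext <4 x i16> %s1.sunk to <4 x i32>
    %s2.sunk = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    %e2 = sext <4 x i16> %s2.sunk to <4 x i32>
    %0 = add <4 x i32> %e1, %e2
    ret <4 x i32> %0

In @sinkadd_partial_rev the two sexts are swapped, so the sunk shuffles would need to swap with them. Whichever way round, each shuffle has to be defined before the sext that uses it.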


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107262/new/

https://reviews.llvm.org/D107262


