[PATCH] D102116: [LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.

Han Zhu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 13 13:02:09 PDT 2021


zhuhan0 accepted this revision.
zhuhan0 added a comment.
This revision is now accepted and ready to land.

LGTM.



================
Comment at: llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll:9-65
 ; Most basic pattern; Note that iff the shift amount is offset, said offsetting
 ; must not cause an overflow, but `add nsw` is fine.
 define i8 @p0(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @p0(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > zhuhan0 wrote:
> > > lebedev.ri wrote:
> > > > zhuhan0 wrote:
> > > > > I could be wrong but would this mis-compile if %nbits results in unsigned overflow? For example,
> > > > > ```
> > > > > %val = 0x10000000
> > > > > %start = 1
> > > > > %extraoffset.= 255
> > > > > ```
> > > > > Loop trip count is 8 before transformation but 1 after.
> > > > Could you please specify, for which bitwidth your counterexample is?
> > > > I'm going to guess i32, so we have
> > > > ```
> > > > %iv = i32 1
> > > > %nbits = i32 256
> > > > %val.shifted = lshr i32 %val, 256
> > > > ```
> > > > We then navigate to https://llvm.org/docs/LangRef.html#lshr-instruction
> > > > > If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value.
> > > > 
> > > > So i'm not seeing a miscompile.
> > > > 
> > > > As i have said, i've verified each of the tests here with alive2, and they are all fine.
> > > > 
> > > 8 bits.
> > > ```
> > > %start = i8 1
> > > %extraoffset = i8 255  ; unsigned
> > > ```
> > > Sorry I'm not familiar with alive2 so forgive me if I'm questioning something that's obviously proven to be correct.
> > Then i do not understand what `%val = 0x10000000` means.
> > What's the decimal value of `%val`?
> Ah, you mean `%val = 128`.
> So we have
> ```
> $ cat /tmp/test.ll 
> declare void @escape_inner(i8, i8, i8, i1, i8)
> declare void @escape_outer(i8, i8, i8, i1, i8)
> 
> define i8 @p0() {
> entry:
>   br label %loop
> 
> loop:
>   %iv = phi i8 [ 1, %entry ], [ %iv.next, %loop ]
>   %nbits = add nsw i8 %iv, 255
>   %val.shifted = lshr i8 128, %nbits
>   %val.shifted.iszero = icmp eq i8 %val.shifted, 0
>   %iv.next = add i8 %iv, 1
> 
>   call void @escape_inner(i8 %iv, i8 %nbits, i8 %val.shifted, i1 %val.shifted.iszero, i8 %iv.next)
> 
>   br i1 %val.shifted.iszero, label %end, label %loop
> 
> end:
>   %iv.res = phi i8 [ %iv, %loop ]
>   %nbits.res = phi i8 [ %nbits, %loop ]
>   %val.shifted.res = phi i8 [ %val.shifted, %loop ]
>   %val.shifted.iszero.res = phi i1 [ %val.shifted.iszero, %loop ]
>   %iv.next.res = phi i8 [ %iv.next, %loop ]
> 
>   call void @escape_outer(i8 %iv.res, i8 %nbits.res, i8 %val.shifted.res, i1 %val.shifted.iszero.res, i8 %iv.next.res)
> 
>   ret i8 %iv.res
> }
> $ ./bin/opt -loop-idiom -mtriple=x86_64 -mcpu=core-avx2 -o - -S /tmp/test.ll | ./bin/opt -O3 -S -o - -
> ; ModuleID = '<stdin>'
> source_filename = "/tmp/test.ll"
> target triple = "x86_64"
> 
> declare void @escape_inner(i8, i8, i8, i1, i8) local_unnamed_addr #0
> 
> declare void @escape_outer(i8, i8, i8, i1, i8) local_unnamed_addr #0
> 
> define i8 @p0() local_unnamed_addr #0 {
> entry:
>   tail call void @escape_inner(i8 1, i8 0, i8 -128, i1 false, i8 2)
>   tail call void @escape_inner(i8 2, i8 1, i8 64, i1 false, i8 3)
>   tail call void @escape_inner(i8 3, i8 2, i8 32, i1 false, i8 4)
>   tail call void @escape_inner(i8 4, i8 3, i8 16, i1 false, i8 5)
>   tail call void @escape_inner(i8 5, i8 4, i8 8, i1 false, i8 6)
>   tail call void @escape_inner(i8 6, i8 5, i8 4, i1 false, i8 7)
>   tail call void @escape_inner(i8 7, i8 6, i8 2, i1 false, i8 8)
>   tail call void @escape_inner(i8 8, i8 7, i8 1, i1 false, i8 9)
>   tail call void @escape_inner(i8 9, i8 8, i8 undef, i1 true, i8 10)
>   tail call void @escape_outer(i8 9, i8 8, i8 undef, i1 true, i8 10)
>   ret i8 9
> }
> 
> attributes #0 = { "target-cpu"="core-avx2" }
> 
> ```
> 
> `@escape_inner()` is called 9 times, not 1.
Thanks for the test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102116/new/

https://reviews.llvm.org/D102116



More information about the llvm-commits mailing list