[PATCH] D102116: [LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.
Han Zhu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 13 13:02:09 PDT 2021
zhuhan0 accepted this revision.
zhuhan0 added a comment.
This revision is now accepted and ready to land.
LGTM.
================
Comment at: llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll:9-65
; Most basic pattern; Note that iff the shift amount is offset, said offsetting
; must not cause an overflow, but `add nsw` is fine.
define i8 @p0(i8 %val, i8 %start, i8 %extraoffset) {
; CHECK-LABEL: @p0(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT: [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > zhuhan0 wrote:
> > > lebedev.ri wrote:
> > > > zhuhan0 wrote:
> > > > > I could be wrong but would this mis-compile if %nbits results in unsigned overflow? For example,
> > > > > ```
> > > > > %val = 0x10000000
> > > > > %start = 1
> > > > > %extraoffset.= 255
> > > > > ```
> > > > > Loop trip count is 8 before transformation but 1 after.
> > > > Could you please specify, for which bitwidth your counterexample is?
> > > > I'm going to guess i32, so we have
> > > > ```
> > > > %iv = i32 1
> > > > %nbits = i32 256
> > > > %val.shifted = lshr i32 %val, 256
> > > > ```
> > > > We then navigate to https://llvm.org/docs/LangRef.html#lshr-instruction
> > > > > If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value.
> > > >
> > > > So i'm not seeing a miscompile.
> > > >
> > > > As i have said, i've verified each of the tests here with alive2, and they are all fine.
> > > >
> > > 8 bits.
> > > ```
> > > %start = i8 1
> > > %extraoffset = i8 255 ; unsigned
> > > ```
> > > Sorry I'm not familiar with alive2 so forgive me if I'm questioning something that's obviously proven to be correct.
> > Then i do not understand what `%val = 0x10000000` means.
> > What's the decimal value of `%val`?
> Ah, you mean `%val = 128`.
> So we have
> ```
> $ cat /tmp/test.ll
> declare void @escape_inner(i8, i8, i8, i1, i8)
> declare void @escape_outer(i8, i8, i8, i1, i8)
>
> define i8 @p0() {
> entry:
> br label %loop
>
> loop:
> %iv = phi i8 [ 1, %entry ], [ %iv.next, %loop ]
> %nbits = add nsw i8 %iv, 255
> %val.shifted = lshr i8 128, %nbits
> %val.shifted.iszero = icmp eq i8 %val.shifted, 0
> %iv.next = add i8 %iv, 1
>
> call void @escape_inner(i8 %iv, i8 %nbits, i8 %val.shifted, i1 %val.shifted.iszero, i8 %iv.next)
>
> br i1 %val.shifted.iszero, label %end, label %loop
>
> end:
> %iv.res = phi i8 [ %iv, %loop ]
> %nbits.res = phi i8 [ %nbits, %loop ]
> %val.shifted.res = phi i8 [ %val.shifted, %loop ]
> %val.shifted.iszero.res = phi i1 [ %val.shifted.iszero, %loop ]
> %iv.next.res = phi i8 [ %iv.next, %loop ]
>
> call void @escape_outer(i8 %iv.res, i8 %nbits.res, i8 %val.shifted.res, i1 %val.shifted.iszero.res, i8 %iv.next.res)
>
> ret i8 %iv.res
> }
> $ ./bin/opt -loop-idiom -mtriple=x86_64 -mcpu=core-avx2 -o - -S /tmp/test.ll | ./bin/opt -O3 -S -o - -
> ; ModuleID = '<stdin>'
> source_filename = "/tmp/test.ll"
> target triple = "x86_64"
>
> declare void @escape_inner(i8, i8, i8, i1, i8) local_unnamed_addr #0
>
> declare void @escape_outer(i8, i8, i8, i1, i8) local_unnamed_addr #0
>
> define i8 @p0() local_unnamed_addr #0 {
> entry:
> tail call void @escape_inner(i8 1, i8 0, i8 -128, i1 false, i8 2)
> tail call void @escape_inner(i8 2, i8 1, i8 64, i1 false, i8 3)
> tail call void @escape_inner(i8 3, i8 2, i8 32, i1 false, i8 4)
> tail call void @escape_inner(i8 4, i8 3, i8 16, i1 false, i8 5)
> tail call void @escape_inner(i8 5, i8 4, i8 8, i1 false, i8 6)
> tail call void @escape_inner(i8 6, i8 5, i8 4, i1 false, i8 7)
> tail call void @escape_inner(i8 7, i8 6, i8 2, i1 false, i8 8)
> tail call void @escape_inner(i8 8, i8 7, i8 1, i1 false, i8 9)
> tail call void @escape_inner(i8 9, i8 8, i8 undef, i1 true, i8 10)
> tail call void @escape_outer(i8 9, i8 8, i8 undef, i1 true, i8 10)
> ret i8 9
> }
>
> attributes #0 = { "target-cpu"="core-avx2" }
>
> ```
>
> `@escape_inner()` is called 9 times, not 1.
Thanks for the test.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D102116/new/
https://reviews.llvm.org/D102116
More information about the llvm-commits
mailing list