[llvm-bugs] [Bug 42746] New: Should CorrelatedValuePropagation pass reduce width of shifts?
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Jul 24 11:27:10 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=42746
Bug ID: 42746
Summary: Should CorrelatedValuePropagation pass reduce width of shifts?
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Scalar Optimizations
Assignee: unassignedbugs at nondot.org
Reporter: lebedev.ri at gmail.com
CC: llvm-bugs at lists.llvm.org
I'm currently looking into re-fixing
https://bugs.llvm.org/show_bug.cgi?id=42399,
and I'm stuck on a pattern like:
define i1 @test(i64 %storage, i32 %nbits) {
  %skipnbits = sub nsw i32 64, %nbits
  %skipnbitswide = zext i32 %skipnbits to i64
  %datawide = lshr i64 %storage, %skipnbitswide
  %data = trunc i64 %datawide to i32
  %nbitsminusone = add nsw i32 %nbits, -1
  %bitmask = shl i32 1, %nbitsminusone
  %bitmasked = and i32 %bitmask, %data
  %isbitunset = icmp eq i32 %bitmasked, 0
  ret i1 %isbitunset
}
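
(For reference, a hypothetical C++ source that lowers to roughly this IR; the function name and the [1, 32] restriction on %nbits are my own, not taken from PR42399:)

#include <cstdint>

// Hypothetical source-level equivalent of the IR above: test whether
// the highest bit of the nbits-wide field taken from the top of
// storage is clear. Only meaningful for nbits in [1, 32]; for larger
// nbits the i32 shl in the IR is poison anyway.
bool isBitUnset(uint64_t storage, int nbits) {
  uint32_t data = (uint32_t)(storage >> (64 - nbits)); // %data
  uint32_t bitmask = 1u << (nbits - 1);                // %bitmask
  return (bitmask & data) == 0;                        // %isbitunset
}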
The desired optimized result is:
define i1 @test(i64 %storage, i32 %nbits) {
  %tmp = icmp sgt i64 %storage, -1
  ret i1 %tmp
}
That transform is indeed correct: the tested bit, bit nbits-1 of %data, is bit
(64-nbits)+(nbits-1) = 63 of %storage, i.e. its sign bit:
Name: PR42399
%skipnbits = sub nsw i32 64, %nbits
%skipnbitswide = zext i32 %skipnbits to i64
%datawide = lshr i64 %storage, %skipnbitswide
%data = trunc i64 %datawide to i32
%nbitsminusone = add nsw i32 %nbits, -1
%bitmask = shl i32 1, %nbitsminusone
%bitmasked = and i32 %bitmask, %data
%isbitunset = icmp eq i32 %bitmasked, 0
=>
%isbitunset = icmp sgt i64 %storage, -1
https://rise4fun.com/Alive/hUu
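
(The Alive link above is the authoritative proof; as a quick sanity check, here is a small brute-force C++ harness — names are mine — comparing both forms for every nbits in [1, 32] over random inputs:)

#include <cstdint>
#include <cstdio>
#include <random>

// The original IR sequence, for nbits in [1, 32].
static bool original(uint64_t storage, uint32_t nbits) {
  uint32_t data = (uint32_t)(storage >> (64 - nbits));
  uint32_t bitmask = 1u << (nbits - 1);
  return (bitmask & data) == 0;
}

// The optimized form: icmp sgt i64 %storage, -1.
static bool optimized(uint64_t storage) {
  return (int64_t)storage > -1;
}

int main() {
  std::mt19937_64 rng(0);
  for (uint32_t nbits = 1; nbits <= 32; ++nbits)
    for (int i = 0; i != 1 << 16; ++i) {
      uint64_t storage = rng();
      if (original(storage, nbits) != optimized(storage))
        return std::printf("mismatch at nbits=%u\n", nbits), 1;
    }
  std::puts("no mismatches");
}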
The problem is the truncations around the shifts.
The current legality check I've come up with is:
https://rise4fun.com/Alive/M5vF
Name: one truncation 0 - the original widest input should be losslessly truncatable, or the other input should be '1'
Pre: C1+C2 u< 64 && ((countLeadingZeros(C11) u>= (64-32)) || (countLeadingZeros(C22) u>= (32-1)))
%C1_64 = zext i8 C1 to i64
%C2_32 = zext i8 C2 to i32
%old_shift_of_x = lshr i64 C11, %C1_64
%old_shift_of_y = shl i32 C22, %C2_32
%old_trunc_of_shift_of_x = trunc i64 %old_shift_of_x to i32
%old_masked = and i32 %old_trunc_of_shift_of_x, %old_shift_of_y
%r = icmp ne i32 %old_masked, 0
=>
%C1_64 = zext i8 C1 to i64
%C2_64 = zext i8 C2 to i64
%new_shamt = add i64 %C1_64, %C2_64
%new_y_wide = zext i32 C22 to i64
%new_shift = shl i64 %new_y_wide, %new_shamt
%new_masked = and i64 %new_shift, C11
%r = icmp ne i64 %new_masked, 0
I.e., the transform can be done if the truncation could have been threaded
over the shift in the first place.
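
(In C++ terms, the precondition amounts to something like the following sketch; legalToWiden is my own name, and std::countl_zero is C++20:)

#include <bit>
#include <cstdint>

// Sketch of the Alive precondition above: the combined shift amount
// must stay below the wide bit width, and either the wide input C11
// truncates to i32 losslessly (>= 32 leading zeros), or the narrow
// input C22 is 0 or 1, i.e. at most a single low bit (>= 31 leading
// zeros in i32).
static bool legalToWiden(uint64_t C11, uint32_t C22, uint8_t C1, uint8_t C2) {
  return unsigned(C1) + unsigned(C2) < 64 &&
         (std::countl_zero(C11) >= 64 - 32 ||
          std::countl_zero(C22) >= 32 - 1);
}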
But we don't seem to do that currently, and we don't do the opposite transform:
https://godbolt.org/z/qYlMJP
In CorrelatedValuePropagation.cpp I only see reduction of udiv/urem width.
Should it be taught to also reduce width of shifts?
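
(For comparison, a minimal C++ model of what such a shift narrowing would do; the preconditions in the comment are ranges the pass would have to prove, and the function name is mine:)

#include <cstdint>

// Model of the requested transform, analogous to the existing
// udiv/urem narrowing: given that CVP can prove x u< 2^32 and
// amt u< 32, an i64 lshr can be done as an i32 lshr plus zext.
static uint64_t lshrNarrowed(uint64_t x, uint64_t amt) {
  uint32_t narrow = (uint32_t)x >> (uint32_t)amt; // lshr i32
  return narrow;                                  // zext i32 to i64
}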