<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - InstCombine incorrectly optimizes bit mask operations"

   href="https://bugs.llvm.org/show_bug.cgi?id=51732">51732</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>InstCombine incorrectly optimizes bit mask operations

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>10.0

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>pfcittolin@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Hello everyone,

I noticed a pattern in which InstCombine appears to incorrectly optimize bit

mask tests on integers, specifically for the top bit of a byte-sized chunk

(bits 7, 15, 31, ...).

For example, given the following IR, which tests bit 30:

define dso_local i32 @main() {

    %rlo.1 = alloca i1

    store i1 1, i1* %rlo.1

    %1 = load i64, i64* @byte ; @byte = dso_local global i64 30

    %2 = shl i64 1, 30

    %3 = and i64 %1, %2

    %4 = icmp ne i64 %3, 0

    %5 = load i1, i1* %rlo.1

    %6 = and i1 %4, %5

    store i1 %6, i1* %rlo.1

    %7 = load i1, i1* %rlo.1

    %8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.newline, i64 0, i64 0), i1 %7)

    ret i32 0

}

It optimizes to:

define dso_local i32 @main() local_unnamed_addr #0 {

  %1 = load i64, i64* @byte, align 8

  %2 = and i64 %1, 1073741824

  %3 = icmp ne i64 %2, 0

  %4 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str.newline, i64 0, i64 0), i1 %3)

  ret i32 0

}

However, when masking bit 31 instead:

define dso_local i32 @main() {

    %rlo.1 = alloca i1

    store i1 1, i1* %rlo.1

    %1 = load i64, i64* @byte

    %2 = shl i64 1, 31

    %3 = and i64 %1, %2

    %4 = icmp ne i64 %3, 0

    %5 = load i1, i1* %rlo.1

    %6 = and i1 %4, %5

    store i1 %6, i1* %rlo.1

    %7 = load i1, i1* %rlo.1

    %8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.newline, i64 0, i64 0), i1 %7)

    ret i32 0

}

It gives me:

define dso_local i32 @main() local_unnamed_addr #0 {

  %1 = load i64, i64* @byte, align 8

  %2 = trunc i64 %1 to i32

  %3 = icmp slt i32 %2, 0

  %4 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str.newline, i64 0, i64 0), i1 %3)

  ret i32 0

}

The same happens for any of the bits mentioned above. For example, for bit 7:

%2 = shl i64 1, 7

%3 = and i64 %1, %2

%4 = icmp ne i64 %3, 0

Optimizes to:

%1 = load i64, i64* @byte, align 8

%2 = trunc i64 %1 to i8

%3 = icmp slt i8 %2, 0

And so on...

Using "-print-before-all -print-after-all" with "opt -O3", I narrowed it down to

the following pass (shown here on another equivalent example):

*** IR Dump Before Combine redundant instructions ***

define dso_local i32 @main() local_unnamed_addr {

  %1 = load i32, i32* @byte

  %2 = and i32 %1, 32768

  %3 = icmp ne i32 %2, 0

  %4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.newline, i64 0, i64 0), i1 %3)

  ret i32 0

}

*** IR Dump After Combine redundant instructions ***

define dso_local i32 @main() local_unnamed_addr {

  %1 = load i32, i32* @byte, align 4

  %2 = trunc i32 %1 to i16

  %3 = icmp slt i16 %2, 0

  %4 = call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str.newline, i64 0, i64 0), i1 %3)

  ret i32 0

}</pre>
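        <pre>One way to sanity-check the rewrite outside of LLVM is to model both forms of
the test in plain Python and compare them on sample inputs. This is only a
sketch under stated assumptions: the helper names (bit_test, trunc_sign_test,
first_mismatch) are hypothetical and not part of LLVM, and the model covers
only the integer semantics of and/icmp ne versus trunc/icmp slt, not the
printf call or anything else in the pipeline.

```python
def bit_test(value, bit):
    """Original form: and with a one-bit mask (shl 1, bit), then icmp ne 0."""
    return (value & (2 ** bit)) != 0


def trunc_sign_test(value, bit):
    """Rewritten form: trunc to (bit + 1) bits, then icmp slt 0."""
    width = bit + 1
    truncated = value % (2 ** width)  # trunc keeps only the low `width` bits
    # Reinterpret as signed two's complement: values at or above 2**bit are negative.
    signed = truncated - 2 ** width if truncated >= 2 ** bit else truncated
    return 0 > signed


def first_mismatch(bit, samples):
    """Return the first sample where the two forms disagree, or None if none do."""
    for value in samples:
        if bit_test(value, bit) != trunc_sign_test(value, bit):
            return value
    return None


if __name__ == "__main__":
    for bit in (7, 15, 31):
        print(bit, first_mismatch(bit, range(2 ** 16)))
```

first_mismatch returns None when the two forms agree on every sampled value.
In this model, both helpers return False for bit 31 with the value 30 stored in
@byte above.</pre>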

        </div>

      </p>


    </body>

</html>