<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/142497>142497</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            missed optimization for ceiling division with known ranges
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          alex
      </td>
    </tr>
</table>

<pre>
    My starting point was the following Rust code (the meaning is hopefully clear even to non-Rust speakers):
```
pub fn src(v: u32) -> u32 {
    (u32::BITS - v.leading_zeros()).div_ceil(8)
}
```

rustc emits the following LLVM IR:

```
define noundef range(i32 0, 6) i32 @src(i32 noundef %v) unnamed_addr #0 {
start:
  %0 = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 %v, i1 false)
  %_2 = sub nuw nsw i32 32, %0
  %_41 = lshr i32 %_2, 3
  %_5 = and i32 %_2, 7
  %_6.not = icmp ne i32 %_5, 0
  %1 = zext i1 %_6.not to i32
  %_0.sroa.0.0 = add nuw nsw i32 %_41, %1
  ret i32 %_0.sroa.0.0
}
```

This compiles to the following x86:

```
src: # @src
        mov     ecx, 63
        bsr     ecx, edi
        xor     ecx, -32
        add     ecx, 33
        mov     eax, ecx
        shr     eax, 3
        and     ecx, 7
        cmp     ecx, 1
        sbb     eax, -1
        ret
```

However, this could be validly optimized to the following LLVM IR:

```
define noundef range(i32 0, 5) i32 @tgt(i32 noundef %v) unnamed_addr #0 {
start:
  %0 = tail call i32 @llvm.ctlz.i32(i32 %v, i1 false)
  %1 = sub nuw nsw i32 32, %0
  %2 = add nuw nsw i32 %1, 7
  %3 = lshr i32 %2, 3
  ret i32 %3
}
```

This produces the following, much tighter x86:

```
tgt: # @tgt
        mov     eax, 63
        bsr     eax, edi
        xor     eax, -32
        add     eax, 40
        shr     eax, 3
        ret
```

Alive2 confirms that the transformation is valid: https://alive2.llvm.org/ce/z/Ys4qAy
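
For readers who prefer to see the reasoning at the source level, here is a minimal Rust sketch of why the rewrite is sound. It is only an illustration, not part of the reproduction: `tgt` is my hand-written source-level analogue of the target IR, and the helper names `ceil_div8_two_step`, `ceil_div8_add_shift` and `first_mismatch_in_range` are made up for this example. The point is that "quotient plus (remainder != 0)" and "add 7, then shift" agree whenever `x + 7` cannot overflow, which the known range 0..=32 of `32 - ctlz(v)` guarantees.

```rust
/// Original form from this report: bit length of `v`, divided by 8, rounded up.
pub fn src(v: u32) -> u32 {
    (u32::BITS - v.leading_zeros()).div_ceil(8)
}

/// Hypothetical source-level analogue of the proposed IR: add 7, then shift right by 3.
pub fn tgt(v: u32) -> u32 {
    (u32::BITS - v.leading_zeros() + 7) >> 3
}

/// Shape of the IR rustc currently emits: quotient plus "remainder != 0".
pub fn ceil_div8_two_step(x: u32) -> u32 {
    (x >> 3) + u32::from(x & 7 != 0)
}

/// Add-then-shift shape. `wrapping_add` mirrors a plain IR `add` without `nuw`,
/// so the function is defined (but not always equal to the above) for every input.
pub fn ceil_div8_add_shift(x: u32) -> u32 {
    x.wrapping_add(7) >> 3
}

/// Returns the first value in 0..=32 where the two shapes disagree, if any.
pub fn first_mismatch_in_range() -> Option<u32> {
    (0..=u32::BITS).find(|&x| ceil_div8_two_step(x) != ceil_div8_add_shift(x))
}
```

Within 0..=32 the two shapes never disagree, so `first_mismatch_in_range()` returns `None`. Outside that range they can differ: for `x = u32::MAX` the addition wraps, which is why the transformation needs the range information and is not valid for an arbitrary `u32`.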

(As a point of interest, I found the optimized versions using Claude. Computers are wild: https://claude.ai/share/d998511d-45ee-4132-bee4-fe7f70350a67)


</pre>