[llvm] [CGP] Despeculate ctlz/cttz with "illegal" integer types (PR #137197)

Thu Apr 24 22:05:59 PDT 2025

================
@@ -285,30 +285,35 @@ define i32 @ctlo_i32_undef(i32 %x) {
   ret i32 %tmp2
 }
 
-define i64 @ctlo_i64(i64 %x) {
+define i64 @ctlo_i64(i64 %x) nounwind {
 ; X86-NOCMOV-LABEL: ctlo_i64:
 ; X86-NOCMOV:       # %bb.0:
+; X86-NOCMOV-NEXT:    pushl %esi
----------------
s-barannikov wrote:

The RHS looks bigger because of additional tail duplication (and a spill).
For a zero input the path is three instructions shorter, and on the other code paths it is one instruction shorter or the same length (not counting the spill). It also avoids one high-latency(?) `bsr` on all paths.
The only disadvantage I see is that it uses an extra register, but that may not be a big deal when this is inlined into a larger function.

If that doesn't sound convincing, I can adjust the heuristic (isCheapToSpeculateCtlz) to restore the previous behavior on 32-bit platforms for operands of 64 bits or wider. Just let me know.
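
As a rough illustration of the control flow being discussed, here is a minimal C++ sketch of a despeculated 64-bit count-leading-zeros on a 32-bit target. It is not the actual CodeGenPrepare output or the generated assembly; the helper names (`clz32_nonzero`, `ctlz64_despeculated`, `ctlo64`) are hypothetical, and `clz32_nonzero` only stands in for a `bsr`-style scan that is valid for non-zero inputs.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 32-bit bit scan that is only defined for
// non-zero inputs (as with x86 `bsr`). Written as a portable loop.
static unsigned clz32_nonzero(uint32_t v) {
    unsigned n = 0;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        ++n;
    }
    return n;
}

// Despeculated form: test for zero first, then scan only the half that
// determines the result, so each path runs at most one bit scan.
unsigned ctlz64_despeculated(uint64_t x) {
    if (x == 0)
        return 64;                        // zero input: no bit scan at all
    uint32_t hi = static_cast<uint32_t>(x >> 32);
    if (hi != 0)
        return clz32_nonzero(hi);         // result comes from the high half
    return 32 + clz32_nonzero(static_cast<uint32_t>(x)); // low half only
}

// The test in the diff counts leading ones, which is ctlz of the complement.
unsigned ctlo64(uint64_t x) {
    return ctlz64_despeculated(~x);
}
```

The branches are what produce the extra basic blocks (and the tail duplication) in the RHS, in exchange for skipping a scan of the half that does not affect the result.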


https://github.com/llvm/llvm-project/pull/137197