[llvm] [CGP] Despeculate ctlz/cttz with "illegal" integer types (PR #137197)

Sergei Barannikov via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 29 10:47:24 PDT 2025


================
@@ -441,33 +444,35 @@ define i64 @test_cttz_i64(i64 %a) nounwind {
 ;
 ; RV32M-LABEL: test_cttz_i64:
 ; RV32M:       # %bb.0:
+; RV32M-NEXT:    or a2, a0, a1
+; RV32M-NEXT:    beqz a2, .LBB3_3
+; RV32M-NEXT:  # %bb.1: # %cond.false
 ; RV32M-NEXT:    lui a2, 30667
 ; RV32M-NEXT:    addi a3, a2, 1329
 ; RV32M-NEXT:    lui a2, %hi(.LCPI3_0)
 ; RV32M-NEXT:    addi a2, a2, %lo(.LCPI3_0)
-; RV32M-NEXT:    bnez a1, .LBB3_3
-; RV32M-NEXT:  # %bb.1:
-; RV32M-NEXT:    li a1, 32
-; RV32M-NEXT:    beqz a0, .LBB3_4
-; RV32M-NEXT:  .LBB3_2:
-; RV32M-NEXT:    neg a1, a0
-; RV32M-NEXT:    and a0, a0, a1
+; RV32M-NEXT:    bnez a0, .LBB3_4
+; RV32M-NEXT:  # %bb.2: # %cond.false
+; RV32M-NEXT:    neg a0, a1
+; RV32M-NEXT:    and a0, a1, a0
 ; RV32M-NEXT:    mul a0, a0, a3
 ; RV32M-NEXT:    srli a0, a0, 27
 ; RV32M-NEXT:    add a0, a2, a0
 ; RV32M-NEXT:    lbu a0, 0(a0)
+; RV32M-NEXT:    addi a0, a0, 32
 ; RV32M-NEXT:    li a1, 0
 ; RV32M-NEXT:    ret
 ; RV32M-NEXT:  .LBB3_3:
-; RV32M-NEXT:    neg a4, a1
-; RV32M-NEXT:    and a1, a1, a4
-; RV32M-NEXT:    mul a1, a1, a3
-; RV32M-NEXT:    srli a1, a1, 27
-; RV32M-NEXT:    add a1, a2, a1
-; RV32M-NEXT:    lbu a1, 0(a1)
-; RV32M-NEXT:    bnez a0, .LBB3_2
+; RV32M-NEXT:    li a1, 0
----------------
s-barannikov wrote:

RISCVRedundantCopyElimination could handle this (after some improvements, such as supporting AND/OR), but the context for the optimization is only created later, by TailDuplicatePass, which runs 14 passes after it.

```
bb.0 (%ir-block.0):
  successors: %bb.1(0x30000000), %bb.3(0x50000000); %bb.1(37.50%), %bb.3(62.50%)
  liveins: $x10, $x11
  renamable $x12 = OR renamable $x10, renamable $x11
  BNE killed renamable $x12, $x0, %bb.3

bb.1:
; predecessors: %bb.0

  renamable $x11 = COPY $x0
  renamable $x10 = ADDI $x0, 64
  PseudoRET implicit $x10, implicit $x11
```

I think the extra `li` may not be a big problem here as the result of `cttz` is usually truncated to 32 bits.
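
For readers following the hunk above, here is a rough C++ sketch of the control flow the new RV32M sequence appears to implement: test `lo | hi` for zero first, then do a single 32-bit count on whichever half is relevant. The function names are hypothetical, and the lookup table is rebuilt from the multiplier seen in the `lui`/`addi` pair; the actual contents of `.LCPI3_0` are not shown in the diff, so treat the table as an assumption.

```cpp
#include <cstdint>

// Multiplier materialized by `lui a2, 30667` + `addi a3, a2, 1329`.
constexpr uint32_t kDeBruijn = (30667u << 12) + 1329u;
static_assert(kDeBruijn == 0x077CB531u, "expected the 32-bit de Bruijn constant");

// Assumption: the table is derived from the multiplier here rather than
// copied from .LCPI3_0, whose contents the diff does not show.
struct CttzTable {
  uint8_t v[32];
  CttzTable() {
    for (uint32_t i = 0; i < 32; ++i)
      v[((1u << i) * kDeBruijn) >> 27] = static_cast<uint8_t>(i);
  }
};

// Hypothetical helper: 32-bit cttz for a non-zero input
// (neg / and / mul / srli / lbu in the assembly).
inline uint32_t cttz32NonZero(uint32_t x) {
  static const CttzTable table;
  uint32_t lowestBit = x & (0u - x);           // isolate the lowest set bit
  return table.v[(lowestBit * kDeBruijn) >> 27];
}

// Hypothetical helper: despeculated 64-bit cttz on a 32-bit target.
inline uint32_t cttz64Despeculated(uint32_t lo, uint32_t hi) {
  if ((lo | hi) == 0)                          // or a2, a0, a1; beqz a2, ...
    return 64;                                 // zero input takes the early exit
  if (lo != 0)                                 // bnez a0, ...
    return cttz32NonZero(lo);
  return cttz32NonZero(hi) + 32;               // addi a0, a0, 32
}
```

In every path the high word of the i64 result is zero, which is where the `li a1, 0` in each return block comes from.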


https://github.com/llvm/llvm-project/pull/137197

