[llvm] [X86] Handle BSF/BSR "zero-input pass through" behaviour (PR #123623)

Thu Jan 23 02:50:40 PST 2025

================
@@ -227,9 +227,8 @@ define i64 @PR89533(<64 x i8> %a0) {
 ; SSE-NEXT:    orl %eax, %edx
 ; SSE-NEXT:    shlq $32, %rdx
 ; SSE-NEXT:    orq %rcx, %rdx
-; SSE-NEXT:    bsfq %rdx, %rcx
 ; SSE-NEXT:    movl $64, %eax
-; SSE-NEXT:    cmovneq %rcx, %rax
+; SSE-NEXT:    rep bsfq %rdx, %rax
----------------
RKSimon wrote:

We use "REP BSF" so that it is decoded as TZCNT on BMI-capable machines, which is a lot quicker than BSF. Codegen can use this pattern because we no longer have any EFLAGS dependency. On BMI machines we just pay the trivial penalty of the extra MOV.
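
To illustrate why the `movl $64` + `rep bsfq` sequence gives the same answer on both kinds of machine, here is a minimal Python sketch of the semantics. It is a model only, not LLVM code: the function names (`bsf64`, `tzcnt64`, `rep_bsf64`) and the `has_bmi` flag are hypothetical, and it assumes the "zero-input pass through" behaviour from the PR title, i.e. BSF leaves its destination unchanged when the source is zero.

```python
MASK64 = (1 << 64) - 1


def _lowest_set_bit_index(src):
    # Index of the lowest set bit of a non-zero value.
    return (src & -src).bit_length() - 1


def bsf64(src, dst):
    # Hypothetical model of BSF, assuming zero-input pass through:
    # the destination keeps its previous value when src is zero.
    src &= MASK64
    return dst if src == 0 else _lowest_set_bit_index(src)


def tzcnt64(src, dst):
    # Model of TZCNT: a zero source yields the operand size (64),
    # regardless of what the destination held before.
    src &= MASK64
    return 64 if src == 0 else _lowest_set_bit_index(src)


def rep_bsf64(src, has_bmi):
    # Models the new sequence: movl $64, %eax ; rep bsfq %rdx, %rax.
    # With BMI the encoding executes as TZCNT; without it the REP prefix
    # is ignored and it executes as BSF.
    dst = 64
    return tzcnt64(src, dst) if has_bmi else bsf64(src, dst)
```

In this model `rep_bsf64(x, True)` and `rep_bsf64(x, False)` agree for every 64-bit `x`, including zero. On the TZCNT path the preloaded 64 is never needed, which is the "extra MOV" mentioned above.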

https://github.com/llvm/llvm-project/pull/123623