[llvm] fold mov dec/inc to lea +- 1 (PR #185194)

Sun Mar 8 11:31:36 PDT 2026

================
@@ -0,0 +1,101 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2,+slow-3ops-lea | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2,-slow-3ops-lea | FileCheck %s
+
+define i64 @mov_dec(<32 x i8> %x) local_unnamed_addr {
+; CHECK-LABEL: mov_dec:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpmovmskb %ymm0, %ecx
+; CHECK-NEXT:    leal -1(%rcx), %eax
+; CHECK-NEXT:    shlq $32, %rcx
+; CHECK-NEXT:    orq %rcx, %rax
+; CHECK-NEXT:    vzeroupper
+; CHECK-NEXT:    retq
+  %cmp = icmp slt <32 x i8> %x, zeroinitializer
+  %mvmsk = bitcast <32 x i1> %cmp to i32
----------------
Takashiidobe wrote:

I decided to turn the other tests into MIR tests because you can run the pass directly on a mov + dec/inc setup, and the generated instructions are much easier to see.

These two are left as e2e tests because they come from the issue itself, which makes them the best regression tests. If you try to simplify them (to force the mov + dec pattern), the pattern is already taken care of earlier in the pipeline and never shows up, so an IR -> asm test would not exercise this pass without the vpmovmskb on x86.

On main, this test emits:

```asm
  vpmovmskb %ymm0, %ecx  ; drop result in %ecx
  movl %ecx, %eax        ; make a copy of %ecx in %eax
  shlq $32, %rcx         ; clobber %ecx by mutating %rcx
  incl %eax              ; increment %eax, the copy
  orq %rcx, %rax         ; use both (so we need at least one copy)
```

So this is the easiest way to generate the pattern without resorting to tricks in the IR. The tests in `llvm/test/CodeGen/X86/pr44412.ll` would also work (their output is changed by this pass too), but they are loops, which emit a lot more code to read and are harder to set up, so I figured writing MIR tests would be better.
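
As a rough sanity check of why the fold preserves the result here, below is a minimal Python sketch that models the two instruction sequences on 64-bit registers. It is not code from the patch: the function names and the `delta` parameter are hypothetical, and it ignores EFLAGS entirely (`lea` does not write flags while `inc`/`dec` do, so this assumes the flags result is unused).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mov_incdec_sequence(mask: int, delta: int) -> int:
    """Model of the mov + inc/dec sequence (hypothetical helper)."""
    rcx = mask & MASK32            # vpmovmskb %ymm0, %ecx (32-bit write zero-extends)
    rax = rcx & MASK32             # movl %ecx, %eax
    rcx = (rcx << 32) & MASK64     # shlq $32, %rcx
    rax = (rax + delta) & MASK32   # incl/decl %eax (wraps at 32 bits)
    return (rcx | rax) & MASK64    # orq %rcx, %rax


def lea_sequence(mask: int, delta: int) -> int:
    """Model of the folded sequence (hypothetical helper)."""
    rcx = mask & MASK32            # vpmovmskb %ymm0, %ecx
    rax = (rcx + delta) & MASK32   # leal delta(%rcx), %eax, read before %rcx is clobbered
    rcx = (rcx << 32) & MASK64     # shlq $32, %rcx
    return (rcx | rax) & MASK64    # orq %rcx, %rax


if __name__ == "__main__":
    for m in (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
        for d in (1, -1):
            assert mov_incdec_sequence(m, d) == lea_sequence(m, d)
    print("sequences agree")
```

The model only covers the value in `%rax`; it relies on the `lea` reading `%rcx` before the `shlq` clobbers it, which is the ordering shown in the CHECK lines.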

https://github.com/llvm/llvm-project/pull/185194