[PATCH] D136244: [AArch64] Optimize memcmp when the result is tested for [in]equality with 0

Fri Oct 21 10:35:17 PDT 2022

efriedma added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/i128-cmp.ll:122
 ; CHECK-NEXT:    orr x8, x9, x8
 ; CHECK-NEXT:    cbnz x8, .LBB10_2
 ; CHECK-NEXT:  // %bb.1: // %call
----------------
Allen wrote:
> efriedma wrote:
> > Is there some reason we don't want to combine this to cmp+ccmp+b.ne?
> * Thanks for your attention. This case is blocked by the constraint **N->use_begin()->getOpcode() != ISD::BRCOND**, because I can't confirm that there is always a benefit in this scenario; see, for example, **test_rmw_add_128** in CodeGen/AArch64/atomicrmw-O0.ll. If we can ignore the regression at -O0, can I relax this constraint?
> ```
> SelectionDAG has 19 nodes:
>   t0: ch,glue = EntryToken
>             t2: i64,ch = CopyFromReg t0, Register:i64 %0
>             t6: i64,ch = CopyFromReg t0, Register:i64 %2
>           t26: i64 = xor t2, t6
>             t4: i64,ch = CopyFromReg t0, Register:i64 %1
>             t8: i64,ch = CopyFromReg t0, Register:i64 %3
>           t27: i64 = xor t4, t8
>         t28: i64 = or t26, t27
>       t22: i32 = setcc t28, Constant:i64<0>, setne:ch
>     t21: ch = brcond t0, t22, BasicBlock:ch<exit 0xaaaab28f7268>
>   t18: ch = br t21, BasicBlock:ch<call 0xaaaab28f7170>
> ```
> * This is the key change in **test_rmw_add_128**, which is compiled with -O0:
> ```
> -; NOLSE-NEXT:    eor x11, x9, x11
> -; NOLSE-NEXT:    eor x8, x10, x8
> -; NOLSE-NEXT:    orr x8, x8, x11
> +; NOLSE-NEXT:    mov x9, x8
>  ; NOLSE-NEXT:    str x9, [sp, #8] // 8-byte Folded Spill
> +; NOLSE-NEXT:    mov x10, x12
>  ; NOLSE-NEXT:    str x10, [sp, #16] // 8-byte Folded Spill
> +; NOLSE-NEXT:    subs x12, x12, x13
> +; NOLSE-NEXT:    ccmp x8, x11, #0, eq
> +; NOLSE-NEXT:    cset w8, eq
>  ; NOLSE-NEXT:    str x10, [sp, #32] // 8-byte Folded Spill
>  ; NOLSE-NEXT:    str x9, [sp, #40] // 8-byte Folded Spill
> -; NOLSE-NEXT:    cbnz x8, .LBB4_1
> +; NOLSE-NEXT:    tbnz w8, #0, .LBB4_1
> ```
We can mostly ignore codesize at -O0.  (I mean, it matters to the extent that really bloated code can start to impact compile-time, but that isn't relevant here.)
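Whatever the code-size trade-off, the combined form must take the branch in exactly the same cases as the `eor`/`eor`/`orr`/`cbnz` sequence. The C sketch below is an illustration only: the function names are hypothetical and not part of the patch. It models each sequence on two 64-bit halves, tracking only the Z flag for the `cmp` + `ccmp ..., #0, eq` + `b.ne` form, and checks that both agree on a set of edge-case values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of eor + eor + orr + cbnz:
   fold both halves into one value and branch if it is non-zero. */
static bool ne_xor_or(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi) {
    return ((a_lo ^ b_lo) | (a_hi ^ b_hi)) != 0;
}

/* Hypothetical model of cmp + ccmp #0, eq + b.ne, tracking only the Z flag. */
static bool ne_cmp_ccmp(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi) {
    bool z = (a_lo == b_lo);   /* cmp a_lo, b_lo: Z is set iff the low halves match */
    if (z) {
        z = (a_hi == b_hi);    /* ccmp a_hi, b_hi, #0, eq: compare only when Z is set */
    }                          /* otherwise NZCV := #0, so Z stays clear */
    return !z;                 /* b.ne is taken iff Z is clear */
}

/* Check that both models agree on every combination of a few edge-case values. */
static void check_equivalence(void) {
    static const uint64_t vals[] = {0, 1, UINT64_C(0x8000000000000000), UINT64_MAX};
    const size_t n = sizeof vals / sizeof vals[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                for (size_t l = 0; l < n; l++)
                    assert(ne_xor_or(vals[i], vals[j], vals[k], vals[l]) ==
                           ne_cmp_ccmp(vals[i], vals[j], vals[k], vals[l]));
}
```

Only Z matters for `b.ne`, which is why the sketch ignores N, C and V; the `#0` immediate in `ccmp` is what forces the "not equal" outcome when the low halves already differ.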


https://reviews.llvm.org/D136244