[PATCH] D136244: [AArch64] Optimize memcmp when the result is tested for [in]equality with 0

Fri Oct 21 17:33:23 PDT 2022

Allen added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/i128-cmp.ll:122
 ; CHECK-NEXT:    orr x8, x9, x8
 ; CHECK-NEXT:    cbnz x8, .LBB10_2
 ; CHECK-NEXT:  // %bb.1: // %call
----------------
efriedma wrote:
> Allen wrote:
> > efriedma wrote:
> > > Is there some reason we don't want to combine this to cmp+ccmp+b.ne?
> > * Thanks for your attention. This case is blocked by the constraint **N->use_begin()->getOpcode() != ISD::BRCOND**, because I can't confirm that the combine is necessarily beneficial in this scenario, e.g. the case **test_rmw_add_128** in file CodeGen/AArch64/atomicrmw-O0.ll. If we can ignore the regression at -O0, can I relax this constraint?
> > ```
> > SelectionDAG has 19 nodes:
> >   t0: ch,glue = EntryToken
> >             t2: i64,ch = CopyFromReg t0, Register:i64 %0
> >             t6: i64,ch = CopyFromReg t0, Register:i64 %2
> >           t26: i64 = xor t2, t6
> >             t4: i64,ch = CopyFromReg t0, Register:i64 %1
> >             t8: i64,ch = CopyFromReg t0, Register:i64 %3
> >           t27: i64 = xor t4, t8
> >         t28: i64 = or t26, t27
> >       t22: i32 = setcc t28, Constant:i64<0>, setne:ch
> >     t21: ch = brcond t0, t22, BasicBlock:ch<exit 0xaaaab28f7268>
> >   t18: ch = br t21, BasicBlock:ch<call 0xaaaab28f7170>
> > ```
> > * This is the key change in the case **test_rmw_add_128**, which is compiled with -O0.
> > ```
> > -; NOLSE-NEXT:    eor x11, x9, x11
> > -; NOLSE-NEXT:    eor x8, x10, x8
> > -; NOLSE-NEXT:    orr x8, x8, x11
> > +; NOLSE-NEXT:    mov x9, x8
> >  ; NOLSE-NEXT:    str x9, [sp, #8] // 8-byte Folded Spill
> > +; NOLSE-NEXT:    mov x10, x12
> >  ; NOLSE-NEXT:    str x10, [sp, #16] // 8-byte Folded Spill
> > +; NOLSE-NEXT:    subs x12, x12, x13
> > +; NOLSE-NEXT:    ccmp x8, x11, #0, eq
> > +; NOLSE-NEXT:    cset w8, eq
> >  ; NOLSE-NEXT:    str x10, [sp, #32] // 8-byte Folded Spill
> >  ; NOLSE-NEXT:    str x9, [sp, #40] // 8-byte Folded Spill
> > -; NOLSE-NEXT:    cbnz x8, .LBB4_1
> > +; NOLSE-NEXT:    tbnz w8, #0, .LBB4_1
> > ```
> We can mostly ignore codesize at -O0.  (I mean, it matters to the extent that really bloated code can start to impact compile-time, but that isn't relevant here.)
Done. Thank you for your guidance.


https://reviews.llvm.org/D136244