[PATCH] D136244: [AArch64] Optimize memcmp when the result is tested for [in]equality with 0
Allen zhong via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 21 17:33:23 PDT 2022
Allen added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/i128-cmp.ll:122
; CHECK-NEXT: orr x8, x9, x8
; CHECK-NEXT: cbnz x8, .LBB10_2
; CHECK-NEXT: // %bb.1: // %call
----------------
efriedma wrote:
> Allen wrote:
> > efriedma wrote:
> > > Is there some reason we don't want to combine this to cmp+ccmp+b.ne?
> > * Thanks for your attention. This case is blocked by the constraint **N->use_begin()->getOpcode() != ISD::BRCOND**, because I can't confirm that the combine is necessarily a benefit in this scenario; see, for example, the case **test_rmw_add_128** in CodeGen/AArch64/atomicrmw-O0.ll. If we can ignore the regression at -O0, may I relax this constraint?
> > ```
> > SelectionDAG has 19 nodes:
> > t0: ch,glue = EntryToken
> > t2: i64,ch = CopyFromReg t0, Register:i64 %0
> > t6: i64,ch = CopyFromReg t0, Register:i64 %2
> > t26: i64 = xor t2, t6
> > t4: i64,ch = CopyFromReg t0, Register:i64 %1
> > t8: i64,ch = CopyFromReg t0, Register:i64 %3
> > t27: i64 = xor t4, t8
> > t28: i64 = or t26, t27
> > t22: i32 = setcc t28, Constant:i64<0>, setne:ch
> > t21: ch = brcond t0, t22, BasicBlock:ch<exit 0xaaaab28f7268>
> > t18: ch = br t21, BasicBlock:ch<call 0xaaaab28f7170>
> > ```
> > * This is the key change in the case **test_rmw_add_128**, which is compiled with -O0.
> > ```
> > -; NOLSE-NEXT: eor x11, x9, x11
> > -; NOLSE-NEXT: eor x8, x10, x8
> > -; NOLSE-NEXT: orr x8, x8, x11
> > +; NOLSE-NEXT: mov x9, x8
> > ; NOLSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill
> > +; NOLSE-NEXT: mov x10, x12
> > ; NOLSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill
> > +; NOLSE-NEXT: subs x12, x12, x13
> > +; NOLSE-NEXT: ccmp x8, x11, #0, eq
> > +; NOLSE-NEXT: cset w8, eq
> > ; NOLSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill
> > ; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
> > -; NOLSE-NEXT: cbnz x8, .LBB4_1
> > +; NOLSE-NEXT: tbnz w8, #0, .LBB4_1
> > ```
> We can mostly ignore codesize at -O0. (I mean, it matters to the extent that really bloated code can start to impact compile-time, but that isn't relevant here.)
Done. Thank you for your guidance.
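
For readers following the thread, here is a minimal sketch of why the two lowerings discussed above test the same condition. It is plain C++ rather than LLVM code, and the function names `ne128_xor_or` and `ne128_cmp_ccmp` are hypothetical, not part of the patch. The first function mirrors the eor/eor/orr + cbnz sequence; the second models cmp + ccmp + b.ne, assuming the usual ccmp behaviour (when the condition holds, perform the compare; otherwise set the flags to the immediate, here #0, which leaves Z clear).

```cpp
#include <cstdint>

// Hypothetical helper: models "eor, eor, orr, cbnz".
// The two 128-bit values differ iff any bit differs in either half.
bool ne128_xor_or(uint64_t a_lo, uint64_t a_hi,
                  uint64_t b_lo, uint64_t b_hi) {
  uint64_t diff = (a_lo ^ b_lo) | (a_hi ^ b_hi);
  return diff != 0; // cbnz branches when diff is non-zero
}

// Hypothetical helper: models "cmp, ccmp ..., #0, eq, b.ne".
bool ne128_cmp_ccmp(uint64_t a_lo, uint64_t a_hi,
                    uint64_t b_lo, uint64_t b_hi) {
  // cmp: Z flag is set iff the low halves are equal.
  bool z = (a_lo == b_lo);
  // ccmp with condition eq: if Z is set, compare the high halves;
  // otherwise load nzcv = #0, which clears Z (i.e. "not equal").
  z = z ? (a_hi == b_hi) : false;
  return !z; // b.ne branches when Z is clear
}
```

Both functions return true exactly when the two 128-bit values are unequal, so the choice between the sequences is a question of code quality, not correctness.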
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136244/new/
https://reviews.llvm.org/D136244