[PATCH] D135948: [LoongArch] Add codegen support for cmpxchg on LA64

Tue Oct 18 05:01:53 PDT 2022

gonglingqin added inline comments.

================
Comment at: llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll:20
+; LA64-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; LA64-NEXT:    dbar 0
+; LA64-NEXT:    ll.w $a4, $a3, 0
----------------
hev wrote:
> xry111 wrote:
> > xry111 wrote:
> > > hev wrote:
> > > > Not needed.
> > > > 
> > > > ll: full-membar + load-exclusive
> > > Jiaxun told me (via linux-mips):
> > > 
> > > > I had checked with the Loongson folks and they confirmed that the workaround still needs to be applied to the latest 3A4000
> > > > processors, including 3A4000 for MIPS and 3A5000 for LoongArch. However, the reason behind the workaround varies depending on
> > > > their uArch. For the GS464V-based core, a barrier is required because the uArch design allows a regular load to be reordered after an atomic linked
> > > > load, which would break the assumptions behind the compiler's atomic constraints.
> > > 
> > > In GCC we use `dbar 0x700`, so that in the future HW engineers can fix this issue and make `dbar 0x700` a no-op.
> > Oops, I meant the `dbar 1792` (i.e. `dbar 0x700`) instruction at `LBB0_3`. Yes, the `dbar 0` before `ll.w` can be removed for 3A5000.
> > 
> > But what should we do if the `LLDBAR` bit is 0 in CPUCFG?
> Nice question.
> 
> In what case do we need a memory barrier before an atomic op? An atomic op with store-release semantics? Why not make sc behave as membar + store-conditional?
> 
> If the memory barrier semantics of ll remain membar + load-exclusive in the future, then for atomic ops with load-acquire semantics I think we can't make dbar 0x700 a no-op.
@hev, @xry111, thank you for your input. I will remove this dbar. In addition, following @hev's suggestion, a dbar is added before sc to ensure correctness when the LLDBAR bit is 0.

================
Comment at: llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll:25-27
+; LA64-NEXT:    xor $a5, $a4, $a2
+; LA64-NEXT:    and $a5, $a5, $a0
+; LA64-NEXT:    xor $a5, $a4, $a5
----------------
hev wrote:
> I think we should reduce the number of instructions between ll and sc to make ll/sc complete as fast as possible.
> 
> For reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/loongarch/include/asm/cmpxchg.h?h=next-20221014#n114
Thanks, I will modify it.
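The `xor`/`and`/`xor` triple computes a masked merge, `cur ^ ((cur ^ new) & mask)`, which equals `(cur & ~mask) | (new & mask)`. If the shifted new value already has no bits set outside the mask (likely for a zero-extended, shifted sub-word operand, but treat that as an assumption), the merge reduces to `andn` + `or`, one instruction fewer inside the ll/sc window. The portable C11 sketch below emulates a byte-sized cmpxchg on its containing aligned 32-bit word, computing the shift, mask and shifted values once before the retry loop so that only the compare and the two-op merge remain inside it. `atomic_compare_exchange_weak_explicit` stands in for the `ll.w`/`sc.w` pair (both may fail spuriously). The helpers `merge_xor`, `merge_andn_or` and `cmpxchg_u8` are hypothetical; this is neither the patch's lowering nor the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: the merge in the current test output,
 * xor t, cur, new; and t, t, mask; xor t, cur, t. */
uint32_t merge_xor(uint32_t cur, uint32_t newv, uint32_t mask) {
    return cur ^ ((cur ^ newv) & mask);
}

/* Hypothetical helper: andn + or. Equivalent to merge_xor only when
 * newv_masked has no bits set outside mask. */
uint32_t merge_andn_or(uint32_t cur, uint32_t newv_masked, uint32_t mask) {
    return (cur & ~mask) | newv_masked;
}

/* Hypothetical helper: compare-and-exchange on byte `byte_idx` of an
 * aligned 32-bit word. Returns the byte value observed before the
 * operation; the store happened iff the return value equals `old`. */
uint8_t cmpxchg_u8(_Atomic uint32_t *word, unsigned byte_idx,
                   uint8_t old, uint8_t newv) {
    assert(byte_idx < 4);
    /* Loop-invariant work is hoisted out of the retry loop. */
    unsigned shift = byte_idx * 8u;
    uint32_t mask = (uint32_t)0xffu << shift;
    uint32_t old_sh = (uint32_t)old << shift;
    uint32_t new_sh = (uint32_t)newv << shift; /* already within mask */

    uint32_t cur = atomic_load_explicit(word, memory_order_acquire);
    for (;;) {
        /* Inside the loop: and + compare, then andn + or. */
        if ((cur & mask) != old_sh)
            break; /* mismatch: no store */
        uint32_t desired = merge_andn_or(cur, new_sh, mask);
        if (atomic_compare_exchange_weak_explicit(
                word, &cur, desired,
                memory_order_acq_rel, memory_order_acquire))
            break; /* success: cur still holds the old word */
        /* Failure (real or spurious): cur was reloaded; retry. */
    }
    return (uint8_t)((cur & mask) >> shift);
}
```

Mapping `byte_idx` to a memory offset (`addr & 3`) assumes little-endian byte order, as on LoongArch.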

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D135948