[clang] [clang-tools-extra] [llvm] Add out-of-line-atomics support to GlobalISel (PR #74588)

Mon Dec 18 06:20:31 PST 2023

RoboTux wrote:

> > > Not an expert on atomics, but why would we have a libcall for -O0 but not for O1 in the tests?
> > 
> > 
> > I looked at it for the u?(min|max) and it seemed to boil down to the atomic expand pass being run at -O1 and above.
> 
> No sorry, it's not that it's only run at O1 and above, it's that the output is different. At O0 it keeps the cmpxchg, whereas at O1 it changes the cmpxchg into ldxr + stlxr intrinsics.

@aemerson 
AArch64TargetLowering::shouldExpandAtomicRMWInIR() has:
  // Nand is not supported in LSE.
  // Leave 128 bits to LLSC or CmpXChg.
  if (AI->getOperation() != AtomicRMWInst::Nand && Size < 128) {
    if (Subtarget->hasLSE())
      return AtomicExpansionKind::None;
    if (Subtarget->outlineAtomics()) {
      // [U]Min/[U]Max RWM atomics are used in __sync_fetch_ libcalls so far.
      // Don't outline them unless
      // (1) high level <atomic> support approved:
      //   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0493r1.pdf
      // (2) low level libgcc and compiler-rt support implemented by:
      //   min/max outline atomics helpers
      if (AI->getOperation() != AtomicRMWInst::Min &&
          AI->getOperation() != AtomicRMWInst::Max &&
          AI->getOperation() != AtomicRMWInst::UMin &&
          AI->getOperation() != AtomicRMWInst::UMax) {
        return AtomicExpansionKind::None;
      }
    }
  }

  // At -O0, fast-regalloc cannot cope with the live vregs necessary to
  // implement atomicrmw without spilling. If the target address is also on the
  // stack and close enough to the spill slot, this can lead to a situation
  // where the monitor always gets cleared and the atomic operation can never
  // succeed. So at -O0 lower this operation to a CAS loop. Also worthwhile if
  // we have a single CAS instruction that can replace the loop.
  if (getTargetMachine().getOptLevel() == CodeGenOptLevel::None ||
      Subtarget->hasLSE())
    return AtomicExpansionKind::CmpXChg;

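That explains why -O0 differs from -O1 for nand and u?(min|max): without LSE, none of these operations takes the early `AtomicExpansionKind::None` return (nand skips the first block entirely, and u?(min|max) are excluded from the outline-atomics case), so they reach the opt-level check. At -O0 that check returns `AtomicExpansionKind::CmpXChg`; at -O1 and above it does not, and the operation falls through to the rest of the function.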
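
To make the control flow easier to follow, here is a minimal standalone sketch of the decision logic quoted above. It is not the LLVM code: `Op`, `Expansion`, `Config`, and `expandKind` are hypothetical stand-ins for illustration only, and the final `LLSC` return is an assumption based on the ldxr + stlxr output seen at -O1, since the excerpt above stops before that point.

```cpp
// Hypothetical stand-ins for the LLVM types; these are NOT the real API.
enum class Op { Add, Nand, Min, Max, UMin, UMax };
enum class Expansion { None, CmpXChg, LLSC };

struct Config {
  bool hasLSE;         // models Subtarget->hasLSE()
  bool outlineAtomics; // models Subtarget->outlineAtomics()
  bool optNone;        // models getOptLevel() == CodeGenOptLevel::None (-O0)
};

// Mirrors the branch structure of the quoted excerpt.
constexpr Expansion expandKind(Op op, unsigned sizeInBits, const Config &cfg) {
  const bool isMinMax = op == Op::Min || op == Op::Max ||
                        op == Op::UMin || op == Op::UMax;

  // Nand and 128-bit operations skip this block entirely.
  if (op != Op::Nand && sizeInBits < 128) {
    if (cfg.hasLSE)
      return Expansion::None;
    // With outline atomics, everything except [u]min/[u]max is left alone.
    if (cfg.outlineAtomics && !isMinMax)
      return Expansion::None;
  }

  // At -O0 (or with LSE) use a CAS loop.
  if (cfg.optNone || cfg.hasLSE)
    return Expansion::CmpXChg;

  // Assumption: the remainder of the real function selects an LL/SC expansion.
  return Expansion::LLSC;
}

// Outline atomics, no LSE: nand and umin differ between -O0 and -O1.
static_assert(expandKind(Op::Nand, 32, Config{false, true, true}) ==
              Expansion::CmpXChg, "nand at -O0");
static_assert(expandKind(Op::Nand, 32, Config{false, true, false}) ==
              Expansion::LLSC, "nand at -O1");
static_assert(expandKind(Op::UMin, 32, Config{false, true, true}) ==
              Expansion::CmpXChg, "umin at -O0");
static_assert(expandKind(Op::UMin, 32, Config{false, true, false}) ==
              Expansion::LLSC, "umin at -O1");
// Other operations are not expanded in IR at either opt level.
static_assert(expandKind(Op::Add, 32, Config{false, true, true}) ==
              Expansion::None, "add at -O0");
```

Under these assumptions, only nand and u?(min|max) change behaviour with the optimization level, which matches the difference seen in the tests.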

https://github.com/llvm/llvm-project/pull/74588