[LLVMdev] Proposal for atomic and synchronization instructions

Chandler Carruth chandlerc at gmail.com
Mon Jul 9 13:36:56 PDT 2007


> > > "While the processor may spin and attempt the atomic operation more than
> > > once before it is successful, research indicates this is extremely
> > > uncommon." I don't understand this sentence. What do you mean?
> >
> > I'm not sure I can pinpoint the paper on which the statement is based,
> > but I seem to recall something similar in the original LL-SC papers
> > (Maurice Herlihy, DEC Western Research Labs?). It's a foundation for
> > lock-free algorithms.
>
> Well, the statement says that you often have low contention. But that's
> something you want, not necessarily something you will get, and it depends on
> the workload/algorithm. I'm missing the context. Is the actual statement
> simply that you should use the atomic instructions offered by your
> processor instead of blocking algorithms?

LL/SC is not a blocking algorithm. I'm going to change some of the
nomenclature on the page to reflect this: although it may spin, it
never actually takes a lock. The idea (as I understand it, and I'm
still hoping Scott can find the reference he gave me to a DEC paper
outlining this) is that the spin occurs only if another process does
something that breaks atomicity for that _particular_ LL/SC pairing,
for example storing to the same location between the LL and the SC.
Even when the spin occurs, it should last only until that particular
process gets through the operation without interruption. This still
needs some statistical analysis, however, and hopefully the relevant
research in the literature can be located and cited.
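To make the retry behaviour concrete, here is a minimal sketch in C++.
It uses C++11 std::atomic (which postdates this thread) as a stand-in
for the proposed intrinsics. compare_exchange_weak is allowed to fail
spuriously precisely so that it can be implemented with a single LL/SC
pair on LL/SC targets; a failure just means "retry", never "wait for a
lock holder". The helper names are hypothetical, and the retry counter
is only a crude way to measure how often the spin actually happens on
a given machine and workload.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: atomically increment `counter` with a CAS retry
// loop and report how many attempts failed before one succeeded.
unsigned increment_with_retries(std::atomic<long>& counter) {
    unsigned retries = 0;
    long observed = counter.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current value into
    // `observed`, so the next attempt starts from fresh data.
    // No thread ever holds a lock here.
    while (!counter.compare_exchange_weak(observed, observed + 1,
                                          std::memory_order_relaxed)) {
        ++retries;
    }
    return retries;
}

// Hypothetical driver: `threads` workers each perform `iters` increments.
// The final counter value is stored in `total`; the return value is the
// total number of failed CAS attempts across all workers.
unsigned long run_contended(int threads, int iters, long& total) {
    std::atomic<long> counter{0};
    std::atomic<unsigned long> retries{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            unsigned long local = 0;
            for (int i = 0; i < iters; ++i)
                local += increment_with_retries(counter);
            retries.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    total = counter.load();
    return retries.load();
}
```

run_contended(4, 10000, total) should always leave total == 40000. The
returned retry count depends on the machine, the scheduler, and the
contention pattern, so it is not a fixed number.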

>
> > > You probably don't need to require CAS (compare-and-set) to return the
> > > previous value (I think some architectures don't); it could just return a
> > > boolean value (success/failure).
> >
> > compare and swap?
>
> Well, do you need the swap, or is a compare-and-set sufficient most of the
> time? What do other architectures offer?

All of the architectures we looked at offer a swap that returns the
previous value, or can provide one with no penalty. To my mind, this
makes algorithm development much easier, as you get the best of both
worlds -- success/failure information and the value from memory.
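A minimal C++ sketch of why a value-returning compare-and-swap
subsumes a boolean compare-and-set, again using C++11 std::atomic as a
stand-in. compare_and_swap and compare_and_set are hypothetical helper
names, not instructions from the proposal.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical value-returning compare-and-swap: if `p` holds `expected`,
// store `desired`. Either way, return the value that was in memory
// before the operation.
int compare_and_swap(std::atomic<int>& p, int expected, int desired) {
    // On failure, compare_exchange_strong writes the current value into
    // `expected`; on success `expected` already equals the old value.
    p.compare_exchange_strong(expected, desired);
    return expected;
}

// The boolean form falls out for free: since compare_exchange_strong
// never fails spuriously, the operation succeeded exactly when the old
// value equals the value we expected.
bool compare_and_set(std::atomic<int>& p, int expected, int desired) {
    return compare_and_swap(p, expected, desired) == expected;
}
```

The reverse direction does not work: a CAS that only reports
success/failure cannot tell a failed caller what value it lost to, so
the caller needs an extra load (and another race window) to retry.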

> > > Why did you pick the load/store model for barriers rather than some
> > > other kind (e.g., acquire/release/...)?
> >
> > Chandler looked at what the various current LLVM architectures support
> > and summarized what he found: the memory barriers that the various
> > processors offer.
>
> What you want is a model that is (1) easy to use for developers and
> (2) close to what the hardware offers. L/S membars are easy to use, but I
> think some architectures such as Itanium offer different membars with
> different costs. So if you pick the wrong model and have to use stronger
> membars (mf on Itanium) to implement it, then you pay for that with
> decreased performance.

Of the architectures surveyed, Itanium was the only one to offer these
acquire/release semantics, while L/S membars are offered, at varying
levels of detail, on several architectures. As Itanium is not yet a
fully functional target, it was not prioritized.

Moreover, since (to my knowledge) the only Itanium instructions with a
memory-synchronization component are cmpxchg and fetchadd, their
semantics could be handled correctly when implementing the lowering of
the instructions in this proposal, while full memory barriers would
still be provided when needed outside the atomic instructions. If there
is serious demand for building memory semantics into the atomic
instructions, "acquire" and "release" flags could be added and handled
appropriately by each implementation. That still would not remove the
need for non-operation-based memory barriers.
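A sketch of the two styles under discussion, in C++11 terms. This is
an assumption for illustration: memory_order arguments stand in for
"acquire"/"release" flags on an atomic instruction, and
std::atomic_thread_fence stands in for a standalone barrier. The
spinlock attaches ordering to the atomic operations themselves. The
store-buffering test shows a case where acquire/release flags are not
enough and only a full fence between a store and a later load rules
out an outcome. SpinLock, locked_sum, and store_buffering_both_zero
are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Ordering attached to the atomic operations (operation-based style).
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Acquire: critical-section accesses cannot move above this exchange.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load to avoid repeatedly writing the line.
            while (locked_.load(std::memory_order_relaxed)) {}
        }
    }
    void unlock() {
        // Release: critical-section writes are visible before the lock is freed.
        locked_.store(false, std::memory_order_release);
    }
};

// Hypothetical driver: a plain (non-atomic) counter protected by SpinLock.
long locked_sum(int threads, int iters) {
    SpinLock lock;
    long sum = 0;
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t)
        ts.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++sum;
                lock.unlock();
            }
        });
    for (auto& th : ts) th.join();
    return sum;
}

// Standalone barrier: store-buffering litmus test. Each thread stores
// its own flag, then reads the other's. Acquire/release flags alone do
// not forbid r1 == 0 && r2 == 0; a full fence between store and load does.
bool store_buffering_both_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}
```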

> > > Did you have a look at the atomic_ops project?
> > > http://www.hpl.hp.com/research/linux/atomic_ops/
> > > It already has implementations for several architectures and several
> > > compilers. It uses a different consistency model (different set of
> > > constraints for operations) and groups necessary memory barriers with
> > > instructions (helpful on some architectures). It supports a few more
> > > operations. The author (Hans Boehm) seems to also be active in the area
> > > of C/C++ memory models (or some support for this).
> >
> > LLVM doesn't emit external library calls -- there is no "-lllvm" to
> > which programs have to link, so adding an atomic operation library is
> > likely to be a non-starter. LLVM is interested in emitting instructions
> > to make atomic operations (and higher level concurrency primitives)
> > possible, which is why Chandler's work is so useful.
>
> Please have a real look at atomic_ops first. It does have a library part to
> it -- but that's just for a nonblocking stack.
>
> All the atomic operations are macros and asm for the specific
> compiler/architecture pairs.
> So if you reuse that in the LLVM code generators, you save one large part of
> the work. Of course you can redo all this work, surely giving you a very fast
> start...

The implementations for the current proposal came from architecture
manuals, the Linux kernel, and the Apache Portable Runtime. I will
definitely look at the atomic_ops implementations to see if there are
improvements to be made, but ultimately they provide another model
rather than a reusable component, since this work must be done in
codegen at some point.

> Second, I guess there has been some serious effort put into selecting the
> specific model. So, for example, if you look at some of Hans' published
> slides etc., there are some arguments in favor of associating membars with
> specific instructions. Do you know reasons why LLVM shouldn't do this?

My reason for not associating them is that the majority of hardware
implementations do not associate them. The overriding motive was to
remain very close to the hardware. Could libraries and intrinsic
functions in the FE provide these different interfaces to the
constructs?
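One way a front-end library could offer acquire/release-style
operations on top of hardware-style primitives is sketched below in
C++. This is an assumption for illustration: relaxed std::atomic
accesses model plain atomic loads/stores, std::atomic_thread_fence
models a standalone barrier, and load_acquire, store_release, and
message_passing are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Acquire-flavoured load built from a plain atomic load plus a fence:
// accesses after the call cannot be reordered before the load.
template <typename T>
T load_acquire(const std::atomic<T>& a) {
    T v = a.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return v;
}

// Release-flavoured store built from a fence plus a plain atomic store:
// accesses before the call cannot be reordered after the store.
template <typename T>
void store_release(std::atomic<T>& a, T v) {
    std::atomic_thread_fence(std::memory_order_release);
    a.store(v, std::memory_order_relaxed);
}

// Message passing: a consumer that sees ready == true must also see
// the payload written before the flag was set.
int message_passing() {
    int payload = 0;  // plain data, published through `ready`
    std::atomic<bool> ready{false};
    int seen = -1;
    std::thread producer([&] {
        payload = 42;
        store_release(ready, true);
    });
    std::thread consumer([&] {
        while (!load_acquire(ready)) {}
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

message_passing() returns 42: the release fence before the flag store
synchronizes with the acquire fence after the load that reads true, so
the payload write happens before the payload read.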

> Has anyone looked at the memory models that are under discussion for C/C++?
> Although there is no consensus yet AFAIK, it would be good for LLVM to stay
> close.

Not that LLVM should shun C/C++, but those aren't its only languages.
I think it's better to approach the problem from the hardware than
from the language. This keeps LLVM an accurate layer for expressing
hardware operations, and allows languages to translate their
constructs so that they use the hardware appropriately.

> And please observe that I didn't state that the work is not important or not
> useful. We should just strive to select the best model we have, and reuse
> work if we can. And if we can reuse a tested implementation and model, that
> is a good thing.

I absolutely agree. This is why every aspect of the current proposal
came from a hardware representation, combined with the Linux kernel's
representations. We deviated toward the hardware to ensure that these
intrinsics can fully exploit and expose the hardware's capabilities
while remaining lowerable (if that were a word) across all targets.
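As an illustration of "lowerable across all targets", here is a sketch
of how a read-modify-write that a target lacks natively could be
lowered to a compare-and-swap loop. It is C++ with std::atomic as a
stand-in; fetch_max_via_cas and concurrent_max are hypothetical names,
and fetch-and-max is just a convenient example because std::atomic had
no fetch_max member in C++11 through C++20.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Emulated fetch-and-max: returns the previous value, like the
// fetch_* family, and leaves max(previous, v) in `a`.
int fetch_max_via_cas(std::atomic<int>& a, int v) {
    int old = a.load(std::memory_order_relaxed);
    // Stop as soon as the stored value is already >= v, or our CAS wins.
    // On a failed CAS, `old` is refreshed and the condition is re-checked.
    while (old < v && !a.compare_exchange_weak(old, v)) {
    }
    return old;
}

// Hypothetical driver: thread t contributes i * threads + t for each i,
// so the largest contributed value is threads * per_thread - 1.
int concurrent_max(int threads, int per_thread) {
    std::atomic<int> m{0};
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t)
        ts.emplace_back([&m, t, threads, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                fetch_max_via_cas(m, i * threads + t);
        });
    for (auto& th : ts) th.join();
    return m.load();
}
```

The same loop shape works for any operation of the form
new = f(old), which is what lets a small set of primitives stay
lowerable on targets with only CAS or LL/SC.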

Does this clarify things somewhat? I am quite open to adding support
for Itanium-style hardware representations if this is a significant
issue; it was simply not a priority, and not a problem that I
understand well. (As far as I could find, the Linux kernel does not
use these semantics, but then it may not be the best example.)

-Chandler Carruth

>
> Torvald
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>



More information about the llvm-dev mailing list