[PATCH] Add HLE target feature

Wed Feb 27 16:41:00 PST 2013

Hi Jeffrey

On Wed, 2013-02-27 at 16:23 -0800, Jeffrey Yasskin wrote:
> On Wed, Feb 27, 2013 at 4:14 PM, Michael Liao <michael.liao at intel.com> wrote:
> > Hi Krzysztof
> >
> > On Wed, 2013-02-27 at 17:50 -0600, Krzysztof Parzyszek wrote:
> >> On 2/27/2013 5:09 PM, Michael Liao wrote:
> >> >
> >> > "LLVM Atomic Instructions and Concurrency Guide" puts more details on
> >> > why atomic instructions are added in LLVM IR and optimizations around
> >> > it. For HLE support, you could treat an enhancement on atomic
> >> > instruction by taking advantage of hardware support or TM support, the
> >> > current approach is to add hint in atomic instructions already existed
> >> > in LLVM IR to help backend generate proper code.
> >>
> >> This sort of thing belongs in the X86 backend, not LLVM IR.  The IR
> >> already contains atomic instructions, and the decision of how to best
> >> implement them on a particular platform is best left to the individual
> >
> > HLE is not an X86-specific feature. Any hardware with TM support, e.g.
> > Power (thanks to Hal once pointed out), could support that but possibly
> > use different names, such as TLE (transactional lock elision) or SLE
> > (speculative lock elision) or whatever.
> 
> I'm curious how Power can take advantage of HLE metadata. Could you
> post an example LLVM IR sequence and its expansion into both x86 and
> Power assembly?

You can check
https://www.power.org/documentation/power-isa-transactional-memory/ for
the Power-specific TM ISA. It presents the transactional lock elision
technique, which transforms a lock into a speculative lock. In fact,
with RTM (in X86's TSX) alone, HLE could also be achieved, at the cost
of code size.

Regarding the speculative atomic insn discussed there, the sequence
could look like the following. (I assume the meaning is clear from the
naming; if anything is confusing, I'd be glad to explain further.)

- atomic_rmw(lock, HLE_ACQ) and atomic_rmw(lock, HLE_REL) will be
translated into the following pseudo code, where
  * lock is the pointer to the atomic variable,
  * HLE_ACQ is the acquire hint,
  * HLE_REL is the release hint,
  * tbegin() starts a transactional region and returns 0 on success. If
a non-zero value is returned, the transaction was aborted.
  * tend() ends a transactional region.
  * ttest() returns true if the program is currently executing in
transaction mode; otherwise, it returns false.

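atomic_rmw(lock, HLE_ACQ) {
  if (tbegin()) {
    // aborted, fall back to the regular atomic insn and return its
    // result (the previous value of *lock)
    return atomic_rmw(lock);
  }
  // speculating: only read the lock, so it joins the read set
  return *lock;
}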

atomic_rmw(lock, HLE_REL) {
   if (ttest()) {
     tend();
     return;
   }
   atomic_rmw(lock);
}
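
As a rough C sketch of the two expansions above (an illustration only,
not code from the patch): tbegin(), tend() and ttest() are stubbed
stand-ins for the target's TM primitives, and tm_enabled,
hle_acquire_xchg() and hle_release_store() are names made up for this
example. The stub cannot model the rollback a real abort performs; it
only shows which path writes the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical TM primitives, stubbed for illustration. A real target
 * would map these to its transactional instructions, and a real
 * tbegin() would also resume here with a non-zero result on abort. */
static bool tm_enabled = false;   /* assumption: toggles the stub */
static bool in_txn = false;

static int tbegin(void)
{
    if (!tm_enabled)
        return 1;                 /* non-zero: transaction not running */
    in_txn = true;
    return 0;                     /* zero: now executing speculatively */
}

static void tend(void)  { in_txn = false; }
static bool ttest(void) { return in_txn; }

/* Exchange with an HLE_ACQ hint; returns the previous lock value. */
static int hle_acquire_xchg(atomic_int *lock, int val)
{
    if (tbegin()) {
        /* Aborted: perform the regular atomic exchange. */
        return atomic_exchange_explicit(lock, val, memory_order_acquire);
    }
    /* Speculating: only read the lock; it is never written. */
    return atomic_load_explicit(lock, memory_order_relaxed);
}

/* Store with an HLE_REL hint. */
static void hle_release_store(atomic_int *lock, int val)
{
    if (ttest()) {
        tend();                   /* commit; nothing to store */
        return;
    }
    atomic_store_explicit(lock, val, memory_order_release);
}
```

With tm_enabled set, an acquire followed by a release leaves the lock
word at 0 the whole time, which is the point of the elision; with it
clear, both calls behave like the plain atomic operations.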

To translate a spin lock into a speculative spin lock, the original code
will be changed from

----------------------------------------------------------------
spin_lock(lock) {
   while (atomic_test_and_set(lock, 0, 1, memory_order_acquire))
     ;
}

spin_unlock(lock) {
   atomic_store(lock, 0, memory_order_release);
}
----------------------------------------------------------------

to

================================================================
spin_lock(lock) {
   while (atomic_test_and_set(lock, 0, 1,
                              memory_order_acquire | HLE_ACQ))
     ;
}

spin_unlock(lock) {
   atomic_store(lock, 0, memory_order_release | HLE_REL);
}
================================================================

That's it. Of course, the spin lock needs some fine-tuning of its
backoff behaviour to improve fairness and guarantee forward progress.
But anyway, that's the whole idea of implementing HLE through TM. (I
just copied these ideas from early TM papers.)
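
For comparison, the speculative spin lock above can be written at
source level as the sketch below, assuming GCC's x86-specific
__ATOMIC_HLE_ACQUIRE and __ATOMIC_HLE_RELEASE memory-model flags. The
HLE_ACQ / HLE_REL macros are local names for this example and fall back
to 0 (no hint) when the compiler does not define those flags; an
exchange stands in for the test-and-set in the pseudo code.

```c
/* Use GCC's x86 HLE hint bits when available; otherwise no hint. */
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define HLE_ACQ __ATOMIC_HLE_ACQUIRE
#define HLE_REL __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQ 0
#define HLE_REL 0
#endif

/* Spin until the exchange observes the lock as free (previous value 0). */
static void spin_lock(int *lock)
{
    while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE | HLE_ACQ))
        ;
}

/* Release the lock by storing 0 with release ordering. */
static void spin_unlock(int *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE | HLE_REL);
}
```

On hardware without HLE the hint prefixes are ignored, so the same code
runs as an ordinary spin lock.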

Yours
- Michael