[PATCH][RFC] HLE support proposal

Thu Apr 18 22:37:05 PDT 2013

Hi Chris

On Thu, 2013-04-18 at 21:29 -0700, Chris Lattner wrote:
> Michael,
> 
> On Apr 18, 2013, at 1:49 PM, "Liao, Michael" <michael.liao at intel.com> wrote:
> > As locks are widely used in all kinds of applications, hardware lock elision is an attractive technology to improve locking with minimal or evolutionary changes. Because it's not introducing a brand new, full transactional memory programming model, it's more acceptable technique for a better locking. From what I learned, many system components are get hardware lock elision supported, such as libstdc++, glibc, pthread, Linux kernel, and etc. The number of application using it or experimenting with it is constantly growing, especially considering the new hardware is available soon. 
> 
> I don't think that anyone here is objecting to LLVM supporting HLE as a feature, they are just discussing the best way to implement the feature.  I think that many people would love to get proper HLE support, but we need to implement it in the right way.

Yeah, adding features the right way is definitely what we want. Advice
from the community is very important.

> 
> > Compatibility to GCC
> 
> Source level compatibility with GCC and ICC are definitely interesting (and important), but I don't see why that would imply a specific implementation approach.  For example, clang supports xmmintrin.h (and friends) for compatibility with ICC and GCC, but *implements* them in completely different ways.

But this case is different from the intrinsics implementation. GCC
extends the order parameter of the atomic builtins (the C++11 atomic
builtins) with additional HLE bits. If we provide the same source-level
compatibility, hinting the atomic instruction seems the most
straightforward way. We definitely could, in clang, dispatch atomic
builtins to X86-specific intrinsics after finding the HLE bits. But that
would mess up the code in clang, considering the number of atomic
builtins to be supported, and it would break the layered design. (I'm
not a total fan of layered design, but it definitely keeps the code more
readable and maintainable.)
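
To make the GCC interface concrete, here is a minimal sketch of a
spinlock that ORs the HLE bits into the order parameter. The
__ATOMIC_HLE_ACQUIRE / __ATOMIC_HLE_RELEASE macros are the ones GCC
provides on x86; the hle_spinlock type and the hle_lock / hle_unlock
names are made up for illustration, and the fallback to 0 (no hint) for
compilers that lack the macros is my assumption, not part of GCC.

```c
/* Sketch only: hle_spinlock, hle_lock and hle_unlock are hypothetical names. */

/* GCC defines these on x86. Assumption: where they are missing,
 * fall back to 0 so the builtins see a plain memory order. */
#ifndef __ATOMIC_HLE_ACQUIRE
#define __ATOMIC_HLE_ACQUIRE 0
#endif
#ifndef __ATOMIC_HLE_RELEASE
#define __ATOMIC_HLE_RELEASE 0
#endif

typedef struct {
    int locked; /* 0 = free, 1 = held */
} hle_spinlock;

static inline void hle_lock(hle_spinlock *l)
{
    /* The HLE bit is ORed into the ordinary order argument. */
    while (__atomic_exchange_n(&l->locked, 1,
                               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)) {
        /* Spin on plain loads until the lock looks free, then retry. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ;
    }
}

static inline void hle_unlock(hle_spinlock *l)
{
    __atomic_store_n(&l->locked, 0,
                     __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}
```

The point is that the source stays an ordinary atomic builtin call; only
the order argument carries the extra hint, which is why hinting the
atomic instruction maps onto it so directly.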

> 
> > is just one of the considerations and the major design consideration is that an HLE hinted atomic instruction (LLVM IR) is still an atomic instruction. It won't break the existing optimizations but leverage the existing optimizations aware of atomic instruction, e.g. reducing memory barriers as much as possible. Otherwise, we have to introduce target specific optimizations duplicating these optimizations.
> 
> I don't understand the motivation here - what optimizations are you anticipating?  Most existing LLVM operations detect atomics and back off very quickly, so you're not getting any specific optimizations in practice.

Yeah, most optimization passes just skip atomic insns. But some
check/treat them specially and differently from general intrinsics. For
example, SimplifyCFG only treats an atomic instruction on a volatile
memory location as one with side effects, whereas a general intrinsic
with side effects is always treated as having side effects, whether or
not the memory location is volatile. That's just one example; there are
other passes that are quite similar. In addition, there's an
optimization in DAG combining that folds a memory fence into an atomic
instruction if the target has implicit fence semantics on atomic insns.
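
As a rough source-level illustration of the fence case, consider an
atomic read-modify-write immediately followed by a full fence. This is
only a sketch: the counter variable and the bump_then_fence name are
hypothetical, and whether the fence is actually folded depends on the
target and the compiler version.

```c
#include <stdatomic.h>

/* Hypothetical shared counter, zero-initialized (static storage). */
static atomic_int counter;

/* Atomic increment followed by an explicit full fence.
 * On x86 the increment is a lock-prefixed instruction, which already
 * orders like a full fence, so the separate fence is redundant there
 * and a compiler that knows this can drop it. */
int bump_then_fence(void)
{
    int old = atomic_fetch_add_explicit(&counter, 1, memory_order_seq_cst);
    atomic_thread_fence(memory_order_seq_cst);
    return old; /* value before the increment */
}
```

A general intrinsic with side effects would not be recognized by this
kind of fold; an HLE-hinted atomic instruction still would.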

> 
> I share the general concern that you are adding substantial complexity for a single (admittedly important!) target, and gaining very little win for the additional complexity.

HLE, or SLE (speculative lock elision), or whatever a vendor calls it,
is not an X86-specific term. Any processor with transactional memory
support could support it. There's a list of processors with TM support.
I know most of them may still be prototypes or on paper, but technically
it's not an x86-specific thing.

Thanks for your valuable comments!

Yours
- Michael

> 
> -Chris
>