[PATCH][RFC] HLE support proposal

Chris Lattner clattner at apple.com
Fri Apr 19 09:52:34 PDT 2013


On Apr 18, 2013, at 10:37 PM, Michael Liao <michael.liao at intel.com> wrote:
>>> Compatibility to GCC
>> 
>> Source-level compatibility with GCC and ICC is definitely interesting (and important), but I don't see why that would imply a specific implementation approach.  For example, clang supports xmmintrin.h (and friends) for compatibility with ICC and GCC, but *implements* them in completely different ways.
> 
> But this case is different from the intrinsics implementation. GCC
> extends the order parameter of the (C++11-style) atomic builtins with
> additional HLE bits. If we provide the same source-level compatibility,
> hinting the atomic instruction seems the most straightforward way. We
> definitely could, in clang, dispatch atomic builtins to X86-specific
> intrinsics after finding HLE bits. But that would clutter the code in
> clang, given the number of atomic builtins to be supported, and it
> would break the layered design. (I'm not a total fan of layered design,
> but it definitely keeps the code more readable and maintainable.)
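
For reference, GCC's extension is used roughly like this (a sketch based
on the __ATOMIC_HLE_ACQUIRE / __ATOMIC_HLE_RELEASE flag bits GCC
documents, built with -mhle; the spinlock itself is just an
illustration):

    #include <immintrin.h>

    static int lockvar;

    void lock_elided(void) {
      /* The HLE bit is OR'ed into the normal C11/C++11 memory-order
         argument, asking for an XACQUIRE-hinted lock acquisition. */
      while (__atomic_exchange_n(&lockvar, 1,
                                 __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
        _mm_pause();  /* spin (and abort a failed elided transaction) */
    }

    void unlock_elided(void) {
      /* Matching XRELEASE-hinted release. */
      __atomic_store_n(&lockvar, 0,
                       __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
    }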

Again, this is a source compatibility issue.  Having target-specific code in clang that handles this is perfectly reasonable.  Clang already has (and *must* have) a bunch of target specific knowledge in every subsystem, starting at the lexer/preprocessor layer.

A clean way to handle this is to add a target callback to produce the exact code when these hints are present.  The X86-specific code in Clang can just insert the right intrinsics.   If these were supported on another target, that target could implement the hook the right way for their architecture.
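
For concreteness, the "exact code" on the X86 side amounts to
XACQUIRE/XRELEASE-prefixed locked instructions.  An inline-asm sketch of
the effect (illustrative only, not what Clang would literally emit, and
the function name is made up):

    /* Hypothetical illustration: an HLE-hinted exchange boils down to
       an XACQUIRE prefix on an ordinary (implicitly locked) XCHG. */
    static inline int hle_exchange_acquire(int *p, int v) {
      __asm__ __volatile__("xacquire xchgl %0, %1"
                           : "+r"(v), "+m"(*p)
                           :
                           : "memory");
      return v;
    }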

>> 
>>> is just one of the considerations; the major design consideration is that an HLE-hinted atomic instruction (in LLVM IR) is still an atomic instruction. It won't break the existing optimizations but will leverage the ones that are already aware of atomic instructions, e.g. reducing memory barriers as much as possible. Otherwise, we would have to introduce target-specific optimizations that duplicate them.
>> 
>> I don't understand the motivation here - what optimizations are you anticipating?  Most existing LLVM operations detect atomics and back off very quickly, so you're not getting any specific optimizations in practice.
> 
> Yeah, most optimization passes just skip atomic insns. But some check
> and treat them specially, and differently from general intrinsics. For
> example, SimplifyCFG only treats an atomic instruction on a volatile
> memory location as having side effects, whereas a general intrinsic
> with side effects is always treated as such, whether the memory
> location is volatile or not. That's just one example; other passes
> behave similarly. In addition, there's an optimization in DAG combining
> that folds a memory fence into an atomic instruction if the target has
> implicit fence semantics for atomic insns.
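
The DAG-combine case described in that last sentence is, at the source
level, the kind of pattern below (illustrative C++ only; whether the
fence is actually folded is up to the target):

    #include <atomic>

    std::atomic<int> refcount{0};

    void retain() {
      refcount.fetch_add(1, std::memory_order_relaxed);
      // On a target where the locked RMW above already acts as a full
      // barrier, the backend can fold this fence instead of emitting a
      // separate fence instruction.
      std::atomic_thread_fence(std::memory_order_seq_cst);
    }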

I'm still completely unconvinced that you're going to get a win out of this in practice.  The number of static HLE instructions in any application is going to be far, far lower than the number of normal loads and stores in the program.  Punishing normal loads and stores (in terms of compiler complexity) for such a corner case does not seem worth it.

-Chris


