[PATCH][RFC] HLE support proposal

Tue Apr 16 23:58:46 PDT 2013

Hi Nadav

On Tue, 2013-04-16 at 23:16 -0700, Nadav Rotem wrote:
> Hi Michael, 
> 
> 
> Thanks for the quick reply. 
> 
> > Sorry, the attached patches are for demonstration purposes only, not
> > for the final review. As I stated in the original email, they are
> > neither final nor complete so far. But with reference code, the
> > proposal may be understood better.
> 
> 
> It's 76K of code.  I think that it will be much more effective for you
> to write to the mailing list before you code. 

Sorry, but if you count lines, it's not that much (IMHO ;)

$ wc 000*.patch
  385  1776 15356 0001-Add-CAS-intrinsic-to-help-refactoring-Atomic-support.patch
  624  2496 23441 0002-Add-X86-atomic-IR-lower-pass.patch
  225  1068  8750 0003-Add-XACQ-XREL-prefix-and-encoding-asm-printer-suppor.patch
  692  3664 27487 0004-Add-HLE-code-generation.patch
 1926  9004 75034 total

Only 0002-Add-X86-atomic-IR-lower-pass.patch implements that pass. The
other patches just make the reference code able to do some real work. In
0002-Add-X86-atomic-IR-lower-pass.patch, only
lib/Target/X86/X86AtomicIRLower.cpp (264 lines) does the lowering from
atomic instructions into target-native atomic intrinsics. All other
changes in that patch just add X86-specific atomic code generation
support, which is verbose but should be straightforward.

> 
> > 
> > There are two considerations:
> > 
> > One is to provide GCC/ICC compatibility. GCC (gcc-4.8) provides HLE
> > functionality through the existing atomic builtins by ORing HLE hint
> > bits into the memory order word. These builtins are implemented in
> > LLVM/Clang through atomic instructions. Hinting those atomic
> > instructions is a straightforward approach.
> 
> 
> It is not a straightforward approach because it requires a new
> pre-isel pass, attaching metadata to instructions, etc.  Also, this
> solution increases the compile time of all generated x86 code
> (including debug builds) because it requires another scan of the IR.

Here, I am only talking about hinting atomic instructions in the IR with
HLE. Hinting existing atomic instructions is straightforward, as it
meets the goal of keeping the IR portable between targets with HLE and
those without. In addition, all atomic-aware optimizations will work the
same.
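
For reference, this is roughly what the GCC 4.8 interface looks like from
C: the HLE hint is ORed into the memory-order argument of an ordinary
__atomic builtin, so the operation stays an ordinary acquire/release
atomic. Below is a minimal sketch of an elided spinlock. The fallback to a
zero hint when __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE are not
predefined (non-x86 targets, or compilers without the extension) is my own
illustration of the portability point, and the names LOCK_HINT_*,
lock_word, hle_lock_acquire and hle_lock_release are hypothetical.

```c
#include <stdbool.h>

/* GCC predefines __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE on x86.
   Elsewhere (assumption: any compiler lacking them) fall back to 0, so
   the same source still builds a plain acquire/release lock. */
#ifdef __ATOMIC_HLE_ACQUIRE
#define LOCK_HINT_ACQUIRE __ATOMIC_HLE_ACQUIRE
#define LOCK_HINT_RELEASE __ATOMIC_HLE_RELEASE
#else
#define LOCK_HINT_ACQUIRE 0
#define LOCK_HINT_RELEASE 0
#endif

static int lock_word; /* 0 = free, 1 = held */

/* Spin until the lock is taken. With GCC on x86 the exchange is emitted
   with an XACQUIRE prefix; CPUs without HLE simply ignore the prefix. */
static void hle_lock_acquire(void)
{
    while (__atomic_exchange_n(&lock_word, 1,
                               __ATOMIC_ACQUIRE | LOCK_HINT_ACQUIRE))
        ; /* busy-wait; real code would add a pause hint here */
}

/* Release the lock; with GCC on x86 the store gets an XRELEASE prefix. */
static void hle_lock_release(void)
{
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE | LOCK_HINT_RELEASE);
}
```

The memory-order part of each call is unchanged by the hint, which is why
atomic-aware optimizations keep working on the hinted instructions.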

For this proposal, adding a lowering pass isn't as straightforward as
adding a trivial feature in SelectionDAG directly, but it is still
obvious enough to understand. The major benefit of adding a lowering
pass that translates atomic instructions into target-native atomic
intrinsics is that it reduces the burden of implementing every atomic
instruction on the backend side. Most target hardware provides only the
minimal atomic support instructions (e.g. LL/SC on all RISC targets);
with this pass, those targets only need to care about the codegen of
LL/SC.
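
To illustrate why a target with only CAS or LL/SC primitives needs
nothing more, here is a source-level C analogue of the kind of expansion
such a lowering pass performs: an atomic read-modify-write with no native
instruction (a fetch-and-max, picked purely as an example) rewritten as a
compare-and-swap retry loop. This is a sketch using the GCC/Clang __atomic
builtins, not the code in X86AtomicIRLower.cpp; the function name
atomic_fetch_max_via_cas is hypothetical.

```c
#include <stdbool.h>

/* Atomically stores max(*ptr, val) and returns the previous value,
   using nothing but a compare-and-swap primitive. */
static int atomic_fetch_max_via_cas(int *ptr, int val)
{
    /* Initial snapshot; relaxed is enough because the CAS validates it. */
    int expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    for (;;) {
        int desired = expected > val ? expected : val;
        /* Weak CAS may fail spuriously, like an SC after an LL. On any
           failure 'expected' is refreshed with the current value and
           'desired' is recomputed on the next iteration. */
        if (__atomic_compare_exchange_n(ptr, &expected, desired, true,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
            return expected; /* unchanged on success: the old value */
    }
}
```

Every other RMW operation follows the same shape with a different
computation of 'desired', so a backend only has to get the CAS (or LL/SC
pair) right.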

> 
> 
> Other compilers have other considerations and I don't think that we
> need to compromise compile time or flexibility for this, especially if
> we have other alternatives. 

Could you elaborate on the compile-time overhead? From my measurements,
this pass is fast: it has O(N) complexity and only processes atomic
instructions. If we kept track of whether a function contains atomic
instructions, we could be even faster by skipping functions without
them entirely.
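
The cost argument can be sketched with a toy model (plain C, not LLVM's
API): a single linear scan per function that does work only on atomic
instructions, plus an early exit driven by a per-function "has atomics"
flag. The types, the opcode set and the has_atomics flag are all
hypothetical; the flag stands in for whatever bookkeeping would track it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: an instruction is just an opcode tag. */
enum opcode { OP_LOAD, OP_STORE, OP_ADD, OP_ATOMIC_RMW, OP_CMPXCHG };

struct function {
    const enum opcode *insts;
    size_t num_insts;
    bool has_atomics; /* hypothetical flag kept up to date elsewhere */
};

static bool is_atomic(enum opcode op)
{
    return op == OP_ATOMIC_RMW || op == OP_CMPXCHG;
}

/* One O(N) pass; only atomic instructions get any real work. Returns the
   number of instructions inspected so the cost is observable, and stores
   the number of instructions that would be lowered in *num_lowered. */
static size_t lower_atomics(const struct function *f, size_t *num_lowered)
{
    *num_lowered = 0;
    if (!f->has_atomics)
        return 0; /* fast path: no scan at all */
    for (size_t i = 0; i < f->num_insts; ++i) {
        if (is_atomic(f->insts[i]))
            (*num_lowered)++; /* a real pass would rewrite it here */
    }
    return f->num_insts;
}
```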

> 
> 
> > 
> > The other is to provide a portable way of supporting HLE in LLVM IR.
> > On targets supporting HLE, the extra HLE hints provide a performance
> > benefit. At the same time, the same IR will work correctly on targets
> > without HLE. Programmers don't need to ship two versions of IR if they
> > want to target processors both with and without HLE.
> > 
> 
> 
> This is exactly what I want to avoid. I don't want to change the
> LLVM-IR to add support for HLE.  LLVM supports many targets with many
> exotic features, and these features are exposed to developers using
> intrinsics.  The correct way to extend the LLVM IR is to use
> intrinsics.   

I agree with you for one-off hardware features like CRC or encryption,
but not for HLE, which is a common feature of hardware with TM support.
The GCC community did consider the approach you mentioned. In fact, if
you look at the GCC mailing list, people originally rejected that
proposal and preferred the approach of hinting atomic builtins.
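
On the codegen side, "checking the HLE hint" amounts to splitting the
ordering word into the ordinary memory order and the hint bits; a target
without HLE simply drops the hint and emits the same atomic it would have
emitted anyway. A sketch, assuming a GCC-like encoding where the hints sit
above the low 16 bits of the ordering; ORDER_MASK, HINT_ACQUIRE,
HINT_RELEASE, hle_prefix and base_order are hypothetical names, not LLVM
or GCC definitions.

```c
#include <stdbool.h>

/* Assumed encoding, modelled on GCC's: the low 16 bits carry the C11
   memory order, higher bits carry target-specific hints. */
#define ORDER_MASK   0xFFFFu
#define HINT_ACQUIRE (1u << 16)
#define HINT_RELEASE (1u << 17)

/* Instruction prefix to emit for a hinted atomic ("" for none). */
static const char *hle_prefix(unsigned order_word, bool target_has_hle)
{
    if (!target_has_hle)
        return ""; /* hint dropped: plain atomic, same semantics */
    if (order_word & HINT_ACQUIRE)
        return "xacquire ";
    if (order_word & HINT_RELEASE)
        return "xrelease ";
    return "";
}

/* The ordering seen by atomic-aware optimizations is unaffected. */
static unsigned base_order(unsigned order_word)
{
    return order_word & ORDER_MASK;
}
```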

- michael

> 
> 
> > > 
> > > 
> > > Target-specific intrinsics will allow HLE-aware synchronization
> > > library writers and researchers who implement alternative
> > > programming models to use HLE.
> > 
> > Checking the HLE hint could achieve the same purpose and, at the same
> > time, the existing atomic semantics could be reused without checking
> > new intrinsics.
> 
> 
> Thanks,
> Nadav
> 
> 
> 
>