[PATCH] D47672: [Headers] Add _Interlocked*_HLEAcquire/_HLERelease

Wed Jun 6 15:30:18 PDT 2018

craig.topper added a comment.

It looks like gcc implements additional bits that can be passed to __atomic_exchange and friends: __ATOMIC_HLE_ACQUIRE (1 << 16) and __ATOMIC_HLE_RELEASE (1 << 17). Basically, they're using bits 16 and above in the order/memory_model argument as target-specific flags. These constants are only defined when targeting X86, and they are validated to ensure they are only paired with the appropriate __ATOMIC_ACQUIRE or __ATOMIC_RELEASE, or a stronger memory model.
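To make the encoding concrete, here is a rough sketch of how a memory-model argument could be split into its base order and hint bits and then validated. The names (ORDER_*, HLE_*_BIT, base_order, hle_model_is_valid) are hypothetical, and the rule that both hints together are rejected is my assumption, not something stated above.

```c
#include <stdbool.h>

/* Base memory orders, mirroring the standard __ATOMIC_* values (0..5). */
enum { ORDER_RELAXED = 0, ORDER_CONSUME = 1, ORDER_ACQUIRE = 2,
       ORDER_RELEASE = 3, ORDER_ACQ_REL = 4, ORDER_SEQ_CST = 5 };

/* Target-specific hint bits, using the gcc encoding quoted above. */
enum { HLE_ACQUIRE_BIT = 1 << 16, HLE_RELEASE_BIT = 1 << 17 };

/* The low 16 bits carry the ordinary memory order. */
#define ORDER_MASK 0xFFFF

/* Strip the hint bits and return the plain memory order. */
static inline int base_order(int model) { return model & ORDER_MASK; }

/* Hypothetical validation: each hint needs a matching or stronger order. */
static inline bool hle_model_is_valid(int model) {
    int base = base_order(model);
    bool acq = (model & HLE_ACQUIRE_BIT) != 0;
    bool rel = (model & HLE_RELEASE_BIT) != 0;
    bool strong = (base == ORDER_ACQ_REL || base == ORDER_SEQ_CST);

    if (acq && rel)
        return false; /* assumption: the two hints are mutually exclusive */
    if (acq && !(base == ORDER_ACQUIRE || strong))
        return false;
    if (rel && !(base == ORDER_RELEASE || strong))
        return false;
    return true;
}
```

With this sketch, ORDER_ACQUIRE | HLE_ACQUIRE_BIT is accepted, while ORDER_RELAXED | HLE_ACQUIRE_BIT is rejected.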

As Reid said, it's technically safe to drop the hints sometimes, so we could use SubclassOptionalData or metadata. But losing them could have performance implications. If you lose an XACQUIRE, the lock won't be elided as the user expected. And if you keep an XACQUIRE but lose an XRELEASE, the processor will keep trying to speculate farther than it should until it eventually hits some random abort trigger and has to roll back to really acquiring the lock. Both of these would be surprising to the user, so we should make an effort not to lose the information as much as possible.
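For reference, the pairing a user would be relying on looks roughly like the spinlock below, written against the gcc builtins. The elided_lock type, the function names, and the HINT_* macros are made up for illustration. Where the compiler doesn't define the HLE constants, the macros fall back to 0 so the code still builds, just without elision.

```c
/* Use the gcc hint bits when the compiler defines them; otherwise fall
   back to 0 (no hint), which leaves a plain spinlock. */
#ifdef __ATOMIC_HLE_ACQUIRE
#define HINT_ACQUIRE __ATOMIC_HLE_ACQUIRE
#define HINT_RELEASE __ATOMIC_HLE_RELEASE
#else
#define HINT_ACQUIRE 0
#define HINT_RELEASE 0
#endif

typedef struct { int locked; } elided_lock; /* hypothetical lock type */

static inline void elided_lock_acquire(elided_lock *l) {
    /* With the hint, this exchange is what would carry XACQUIRE. */
    while (__atomic_exchange_n(&l->locked, 1,
                               __ATOMIC_ACQUIRE | HINT_ACQUIRE)) {
        /* Lock is held: wait on plain loads before retrying. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED)) {
        }
    }
}

static inline void elided_lock_release(elided_lock *l) {
    /* With the hint, this store is what would carry XRELEASE. It must
       pair with the acquire above for the elided region to end cleanly. */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE | HINT_RELEASE);
}
```

If the hint on the exchange is dropped, this degrades to an ordinary lock. If only the hint on the store is dropped, the region started by the exchange has no matching end, which is the second case described above.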

Here's a start at an implementation proposal with some embedded questions.
-Add the X86 __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE matching the gcc encoding value.
-Write these intrinsics to pass these flags.
-Teach CGAtomic.cpp to lower those hints to whatever IR representation we choose. If we choose SubclassOptionalData, we'll also need to add bitcode, LL parsing, and printing support. Not sure what we would need for metadata.
-Add an HLE_ACQUIRE and HLE_RELEASE prefixed version of every instruction that can be prefixed to the X86Instr*.td files with appropriate isel patterns. This matches what we do for LOCK already. This is probably somewhere between 130-150 instructions after tblgen expansion for operand sizes, immediate vs register, etc. Ideally we'd devise some way to tag MachineInstr* with a lock, hle acquire, and hle release so that we didn't need separate instruction opcodes for each permutation. But this would just make things scale better; it is not required for functionality.
-Need a way to represent this in SelectionDAG so X86-specific code can create the right target-specific nodes. Do we have a metadata infrastructure there? Or should we store it with the ordering in MachineMemOperand? Or in SDNodeFlags?

Obviously a lot of that will take some time. I wonder if it makes sense to add the __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE constants, but ignore them in CGAtomic.cpp for now? We could then implement these intrinsics with the code we ultimately want to see there, but not implement the hints yet. Thoughts?
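A rough user-space illustration of that interim approach is below. It is not the patch's code. The sketch_ prefixed functions and the SKETCH_* macros are hypothetical, and using a seq_cst base order for the Interlocked semantics is an assumption. Where the compiler lacks the constants, they get the gcc encoding values and the hint bits are masked off before reaching the builtin, which models "accept the flags but ignore them for now".

```c
#ifdef __ATOMIC_HLE_ACQUIRE
/* Compiler understands the hints: pass them through unchanged. */
#define SKETCH_HLE_ACQUIRE __ATOMIC_HLE_ACQUIRE
#define SKETCH_HLE_RELEASE __ATOMIC_HLE_RELEASE
#define SKETCH_ORDER(m) (m)
#else
/* Interim stand-in: same encoding as gcc, but the hints are dropped. */
#define SKETCH_HLE_ACQUIRE (1 << 16)
#define SKETCH_HLE_RELEASE (1 << 17)
#define SKETCH_ORDER(m) ((m) & 0xFFFF)
#endif

/* Exchange that requests an HLE acquire hint; returns the old value. */
static inline long sketch_InterlockedExchange_HLEAcquire(
    long volatile *target, long value) {
    return __atomic_exchange_n(
        target, value, SKETCH_ORDER(__ATOMIC_SEQ_CST | SKETCH_HLE_ACQUIRE));
}

/* Compare-exchange that requests an HLE release hint; returns the value
   that was in *dest before the operation. */
static inline long sketch_InterlockedCompareExchange_HLERelease(
    long volatile *dest, long exchange, long comparand) {
    /* On failure the builtin writes the observed value into comparand. */
    __atomic_compare_exchange_n(
        dest, &comparand, exchange, 0 /* strong */,
        SKETCH_ORDER(__ATOMIC_SEQ_CST | SKETCH_HLE_RELEASE),
        __ATOMIC_SEQ_CST);
    return comparand;
}
```

The intrinsic bodies would not need to change once the hints are actually lowered; only the handling of the extra bits would.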

Repository:
  rC Clang

https://reviews.llvm.org/D47672