[PATCH][RFC] HLE support proposal

Thu Apr 18 13:49:47 PDT 2013

Hi Evan

As locks are widely used in all kinds of applications, hardware lock elision is an attractive technology for improving locking with minimal or evolutionary changes. Because it does not introduce a brand new, full transactional memory programming model, it is a more acceptable technique for better locking. From what I have learned, many system components are gaining hardware lock elision support, such as libstdc++, glibc, pthread, and the Linux kernel. The number of applications using it or experimenting with it is constantly growing, especially considering that the new hardware will be available soon.

Compatibility with GCC is just one of the considerations. The major design consideration is that an HLE-hinted atomic instruction (in LLVM IR) is still an atomic instruction. It won't break the existing optimizations; instead, it leverages the existing optimizations that are aware of atomic instructions, e.g. reducing memory barriers as much as possible. Otherwise, we would have to introduce target-specific optimizations that duplicate them.
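
To illustrate the "hinted atomic is still an atomic" idea at the source level, here is a minimal sketch using GCC's x86 memory-model extension bits (__ATOMIC_HLE_ACQUIRE and __ATOMIC_HLE_RELEASE) with the generic __atomic builtins. The names HleSpinLock, HLE_ACQUIRE_HINT and HLE_RELEASE_HINT are hypothetical and exist only for this example; it is not the interface proposed in the patch. On compilers that do not define the HLE bits, the hints are assumed to fall back to 0, which leaves a plain spinlock.

```cpp
// Sketch only: a test-and-set spinlock whose acquire and release carry
// optional HLE hints. The hint bits are GCC's x86-specific extension; when
// they are not defined, the hints become 0 and the lock is an ordinary
// spinlock built from the same atomic operations.
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define HLE_ACQUIRE_HINT __ATOMIC_HLE_ACQUIRE  // hypothetical wrapper name
#define HLE_RELEASE_HINT __ATOMIC_HLE_RELEASE  // hypothetical wrapper name
#else
#define HLE_ACQUIRE_HINT 0
#define HLE_RELEASE_HINT 0
#endif

struct HleSpinLock {
    int word = 0;  // 0 = free, 1 = held

    void lock() {
        // An ordinary atomic exchange with acquire ordering; the HLE bit is
        // only an extra hint OR'ed into the memory-order argument.
        while (__atomic_exchange_n(&word, 1,
                                   __ATOMIC_ACQUIRE | HLE_ACQUIRE_HINT)) {
            // Spin on plain loads until the lock looks free again.
            while (__atomic_load_n(&word, __ATOMIC_RELAXED)) {
            }
        }
    }

    void unlock() {
        // An ordinary atomic store with release ordering plus the hint.
        __atomic_store_n(&word, 0, __ATOMIC_RELEASE | HLE_RELEASE_HINT);
    }
};
```

With or without the hint, the operations keep the same acquire/release semantics, so anything that reasons about them as atomics does not need to know about HLE.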

Sorry, at first I didn't really consider that, as it's just a new piece of metadata attached. But after Nadav raised the issue, we looked carefully into whether the change would add a non-minimal change to the existing work. So far, I cannot figure out why new metadata, which is ignored by almost all passes and only needs to be propagated to the backend through AtomicSDNode without a 2-bit hint field, would add significant complexity. I would really appreciate it if you could give more insight into the complexity added to the compiler.

Thanks
- Michael

-----Original Message-----
From: Evan Cheng [mailto:evan.cheng at apple.com] 
Sent: Wednesday, April 17, 2013 11:27 PM
To: Liao, Michael
Cc: Nadav Rotem; llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH][RFC] HLE support proposal

Hi Michael,

Every new feature adds some complexity to llvm. So ultimately it comes down to whether the added functionality is worth the trade off. One missing bit of data here is who / what are the clients of HLE support. Can you elaborate?

From your comments in this thread, it seems the motivation is to gain feature parity with gcc. That by itself is not really a strong enough argument since the two projects have different design goals. LLVM really isn't about having every last feature that its competitors have.

LLVM has many features and it is being used in many ways. It is now being used in dynamic environments such as WebKit. We are getting concerned about its increasing heft and are thinking hard about putting it on a diet. You can understand why we question the cost of esoteric features such as HLE.

Evan

Sent from my iPad

On Apr 17, 2013, at 3:54 PM, Michael Liao <michael.liao at intel.com> wrote:

> Hi Nadav
> 
> On Wed, 2013-04-17 at 15:43 -0700, Nadav Rotem wrote:
>> 
>> On Apr 17, 2013, at 1:36 PM, Michael Liao <michael.liao at intel.com>
>> wrote:
>> 
>>> Hi Nadav
>>> 
>>> I cannot follow your statement on complexity and complexity. Could 
>>> you give me more concrete reasons? Speaking so broadly 
>>> definitely won't help us revise the proposal.
>> 
>> 
>> You are increasing the compile time by adding a new pass that needs 
>> to run over all of the code that the X86 backend needs to run.  You 
>> are adding unnecessary complexity by adding target specific semantics 
>> to LLVM's IR and by adding a new pass. Your proposal is completely 
>> unscalable. We can't add new passes and new metadata to support every 
>> little feature. If we go down this road we'll have dozens of target 
>> specific passes, tricks and hooks all over the compiler. The 
>> maintainability of the compiler, as well as the compile time for 
>> people who don't care about this feature is more important. And what is 
>> the gain of all this ? Just the ability to run LLVM's optimizations 
>> for atomic instruction ? Do you really have workloads that benefit 
>> from these optimizations ? At the end of the day there is a tradeoff 
>> between atomic optimizations for HLE atomic operations (which you 
>> even failed to mention what they are) and the added complexity to the 
>> compiler.
> 
> This new pass is introduced as a way to move away from SelectionDAG 
> compared to the previous proposal. This new pass could be revised to 
> apply only to targets supporting HLE, and it could be skipped entirely 
> by keeping track of whether a function uses atomic instructions.
> 
> BTW, the pass itself, even though it's written for x86, is intended to 
> be a general pass. The pass is meant to simplify backend support of 
> atomic instructions by reusing the CAS/LLSC loop as much as possible.
> That's why LLSC is added (LLSC is supported on almost all RISC 
> processors but not on x86).
> 
> For LLVM IR hinting only, putting this pass aside (since it's targeted 
> at the DAG issue in the last proposal), could you elaborate on why it 
> adds complexity to the compiler?
> 
> - Michael
> 
> 
>> 
>>> I don't know why you insist on implementing HLE as separate 
>>> intrinsics without considering all the comments we raised 
>>> previously. HLE needs to leverage the existing optimizations on 
>>> atomic instructions. Adding them as separate intrinsics would add 
>>> significant overhead by duplicating these optimizations and breaking 
>>> the layered design, not to mention how many intrinsics we would need.
>>> 
>>> Yours
>>> - Michael
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits