[PATCH] Add HLE target feature

Wed Feb 20 22:00:27 PST 2013

> 
> While I agree that we shouldn't add needless complexity to the compiler, it is hard to decide whether a feature is worth the complexity until we have some rough idea both of how valuable the feature is *and* the complexity it brings. We can't easily figure out the latter without having a reasonable design (or if we do, we'll be wrong).

I agree. 

> Now, some argue that HLE isn't actually beneficial in practice. I've been around that discussion a few times, and consistently it boils down to "Does the code rely on fine-grained locking? If so, then HLE helps. If not, it doesn't." There are more subtle details, but that's the core of the issue that I've seen. Do you see other issues with the relevance of HLE?
> 

I don't question the usefulness of the HLE instructions. I am sure that the architects who introduced them studied how useful these instructions are and weighed their benefit against the cost of the implementation.

I am sure that the HLE instructions are very useful for people who implement synchronization mechanisms. They can use inline assembly. 
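
To make the inline assembly route concrete, here is a rough sketch of what a library author could write today without any compiler support. The ElidedLock type and the hle_acquire/hle_release names are made up for illustration. The sketch assumes GCC/Clang-style extended asm and emits the XACQUIRE (0xF2) and XRELEASE (0xF3) prefixes as raw bytes so that it does not depend on assembler support for the mnemonics. The non-x86 branch is only an assumed fallback so the sketch builds elsewhere.

```cpp
#include <cassert>

// Hypothetical lock type for illustration; 0 means free, 1 means held.
struct ElidedLock {
    volatile int word = 0;
};

// Acquire: exchange 1 into the lock word, prefixed with XACQUIRE (0xF2).
inline void hle_acquire(ElidedLock &lock) {
    for (;;) {
        int value = 1;
#if defined(__x86_64__) || defined(__i386__)
        // xchg with a memory operand is implicitly locked. CPUs without HLE
        // ignore the prefix and perform an ordinary atomic exchange.
        asm volatile(".byte 0xf2\n\t"
                     "xchgl %0, %1"
                     : "+r"(value), "+m"(lock.word)
                     :
                     : "memory");
#else
        // Assumed fallback for non-x86 targets: a plain atomic exchange.
        value = __atomic_exchange_n(&lock.word, 1, __ATOMIC_ACQUIRE);
#endif
        if (value == 0)
            return;               // the lock was free; we hold (or elide) it
        while (lock.word != 0) {  // spin on plain reads until it looks free
#if defined(__x86_64__) || defined(__i386__)
            asm volatile("pause" ::: "memory");
#endif
        }
    }
}

// Release: store 0 to the lock word, prefixed with XRELEASE (0xF3).
inline void hle_release(ElidedLock &lock) {
    assert(lock.word != 0 && "releasing a lock that is not held");
#if defined(__x86_64__) || defined(__i386__)
    asm volatile(".byte 0xf3\n\t"
                 "movl $0, %0"
                 : "=m"(lock.word)
                 :
                 : "memory");
#else
    __atomic_store_n(&lock.word, 0, __ATOMIC_RELEASE);
#endif
}
```

A caller would wrap the critical section in hle_acquire(lock) and hle_release(lock). The "memory" clobbers make each asm statement a barrier for the compiler, so the lock is correct but completely opaque to the optimizer.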

> If not, then I can say that I've been on both sides of this particular fence, writing both fine-grained and coarse-grained synchronization. I don't know what the ratio of importance is between the two, but I'm at least convinced that there exists fine-grained locking in the world, and it would seem generally useful for LLVM to support the functionality hardware vendors are building to make that code execute more efficiently.
> 
> But even if HLE won't actually help applications that you or I care about, there is another aspect to this. If generic libraries are written to leverage HLE when it *does* help performance, but doing so makes them more opaque to the optimizer, then using such libraries will actively harm performance of code where HLE is a wash. This all comes back to the fact that a significant motivation in modeling the most fundamental synchronization patterns directly in the IR is ensuring that these synchronizations don't overly penalize standard scalar optimizations.

This is an interesting topic. In LLVM, if we see a load or store that is marked "volatile" or "atomic", we give up and don't try to optimize it. What other scalar optimizations do you have in mind?
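
As a small illustration of the conservative behavior I mean, consider the three functions below. The function names are made up for this example, and what the optimizer does with each one is my expectation rather than a guarantee. In the plain version the second load is redundant and can normally be folded into the first. In the volatile and atomic versions I would expect both loads to survive.

```cpp
#include <atomic>
#include <cassert>

// Plain loads: the optimizer is free to reuse the first loaded value.
int twice_plain(const int *p) {
    int a = *p;
    int b = *p;
    return a + b;
}

// Volatile loads: each access is observable, so both must be emitted.
int twice_volatile(const volatile int *p) {
    int a = *p;
    int b = *p;
    return a + b;
}

// Sequentially consistent atomic loads: treated conservatively as well.
int twice_atomic(const std::atomic<int> &x) {
    int a = x.load();
    int b = x.load();
    return a + b;
}
```

All three return the same value in a single-threaded program; the difference only shows up in the generated code.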

Thanks,
Nadav