[PATCH][RFC] HLE support proposal

Wed Apr 17 10:10:13 PDT 2013

Hi Nadav,

On Wed, 2013-04-17 at 09:13 -0700, Nadav Rotem wrote:
> 
> > Here, I am only talking hinting atomic instruction IR with HLE.
> > Hinting
> > existing atomic instruction is straight-forward as it captures the
> > goal
> > maintaining a portable IR between target with HLE and the one
> > without
> > HLE. In addition, all atomic-aware optimizations will work the same.
> > 
> 
> 
> You are still proposing to change the IR by adding new metadata.

Yes, the IR change is the same as the one in the previous proposal. So far
I have only heard concerns about adding a new feature to SelectionDAG, not
about the IR change.

> 
> 
> > For this proposal, adding a lower pass isn't straightforward
> > compared to
> > add trivial feature in SelectionDAG directly but it still obvious
> > enough
> > to understand. 
> 
> 
> Neither one of these changes is trivial. Both are very intrusive.
> People who don't care about HLE can't opt-out and have to pay the cost
> of the added complexity and compile time (see below). 

Could you list the optimizations that handle atomic instructions and also
check the metadata attached to them? AFAIK, none of the existing
optimizations around atomic instructions check metadata at all. Since the
HLE hint doesn't change the semantics of an atomic instruction, passes that
don't care about HLE don't need to check for it. They will work the same.
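
To illustrate the "hint only" point at the source level, here is a minimal
sketch using the GCC-style atomic builtins with the x86 HLE hint bits. It is
not the IR syntax from the patch; the ElidedSpinLock type and the lock/unlock
helpers are hypothetical names, and defining the hint macros to 0 where the
compiler does not provide them is an assumption of this sketch. The atomic
operations are the same with or without the hint bits.

```cpp
#include <cassert>

// GCC on x86 predefines these hint bits. Where they are missing (an
// assumption of this sketch), define them to 0 so the hint disappears and
// the plain atomic operation remains.
#ifndef __ATOMIC_HLE_ACQUIRE
#define __ATOMIC_HLE_ACQUIRE 0
#endif
#ifndef __ATOMIC_HLE_RELEASE
#define __ATOMIC_HLE_RELEASE 0
#endif

// Hypothetical test-and-test-and-set spinlock.
struct ElidedSpinLock {
  int word;
};

inline void lock(ElidedSpinLock &l) {
  // Acquire exchange; the HLE bit is OR'ed into the memory order as a hint.
  while (__atomic_exchange_n(&l.word, 1,
                             __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)) {
    // Spin on a plain load until the lock looks free, then retry.
    while (__atomic_load_n(&l.word, __ATOMIC_RELAXED)) {
    }
  }
}

inline void unlock(ElidedSpinLock &l) {
  // Release store; again the HLE bit is only a hint on top of the ordering.
  __atomic_store_n(&l.word, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}
```

Code that ignores the hint bits sees an ordinary acquire exchange and release
store, which is the property the metadata-based hint relies on.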

> 
> > The major benefit after adding a lowering pass
> > translating atomic instruction into target native atomic intrinsic
> > is:
> > it will reduce burden of implementing all atomic instructions on the
> > backend side. Most target hardwares only provide the minimal atomic
> > support instructions (i.e. LL/SC in all RISC targets), with this
> > pass,
> > those targets only need to care the codegen of LL/SC.
> > 
> 
> 
> Why is it the job of the compiler to implement HLE-intrinsics on
> non-HLE targets ? It needs to be in a library. 

It's not an HLE-specific part but a refactoring of how we share atomic
instruction code generation among targets. I noticed that most targets in
our backends cannot support the full set of atomic instructions, and each
backend duplicates too much effort supporting them. I added it here because
the major issue in the previous discussion was the new changes added to the
DAG. This refactoring bypasses the DAG to some extent.
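
As a rough sketch of the idea (not the pass itself), every read-modify-write
operation can be expanded into a retry loop over one primitive, so a target
only has to provide that primitive. Here std::atomic's compare_exchange_weak
stands in for LL/SC, since it may also fail spuriously; expandAtomicRMW and
the atomicFetch* helpers are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Generic expansion of an atomic read-modify-write into a retry loop.
// The only atomic primitive needed is a weak compare-exchange, which plays
// the role of a load-linked/store-conditional pair in this sketch.
template <typename T, typename Op>
T expandAtomicRMW(std::atomic<T> &mem, T operand, Op op) {
  T old = mem.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads the current value into 'old',
  // so the new value is recomputed from fresh data on every retry.
  while (!mem.compare_exchange_weak(old, op(old, operand),
                                    std::memory_order_seq_cst,
                                    std::memory_order_relaxed)) {
  }
  return old;  // RMW operations return the value seen before the update.
}

inline int atomicFetchAdd(std::atomic<int> &mem, int v) {
  return expandAtomicRMW(mem, v, [](int a, int b) { return a + b; });
}

inline int atomicFetchMax(std::atomic<int> &mem, int v) {
  return expandAtomicRMW(mem, v, [](int a, int b) { return a > b ? a : b; });
}

inline int atomicFetchNand(std::atomic<int> &mem, int v) {
  return expandAtomicRMW(mem, v, [](int a, int b) { return ~(a & b); });
}
```

The three helpers differ only in the combining function, which is the part a
shared lowering could generate once for all targets.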

> 
> > > 
> > > 
> > > Other compilers have other considerations and I don't think that
> > > we
> > > need to compromise compile time or flexibility for this,
> > > especially if
> > > we have other alternatives. 
> > 
> > Could you elaborate the compile time overhead? From my measure, this
> > pass is fast as this pass has O(N) complexity and only processes
> > atomic
> > instructions. If we could keep tracking whether a function has
> > atomic
> > instruction or not, we could be even faster by skipping them
> > totally.
> > 
> > 
> 
> 
> Yes, its an O(N) pass that scans all of the instructions and does
> something with some of the instructions. The problem is that most
> people who don't care about will still have to pay the cost of the HLE
> implementation.

Again, that pass is added to refactor atomic instruction code generation
in our current backends. By itself, it's not an HLE-specific part. The
reason I put it in this proposal is that people have concerns about adding
new features to SelectionDAG.

In addition, this pass is easily skipped by keeping track of whether a
function has atomic instructions. If it has none, we skip the pass. The
overhead is quite manageable.
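
A toy model of that skip, assuming a per-function flag that is set whenever
an atomic instruction is appended. The Function, Instruction and
runLowerAtomicPass names are hypothetical and do not correspond to LLVM
classes; the point is only that functions without atomics cost a single
check, while the others get one linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Opcode { Add, Load, Store, AtomicRMW, CmpXchg };

inline bool isAtomic(Opcode op) {
  return op == Opcode::AtomicRMW || op == Opcode::CmpXchg;
}

struct Instruction {
  Opcode op;
  bool lowered;  // set once the pass has rewritten this instruction
};

struct Function {
  std::vector<Instruction> insts;
  bool hasAtomics = false;  // hypothetical flag, kept up to date on append

  void append(Opcode op) {
    insts.push_back(Instruction{op, false});
    if (isAtomic(op))
      hasAtomics = true;
  }
};

// Returns how many instructions the pass had to visit.
inline std::size_t runLowerAtomicPass(Function &f) {
  if (!f.hasAtomics)
    return 0;  // O(1) skip: nothing to lower in this function
  std::size_t visited = 0;
  for (Instruction &inst : f.insts) {  // single O(N) scan
    ++visited;
    if (isAtomic(inst.op))
      inst.lowered = true;  // stand-in for the real lowering
  }
  return visited;
}
```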

- Michael

>  
> 
> 
> > I agree with you on one hardware features like CRC, encryption, but
> > not
> > HLE feature, which is a common feature for hardware with TM support.
> 
> 
> I don't think that HLE is very common. 
> 
> > I
> > don't think GCC community doesn't consider the approach you
> > mentioned.
> > In fact, if you look into GCC mailing list, people rejected that
> > proposal originally and preferred the approach hinting atomic
> > builtins.
> > 
> 
> I would like to see the HLE features implemented as intrinsics from
> start to finish without adding any burden on the rest of the
> compiler.