[PATCH][RFC] HLE support proposal

Wed Apr 17 13:36:04 PDT 2013

Hi Nadav

I cannot follow your statement on complexity and compile time. Could you
give me more concrete reasons? Speaking in such broad terms won't help
us revise the proposal.

I don't understand why you insist on implementing HLE as separate
intrinsics without considering the comments we raised previously. HLE
needs to leverage the existing optimizations on atomic instructions.
Adding separate intrinsics would add significant overhead by duplicating
these optimizations and breaking the layered design, not to mention how
many intrinsics we would need.
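
To make the hinting idea concrete, here is a rough source-level sketch.
It is not the proposed LLVM patch: it uses GCC's __atomic builtins, where
the HLE hint is OR-ed into the memory order of an ordinary atomic
operation. ElidedSpinLock, HLE_ACQUIRE_HINT, HLE_RELEASE_HINT and
lock_cycle_works are hypothetical names used only for illustration, and
the fallback to 0 assumes a compiler that does not define the
__ATOMIC_HLE_* macros.

```cpp
#include <cassert>

// Assumption: GCC on x86 defines __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.
// On any other compiler or target the hint falls back to 0, so the very same
// atomic operations are used with no elision hint attached.
#ifdef __ATOMIC_HLE_ACQUIRE
#define HLE_ACQUIRE_HINT __ATOMIC_HLE_ACQUIRE
#define HLE_RELEASE_HINT __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQUIRE_HINT 0
#define HLE_RELEASE_HINT 0
#endif

// Hypothetical test-and-set lock. The hint only decorates the existing
// atomic exchange and store; it does not change their semantics.
struct ElidedSpinLock {
    int state = 0;

    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        return __atomic_exchange_n(&state, 1,
                                   __ATOMIC_ACQUIRE | HLE_ACQUIRE_HINT) == 0;
    }

    // Releases the lock with a plain atomic store plus the release hint.
    void unlock() {
        __atomic_store_n(&state, 0, __ATOMIC_RELEASE | HLE_RELEASE_HINT);
    }
};

// Usage sketch: a second try_lock fails while the lock is held, and
// succeeds again after unlock.
inline bool lock_cycle_works() {
    ElidedSpinLock lock;
    bool first = lock.try_lock();
    bool second = lock.try_lock();
    lock.unlock();
    return first && !second && lock.try_lock();
}
```

The point of the sketch is that the code is the same atomic exchange and
store with or without HLE, so anything that already understands those
operations keeps working and the hint can simply be dropped on targets
that lack HLE.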

Yours
- Michael

On Wed, 2013-04-17 at 13:27 -0700, Nadav Rotem wrote:
> Michael,  
> 
> 
> I understand that you are very motivated to get the HLE change in, but
> I still think that the right approach is to implement target specific
> intrinsics start-to-end. The current approach is unacceptable because
> it adds complexity and hurts the compile time and it does not add any
> value for most users. Please implement this as intrinsics without any
> changes to the IR. 
> 
> 
> Thanks,
> Nadav
> 
> 
> 
> On Apr 17, 2013, at 10:10 AM, Michael Liao <michael.liao at intel.com>
> wrote:
> 
> > Hi Nadav
> > 
> > On Wed, 2013-04-17 at 09:13 -0700, Nadav Rotem wrote:
> > > 
> > > > Here, I am only talking about hinting atomic instructions in the
> > > > IR with HLE. Hinting existing atomic instructions is
> > > > straightforward, as it captures the goal of maintaining a
> > > > portable IR between targets with HLE and those without. In
> > > > addition, all atomic-aware optimizations will work the same.
> > > > 
> > > 
> > > 
> > > You are still proposing to change the IR by adding new metadata.
> > 
> > Yes, the IR change is the same as the one in the previous proposal,
> > as the only concern I heard was about adding a new feature in
> > SelectionDAG, not in the IR.
> > 
> > > 
> > > 
> > > > For this proposal, adding a lowering pass isn't as
> > > > straightforward as adding a trivial feature in SelectionDAG
> > > > directly, but it is still simple enough to understand.
> > > 
> > > 
> > > Neither one of these changes is trivial. Both are very intrusive.
> > > People who don't care about HLE can't opt-out and have to pay the
> > > cost of the added complexity and compile time (see below).
> > 
> > Could you list the optimizations that handle atomic instructions and
> > also check the metadata attached to them? AFAIK, none of the existing
> > optimizations around atomic instructions check metadata at all. As
> > the HLE hint doesn't change the semantics of an atomic instruction,
> > if you don't care about HLE, you don't need to check it. They will
> > work the same.
> > 
> > > 
> > > > The major benefit of adding a lowering pass that translates
> > > > atomic instructions into target-native atomic intrinsics is that
> > > > it reduces the burden of implementing all atomic instructions on
> > > > the backend side. Most target hardware only provides minimal
> > > > atomic support instructions (e.g. LL/SC on most RISC targets);
> > > > with this pass, those targets only need to care about the
> > > > codegen of LL/SC.
> > > > 
> > > 
> > > 
> > > Why is it the job of the compiler to implement HLE-intrinsics on
> > > non-HLE targets ? It needs to be in a library. 
> > 
> > It's not a part specific to HLE support but a refactoring of how we
> > share atomic instruction code generation among targets. I noticed
> > that most of our backends cannot support the full set of atomic
> > instructions, and each backend duplicates too much effort supporting
> > them. I added it here because the major issue in the previous
> > discussion was the new changes added to the DAG. This refactoring
> > bypasses the DAG to some extent.
> > 
> > > 
> > > > > 
> > > > > 
> > > > > Other compilers have other considerations and I don't think
> > > > > that we need to compromise compile time or flexibility for
> > > > > this, especially if we have other alternatives.
> > > > 
> > > > Could you elaborate on the compile time overhead? From my
> > > > measurements, this pass is fast, as it has O(N) complexity and
> > > > only processes atomic instructions. If we kept track of whether
> > > > a function has atomic instructions or not, we could be even
> > > > faster by skipping functions without them entirely.
> > > > 
> > > > 
> > > 
> > > 
> > > Yes, it's an O(N) pass that scans all of the instructions and does
> > > something with some of the instructions. The problem is that most
> > > people who don't care about HLE will still have to pay the cost of
> > > the HLE implementation.
> > 
> > Again, that pass is added to refactor the atomic instruction code
> > generation in our current backends. By itself, it's not an
> > HLE-specific part. The reason I put it in this proposal is that
> > people have concerns about adding new features in SelectionDAG.
> > 
> > In addition, this pass is easily skipped by keeping track of whether
> > a function has atomic instructions. If it has none, we skip this
> > pass. The overhead is quite manageable.
> > 
> > - Michael
> > 
> > > 
> > > 
> > > 
> > > > I agree with you on hardware-specific features like CRC or
> > > > encryption, but not on HLE, which is a common feature of
> > > > hardware with TM support.
> > > 
> > > 
> > > I don't think that HLE is very common. 
> > > 
> > > > I don't think the GCC community failed to consider the approach
> > > > you mentioned. In fact, if you look at the GCC mailing list,
> > > > people originally rejected that proposal and preferred the
> > > > approach of hinting atomic builtins.
> > > > 
> > > 
> > > I would like to see the HLE features implemented as intrinsics
> > > from start to finish without adding any burden on the rest of the
> > > compiler.
>