[PATCH] D79767: [ARM] Macro fuse t2LoopDec and t2LoopEnd

Tue Jul 21 03:33:12 PDT 2020

dmgreen marked an inline comment as done.
dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMSubtarget.h:708
   /// Return true if the CPU supports any kind of instruction fusion.
-  bool hasFusion() const { return hasFuseAES() || hasFuseLiterals(); }
+  bool hasFusion() const { return hasFuseAES() || hasFuseLiterals() || hasLOB(); }

----------------
SjoerdMeijer wrote:
> dmgreen wrote:
> > SjoerdMeijer wrote:
> > > Do we need to add LOB here, i.e. are we using that?
> > > If so, is "fusing" the right thing? Is the use of LOB here the same as e.g. "fusing AES" instructions?
> > I think the idea is that we only call createARMMacroFusionDAGMutation if we have instructions to fuse, and we would not have ARM::t2LoopEnd if we did not have low overhead branches.
> > 
> > If you mean "should we add a hasFuseLOB subtarget feature for this" - I'm not sure. It will depend on how we end up changing the instructions. If we do end up using this method to force t2LoopDec and t2LoopEnd close together in order to reduce the live range of a CPSR, then we should probably do that for all subtargets.
> > 
> > But a t2LoopDec and a t2LoopEnd are really modelling a single instruction, and maybe it makes sense to just make them into a single thing? That would obviously make this patch unnecessary.
> I was really starting to think the same:
> 
> > But a t2LoopDec and a t2LoopEnd are really modelling a single instruction, and maybe it makes sense to just make them into a single thing? 
> 
> What we do here in this patch starts to feel like a convoluted approach, and indeed, since we are modelling a single instruction, shouldn't we go for that then? Or would that have disadvantages?
My understanding is that it is a terminator that produces a value, and that does not work very well. If you need to spill the value, you would have not place to spill it.

I may be wrong though, I'm not sure entirely why it was done in the way it was and am really just guessing. We had an older downstream implementation that worked that way I think, but always had bugs. Whether they were fixable I'm not sure. It would take some investigation.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79767/new/

https://reviews.llvm.org/D79767