[PATCH] [AArch64] Enhance rematerialization by adding a new API isAsCheapAsAMove in TargetInstroInfo

Wed Jul 2 22:27:48 PDT 2014

Hi Quentin,

> >  Do you mean because this flag is being used in an arbitrary manner everywhere, we want to work out another more centralized solution to describe rematerializable attribute?
>
> More or less. What I am saying is that we should either:
> - Really deprecate this flag, and thus update the implement of isTriviallyRematerializable, e.g., isAsCheapAsMove or
> - Update the comments to express what isRematerialize really means.

OK. I will update the comments of isRematerializable, and it looks much simpler than update the implement of isTriviallyRematerializable.
I tried to move the check of isAsCheapAsMove to be insde isReallyTriviallyReMaterializable(), but this semantic change obviously could introduce a lot of "make check-all" regressions. I think, even if we want to change this, it would be much better we do it separately.

> Indeed, in this patch, it is not clear to me why we choose to mark some instruction rematerializable and some others not (e.g., if add immediate is rematerializable, why mul immediate is not... the notion of cost seems relevant). But you are right, this may be something general.

I see your point now. I admit the connection between isAsCheapAsMove interface and rematerizable is not very clear from this patch. The root cause is RegisterCoalescer pass has the following checks,

  if (!TII->isAsCheapAsAMove(DefMI))
    return false;
  if (!TII->isTriviallyReMaterializable(DefMI, AA))
    return false;

Yes, I simply mark the intrstuctions that could be as cheap as move with isRematerializable flag, and just because they have low cost for cortex-a57 and cortex-a53.

Actually we can add more instructions, but I only want to conservatively add those instructions in this patch at the moment. And then later, we can add more with performance data. In particular, I will try ADRP rematerializaiton here later on.
 
>
>
> > Or we needn't this flag at all, and the algorithm of rematerialization should just check cost like the info provided by isAsCheapAsMove?
>
> That might be a livable option. Honestly, I haven't thought a lot about that, it is just that this patch highlights two strange things: the use of isRematerializable flag, and its semantic.

isTriviallyReMaterializable is being used in four back-end places at the moment.

* Live Range Analysis
* Machine LCIM
* Register Coalesce
* Calculate SpillWeights

In theory, I think we can make this change, but we need to carefully check those algorithms. isRematerializable depends on cost, so it is a micro-architecture dependent attribute. At this point, it seems not good to add them in xxxInstroxxx.td file, because those .td files describing instructions look more like a ISA level stuff.

Thanks,
-Jiangning

>
> What would you suggest?
>
> Thanks,
> -Quentin
>
> http://reviews.llvm.org/D4361
>
>

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:544
@@ -543,1 +543,3 @@
 
+// FIXME: this implementation should be u-arch dependent, so a u-arch target
+// hook should be introduced here in future.
----------------
Eric Christopher wrote:
> Since this is ascii, just use the word "micro-architecture" :)
OK. I will fix that.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:553
@@ +552,3 @@
+
+  if (TM->getTargetCPU() == "cortex-a57" ||
+      TM->getTargetCPU() == "cortex-a53") {
----------------
Eric Christopher wrote:
> This should query the subtarget, not the target machine.
OK. I will fix that.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:564
@@ +563,3 @@
+    case AArch64::SUBXri:
+    case AArch64::ADDSWri:
+    case AArch64::ADDSXri:
----------------
Quentin Colombet wrote:
> Trying to rematerialize the S variant seems wrong to me. Those define the flags, so it is not safe to move them around.
> 
> Currently, it may not cause any problem as isTriviallyRematerializable checks that the instruction only defines one virtual register IIRC.
> Anyway the logic looks wrong.
OK. I will remove them.

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.cpp:568
@@ +567,3 @@
+    case AArch64::SUBSXri:
+      if (MI->getOperand(2).getImm() != 0)
+        return false;
----------------
Quentin Colombet wrote:
> I believe that the shifter immediate is at index 3, not 2:
> def0 = ADDXri arg1, imm2, shifter3
> 
> => Simply "return MI->getOperand(3).getImm() == 0;"?
> 
> BTW, is that test even correct from a performance perspective?
> Indeed, in your example, you are rematerializing stack adjustments, but the value for stack adjustments is  known way after register coalescing (which does the re-materialization in your case). Therefore, something that seems cheap, may not be cheap after all.
Ah! This is my mistake and I will fix that.

Yes. it's possible not cheap at all if the stack is huge.

I rerun the benchmark, and it seems I still can observe the running time of spec2000/perlbmk is reduced a lot.

http://reviews.llvm.org/D4361