[PATCH] D89855: [PowerPC] Extend folding RLWINM + RLWINM to post-RA.

Mon Dec 21 10:35:51 PST 2020

nemanjai added a comment.

I have added a number of comments to code that isn't part of this review which may seem odd. However, you are adding a fair amount of code to a function that is already very long/complex and rather than doing so, I'd prefer that you refactor the original function so the new code integrates more seamlessly.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:3283
+  // RLWINM(RLWINM8) -> RLWINM.
+  switch (SrcMI->getOpcode()) {
+  case PPC::RLWINM:
----------------
This is a bit of a bizarre set up that makes it hard to read. How about skipping these switch statements and just doing what you're after:
1. Find the source MI
2. Check if they're both 32-bit or both 64-bit instructions

For 2. you can do something like:
```
static bool sameWidthRLWINM(MachineInstr *SrcMI, MachineInstr *UseMI,
                            bool &Is64Bit) {
  unsigned SrcOpc = SrcMI->getOpcode();
  unsigned UseOpc = UseMI->getOpcode();
  if ((SrcOpc == PPC::RLWINM8 || SrcOpc == PPC::RLWINM8_rec) &&
      (UseOpc == PPC::RLWINM8 || UseOpc == PPC::RLWINM8_rec)) {
    Is64Bit = true;
    return true;
  }
  if ((SrcOpc == PPC::RLWINM || SrcOpc == PPC::RLWINM_rec) &&
      (UseOpc == PPC::RLWINM || UseOpc == PPC::RLWINM_rec))
    return true;
  Is64Bit = false;
  return false;
}
```

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:3347
   // Mark special case.
   bool SrcMaskFull = (MBSrc - MESrc == 1) || (MBSrc == 0 && MESrc == 31);

----------------
Please describe this "special case" since it appears to be two special cases:
The first part means all bits in a 64-bit register and the second part means the low 32 bits in a 64-bit register.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:3356
   // In MI, we only need low 32 bits of SrcMI, just consider about low 32
   // bit of SrcMI mask. Note that in APInt, lowerest bit is at index 0,
   // while in PowerPC ISA, lowerest bit is at index 63.
----------------
I am not sure what the intended meaning of `lowerest` is here, but please use something like `least significant bit`.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:3366
   // If final mask is 0, MI result should be 0 too.
   if (FinalMask.isNullValue()) {
     Simplified = true;
----------------
Why can't this whole block be replaced by a call to `replaceInstrWithLI()`?

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:3396

   } else if ((isRunOfOnes((unsigned)(FinalMask.getZExtValue()), NewMB, NewME) &&
               NewMB <= NewME) ||
----------------
I find it unlikely that `clang-format` produced this formatting.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:3408
     MI.getOperand(2).setImm(NewSH);
     // If SrcMI mask is full, no need to update MBMI and MEMI.
     if (!SrcMaskFull) {
----------------
No need, but is it incorrect to update them? Namely, won't the `NewMB/NewME` be the same as `MBMI/MEMI` respectively? If so, I think it is preferable to avoid the condition and leave it to the reader to figure out.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:3414
     MI.getOperand(1).setReg(SrcMI->getOperand(1).getReg());
     if (SrcMI->getOperand(1).isKill()) {
       MI.getOperand(1).setIsKill(true);
----------------
In general, I find that manually messing with kill flags when changing the instructions to be an error-prone thing that should be avoided if possible? Is it possible to re-run the pass that computes kill flags after the peephole?

For example I think that here, if `MI.getOperand(1).isKill()` before the change, we will not update the kill flag on the previous use of that register so it will be considered live after `MI` whereas before it wasn't. This is of course not semantically incorrect, but may hinder some optimizations.

================
Comment at: llvm/test/CodeGen/PowerPC/vsx_builtins.ll:138
   %0 = tail call i32 @llvm.ppc.vsx.xvtdivdp(<2 x double> %a, <2 x double> %b)
   %1 = lshr i32 %0, 4
   %.lobit = and i32 %1, 1
----------------
It is not really related to this patch, but what is this test case supposed to do? Why would we produce a 4-bit field in a CR, move it to a GPR and then right shift by 4 (clearly shifting out all the bits)?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89855/new/

https://reviews.llvm.org/D89855