[PATCH] D124308: [MachineScheduler] exclude INLINEASM from schedule when it would increase register pressure

Tue Apr 26 00:05:10 PDT 2022

pengfei added inline comments.

================
Comment at: llvm/lib/Target/X86/X86RegisterInfo.cpp:266-271
+  case X86::GR8RegClassID:
+  case X86::GR16RegClassID:
   case X86::GR32RegClassID:
     return 4 - FPDiff;
   case X86::GR64RegClassID:
     return 12 - FPDiff;
----------------
nickdesaulniers wrote:
> nickdesaulniers wrote:
> > pengfei wrote:
> > > Should we change them to `(Is64Bit ? 12 : 4) - FPDiff` ?
> > Yeah, I think so. Though I'm curious about the use of 4 in the first place. i386 has:
> > * eax
> > * ebx
> > * ecx
> > * edx
> > * esi
> > * edi
> > * ebp
> > * esp
> > * eip
> > 
> > As GPRs. eip and esp aren't allocatable, and ebp is the condition `FPDiff` here.  Why 4? I would have guessed 6 for the first 6 (for `-m32`)?
> > 
> > Perhaps 4 corresponds to `GR32_ABCD`, for some reason? So should the cases used be `GR32_ABCDRegClassID` rather than `GR8RegClassID`? Or should it be `GR8RegClassID` then use `6 - FPDiff`?
> > 
> > For 64b, I'd have guessed those six plus r8d to r15d, for a total of 14, not 12.  Am I missing something (perhaps about esi+edi)?
> Also, if we extend this from 4 to 6 on 32b, then the test case will no longer be fixed.The inline asm from the test case uses 1 GR8 and 4 GR32.
I'm guessing it's a heuristic number. It might consider an average of the usage of other special registers e.g., eax for returning, ecx for counter, esi and edi for data moving etc.
If the assumption is ture, then it makes sense to 64b using 12 (4 + 8), since the r8~r15 are pure general ones.

================
Comment at: llvm/lib/Target/X86/X86RegisterInfo.cpp:275
   case X86::VR64RegClassID:
     return 4;
   }
----------------
nickdesaulniers wrote:
> nickdesaulniers wrote:
> > pengfei wrote:
> > > I wonder why return 4 for it.
> > Good question. Some comments could be added to this function to help retain more context as to WHY these magic constants have these values. I'm happy to do so in these commits, if we can document _why_.
> > 
> > I'm running `git log -S X86TargetLowering::getRegPressureLimit` to see if I can find additional context.
> This code was added by commit:
> 
> commit 37b740c4bfbb ("Add an ILP scheduler.)
> Author:     Evan Cheng <evan.cheng at apple.com>
> AuthorDate: Sat Jul 24 00:39:05 2010 +0000
Oops, I made a mistake. I thought the `V` means virtual...
The `V` actually means vector. And the `VR64` are `MMX` registers which have the same size 8 on both 32b and 64b.
Why we use `4` for `VR64` and `VR128` on 32b while `10` on 64b. A wild guess is it's related to calling conversion.
Anyway, I think we should keep such value before we can make sure the origin.
OTOH, I think maybe we can more register classes (maybe a seperate patch), e.g.
```
  case X86::VR128RegClassID:
  case X86::VR256RegClassID:
  case X86::VR512RegClassID:
    return Is64Bit ? 10 : 4;
  case X86::VR128XRegClassID:
  case X86::VR256XRegClassID:
  case X86::VR512XRegClassID:
    return Is64Bit ? 26 : 4;
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124308/new/

https://reviews.llvm.org/D124308