[llvm-dev] [RFC] MC support for variant scheduling classes.

Andrea Di Biagio via llvm-dev llvm-dev at lists.llvm.org
Thu May 10 08:58:28 PDT 2018


Hi all,

The goal of this RFC is to make information related to variant scheduling
classes accessible at MC level. This would help tools like llvm-mca
understand/resolve variant scheduling classes.

To achieve this goal, I plan to introduce a new class of scheduling predicates
named MCSchedPredicate. An MCSchedPredicate allows the definition of boolean
expressions with well-defined semantics, which can be used to generate code for
both MachineInstr and MCInst.

The new predicates are designed to be completely optional. Scheduling models
can use a combination of SchedPredicate and MCSchedPredicate to describe
variant reads and writes. Old scheduling predicate definitions would still be
valid. New MCSchedPredicates would behave like normal scheduling predicates.

A bit of background
-------------------

Variant scheduling classes model situations where the instruction profile
depends on the value of certain operands.

For example, modern x86 processors know that a register-register XOR is a
zero-idiom if both operands are the same register. That means the XOR is
optimized out at the register renaming stage, and no opcode is issued to the
pipelines. A variant scheduling class can be used to describe this case (see
the example below):

```
def ZeroIdiomWrite : SchedWriteRes<[]> { let Latency = 0; }

def ZeroIdiom : SchedPredicate<[{
    MI->getOpcode() == X86::XORrr &&
    MI->getOperand(0).getReg() == MI->getOperand(1).getReg()
}]>;

def WriteXOR : SchedWriteVariant<[
   SchedVar<ZeroIdiom,   [ZeroIdiomWrite]>,
   SchedVar<NoSchedPred, [WriteALU]>
]>;
```

Problems with the current design
--------------------------------

A SchedPredicate is essentially a custom block of C++ code used by the
SubtargetEmitter to generate a condition through a boolean expression.
A SchedPredicate sees all the definitions that are "captured" by the
`PredicateProlog` (another block of C++ code). It can also access public
members of TargetSchedule.

A common pattern used by the ARM scheduling models to define predicates is:
 - PredicateProlog "captures" the TargetInstrInfo object from the
   TargetSchedule object.
 - Each predicate uses the "captured" TargetInstrInfo object (TII) to call
   helpers exposed by the (target specific) InstrInfo interface.

Note that TargetSchedule and TargetInstrInfo are both CodeGen concepts.

SchedPredicate definitions only work on MachineInstr objects. Therefore, the
C++ code block is not portable (i.e. it doesn't work if the input instruction
is an MCInst). The `MI` used by the ZeroIdiom definition from the previous
example is a `MachineInstr *`.

The main problem with this design is that predicates don't have "portable"
semantics. A predicate is essentially an opaque block of code, and its
semantics are unknown to tablegen. Tablegen can only trust the user, and just
"copy-paste" code blocks from the various predicates into an auto-generated
`XXXGenSubtargetInfo::resolveSchedClass()` function.

This limits our ability to reason about predicates. In particular, it makes it
extremely hard (if not impossible) for tools that can only access the MC layer
to reuse predicate definitions to resolve variant scheduling classes.

If instead we expose the semantics of predicates to tablegen, we can then teach
tablegen how to generate an equivalent code block that works on MCInst.

In the next section I show how I plan to expose the semantics of scheduling
predicates to tablegen. I will then go through a couple of examples describing
how the new predicate syntax can be used, and finally I will describe the
patches required to implement this feature.

A new class of scheduling predicates
------------------------------------

MCSchedPredicate allows the definition of scheduling predicates that have
well-defined, portable semantics. They can be used in place of SchedPredicate
to define SchedReadVariant and SchedWriteVariant definitions in tablegen.

An MCSchedPredicate definition is built on top of an MCPredicate. MCPredicate
definitions can be composed together to form complex boolean expressions.

To better understand how these new predicates work, let's have a look at the
following example.

```
def M3BranchLinkFastPred  : SchedPredicate<[{MI->getOpcode() == AArch64::BLR &&
                                             MI->getOperand(0).isReg() &&
                                             MI->getOperand(0).getReg() !=
                                             AArch64::LR}]>;
```

This tablegen code snippet has been taken from AArch64/AArch64SchedExynosM3.td.

Predicate `M3BranchLinkFastPred` can be rewritten using an MCSchedPredicate
definition as follows:

```
def M3BranchLinkFastPred  : MCSchedPredicate<
  CheckAllOf<[
    CheckOpcode<[BLR]>,
    CheckRegOperand<0>,
    CheckNot<CheckRegOperandValue<0, LR>>]>
  >;
```

The MCSchedPredicate uses a `CheckAllOf`, which is a "composition of
predicates", and returns true only if every predicate in the composition
returns true. Note that `CheckAllOf`, `CheckOpcode`, `CheckRegOperand` and
`CheckNot` are all MCPredicate classes.

Each predicate class has well-known semantics. For example, `CheckOpcode` is
only used to check if the opcode of an instruction is part of a set of opcodes.
In this example, `CheckOpcode` is used to check if the instruction is a BLR.
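
To make the composition semantics concrete, here is a minimal C++ sketch that
evaluates the same checks on a toy instruction. Everything in it (`ToyInst`,
`ToyOperand`, the `check*` helpers and the numeric opcode/register values) is
hypothetical and exists only for illustration; it is not part of the proposed
patches and it does not use any LLVM type.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for an instruction and its operands (not LLVM types).
struct ToyOperand {
  bool IsReg;   // true if the operand is a register
  unsigned Reg; // register number, meaningful only if IsReg is true
};

struct ToyInst {
  unsigned Opcode;
  std::vector<ToyOperand> Operands;
};

using ToyPredicate = std::function<bool(const ToyInst &)>;

// Made-up opcode and register numbers, for illustration only.
constexpr unsigned ToyBLR = 1, ToyRET = 2;
constexpr unsigned ToyX0 = 0, ToyLR = 30;

// True if the opcode of the instruction is part of the given set.
ToyPredicate checkOpcode(std::vector<unsigned> Opcodes) {
  return [Opcodes](const ToyInst &I) {
    for (unsigned Opc : Opcodes)
      if (Opc == I.Opcode)
        return true;
    return false;
  };
}

// True if operand Idx exists and is a register.
ToyPredicate checkRegOperand(unsigned Idx) {
  return [Idx](const ToyInst &I) {
    return Idx < I.Operands.size() && I.Operands[Idx].IsReg;
  };
}

// True if operand Idx exists, is a register, and is register Reg.
ToyPredicate checkRegOperandValue(unsigned Idx, unsigned Reg) {
  return [Idx, Reg](const ToyInst &I) {
    return Idx < I.Operands.size() && I.Operands[Idx].IsReg &&
           I.Operands[Idx].Reg == Reg;
  };
}

// Logical negation of a predicate.
ToyPredicate checkNot(ToyPredicate P) {
  return [P](const ToyInst &I) { return !P(I); };
}

// True only if every predicate in the composition returns true.
ToyPredicate checkAllOf(std::vector<ToyPredicate> Preds) {
  return [Preds](const ToyInst &I) {
    for (const ToyPredicate &P : Preds)
      if (!P(I))
        return false;
    return true;
  };
}

// Toy counterpart of the M3BranchLinkFastPred definition shown earlier.
ToyPredicate makeBranchLinkFastPred() {
  return checkAllOf({checkOpcode({ToyBLR}), checkRegOperand(0),
                     checkNot(checkRegOperandValue(0, ToyLR))});
}
```

With these definitions, the composed predicate accepts a toy BLR whose first
operand is a register other than `ToyLR`, and rejects everything else.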

This new syntax allows the definition of predicates in a declarative way.
These new predicates don't require custom blocks of C++, and can be used to
define conditions without being bound to a particular representation (i.e.
MachineInstr vs MCInst).

It also means that tablegen backends are now able to parse and understand the
logic of each predicate check. More importantly, tablegen backends gain the
ability to "lower" scheduling predicates into code that works on MCInst too.
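
As a rough illustration of what "lowering" could look like, the C++ sketch
below walks a tiny predicate tree and prints an equivalent boolean expression.
The emitted text only uses accessors that exist on both MachineInstr and
MCInst (`getOpcode()`, `getOperand()`, `isReg()`, `getReg()`), so the same
expression can be wrapped in a helper for either type. The `ToyCheck` type and
all helper names are hypothetical; this is not how the tablegen backends in the
patches are implemented.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a declarative predicate tree.
struct ToyCheck {
  enum Kind { Opcode, RegOperand, RegOperandValue, Not, AllOf };
  Kind K = AllOf;
  std::string Name;               // opcode or register name, if any
  unsigned OpIdx = 0;             // operand index, if any
  std::vector<ToyCheck> Children; // sub-predicates for Not and AllOf
};

ToyCheck makeCheck(ToyCheck::Kind K, std::string Name, unsigned OpIdx,
                   std::vector<ToyCheck> Children) {
  ToyCheck C;
  C.K = K;
  C.Name = std::move(Name);
  C.OpIdx = OpIdx;
  C.Children = std::move(Children);
  return C;
}

ToyCheck opcode(std::string Name) {
  return makeCheck(ToyCheck::Opcode, std::move(Name), 0, {});
}
ToyCheck regOperand(unsigned Idx) {
  return makeCheck(ToyCheck::RegOperand, "", Idx, {});
}
ToyCheck regOperandValue(unsigned Idx, std::string Reg) {
  return makeCheck(ToyCheck::RegOperandValue, std::move(Reg), Idx, {});
}
ToyCheck notOf(ToyCheck C) {
  return makeCheck(ToyCheck::Not, "", 0, {std::move(C)});
}
ToyCheck allOf(std::vector<ToyCheck> Cs) {
  return makeCheck(ToyCheck::AllOf, "", 0, std::move(Cs));
}

// Lower a predicate tree into the text of a C++ boolean expression on `MI`.
std::string expand(const ToyCheck &C) {
  const std::string Op = "MI.getOperand(" + std::to_string(C.OpIdx) + ")";
  switch (C.K) {
  case ToyCheck::Opcode:
    return "MI.getOpcode() == " + C.Name;
  case ToyCheck::RegOperand:
    return Op + ".isReg()";
  case ToyCheck::RegOperandValue:
    return Op + ".getReg() == " + C.Name;
  case ToyCheck::Not:
    return "!(" + expand(C.Children.front()) + ")";
  case ToyCheck::AllOf: {
    std::string S;
    for (const ToyCheck &Child : C.Children) {
      if (!S.empty())
        S += " && ";
      S += "(" + expand(Child) + ")";
    }
    return S;
  }
  }
  return "false";
}

// Wrap the expression in a helper that takes the instruction by reference.
// InstTy is just text here, e.g. "MachineInstr" or "MCInst".
std::string emitHelper(const std::string &Name, const ToyCheck &C,
                       const std::string &InstTy) {
  return "bool " + Name + "(const " + InstTy + " &MI) { return " + expand(C) +
         "; }";
}
```

Calling `emitHelper` twice on the same tree, once with `"MachineInstr"` and
once with `"MCInst"`, yields two helpers with identical bodies that differ only
in the parameter type.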

A more complicated example involving TII method calls
-----------------------------------------------------

This code is taken from the AArch64 Cyclone scheduling model:

```
def WriteZPred : SchedPredicate<[{TII->isGPRZero(*MI)}]>;
def WriteImmZ  : SchedWriteVariant<[
                   SchedVar<WriteZPred, [WriteX]>,
                   SchedVar<NoSchedPred, [WriteImm]>]>;
```

Predicate WriteZPred is used to check if a GPR instruction is a zero-idiom.
The rationale is that zero-idioms have zero latency and don't consume
processor resources.

The predicate logic is defined by method `isGPRZero()`, which is accessible
through the TII object (i.e. a `const AArch64InstrInfo *`).

Below is the definition of `isGPRZero` in AArch64/AArch64InstrInfo.cpp:

```
// Return true if this instruction simply sets its single destination register
// to zero. This is equivalent to a register rename of the zero-register.
bool AArch64InstrInfo::isGPRZero(const MachineInstr &MI) {
  switch (MI.getOpcode()) {
  default:
    break;
  case AArch64::MOVZWi:
  case AArch64::MOVZXi: // movz Rd, #0 (LSL #0)
    if (MI.getOperand(1).isImm() && MI.getOperand(1).getImm() == 0) {
      assert(MI.getDesc().getNumOperands() == 3 &&
             MI.getOperand(2).getImm() == 0 && "invalid MOVZi operands");
      return true;
    }
    break;
  case AArch64::ANDWri: // and Rd, Rzr, #imm
    return MI.getOperand(1).getReg() == AArch64::WZR;
  case AArch64::ANDXri:
    return MI.getOperand(1).getReg() == AArch64::XZR;
  case TargetOpcode::COPY:
    return MI.getOperand(1).getReg() == AArch64::WZR;
  }
  return false;
}
```

That logic can be replaced by the following MCPredicate definitions:

```
def CheckMOVZ : CheckAllOf<[
  CheckOpcode<[MOVZWi, MOVZXi]>,
  CheckNumOperands<3>,
  CheckImmOperand<1>,
  CheckZeroOperand<1>,
  CheckImmOperand<2>,
  CheckZeroOperand<2>
]>;

def CheckANDW : CheckAllOf<[
  CheckOpcode<[ANDWri]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, WZR>
]>;

def CheckANDX : CheckAllOf<[
  CheckOpcode<[ANDXri]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, XZR>
]>;

def CheckCOPY : CheckAllOf<[
  CheckPseudo<[COPY]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, WZR>
]>;

// Return true if this instruction simply sets its single destination register
// to zero. This is equivalent to a register rename of the zero-register.

def IsGPRZero : TIIPredicate<"AArch64", "isGPRZero",
  AnyOfMCPredicates<[CheckMOVZ, CheckANDW, CheckANDX, CheckCOPY]>>;
```

TIIPredicate definitions are used to model calls to the target-specific
InstrInfo.

A TIIPredicate definition is treated specially by the InstrInfoEmitter
tablegen backend, which will use it to automatically generate a definition
in the target specific `GenInstrInfo` class.

Basically, we can tell tablegen to generate that definition for us.

Now that the description of IsGPRZero is available in the form of an
MCPredicate, we can modify the original SchedWriteVariant WriteImmZ as
follows:

```
def WriteZPred : MCSchedPredicate<IsGPRZero>;

def WriteImmZ : SchedWriteVariant<[
                  SchedVar<WriteZPred, [WriteX]>,
                  SchedVar<NoSchedPred, [WriteImm]>]>;
```

How to resolve scheduling classes from MC
-----------------------------------------

MCSubtargetInfo will gain a new method:

```
  /// Resolve a variant scheduling class for the given MCInst and CPU.
  virtual unsigned
  resolveVariantSchedClass(unsigned SchedClass, const MCInst *MI,
                           unsigned CPUID) const {
    return 0;
  }
```

The SubtargetEmitter is responsible for processing scheduling classes and
generating an override for that method.

This is what the SubtargetEmitter generates for the Cyclone and ExynosM3 models
if we implement the changes described in the previous sections:

```
unsigned resolveVariantSchedClass(unsigned SchedClass,
    const MCInst *MI, unsigned CPUID) const override {
  switch (SchedClass) {
  case 117: // BLR
    if (CPUID == 5) { // ExynosM3Model
      if ((
          ( MI->getOpcode() == AArch64::BLR )
          && MI->getOperand(0).isReg()
          && MI->getOperand(0).getReg() != AArch64::LR
        ))
        return 934; // M3WriteAB
      if (true)
        return 935; // M3WriteAC
    }
    break;
  case 386: // MOVZWi_MOVZXi
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  case 387: // ANDWri_ANDXri
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  case 695: // ANDWri
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  }
  // Don't know how to resolve this scheduling class.
  return 0;
}
```

Note that this override will become a member of a new tablegen'd class named
AArch64GenMCSubtargetInfo. That class would directly extend MCSubtargetInfo.
Class AArch64GenMCSubtargetInfo is what will get instantiated by method
`Target::createMCSubtargetInfo()`.
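
For illustration, a tool that only has access to the MC layer might call the
new method in a loop until the scheduling class is no longer variant, treating
0 as "cannot resolve". The C++ sketch below models that idea with toy types.
`ToyMCInst`, `ToySubtargetInfo`, `ToyGenSubtargetInfo`, `isVariant()` and
`resolveSchedClass()` are all hypothetical names, and the loop is an assumption
about how a client could use the method, not a description of the llvm-mca
patch. The class and CPU numbers are copied from the generated code above.

```cpp
// Hypothetical stand-in for an MCInst (not an LLVM type).
struct ToyMCInst {
  unsigned Opcode;
  bool IsZeroIdiom; // stands in for the result of a predicate like isGPRZero
};

// Toy base class mirroring the shape of the proposed default implementation.
struct ToySubtargetInfo {
  virtual ~ToySubtargetInfo() = default;

  // Default: 0 means "don't know how to resolve this scheduling class".
  virtual unsigned resolveVariantSchedClass(unsigned /*SchedClass*/,
                                            const ToyMCInst * /*MI*/,
                                            unsigned /*CPUID*/) const {
    return 0;
  }

  // Hypothetical query: is this scheduling class a variant class?
  virtual bool isVariant(unsigned /*SchedClass*/) const { return false; }
};

// Toy counterpart of the tablegen'd override.
struct ToyGenSubtargetInfo : ToySubtargetInfo {
  unsigned resolveVariantSchedClass(unsigned SchedClass, const ToyMCInst *MI,
                                    unsigned CPUID) const override {
    switch (SchedClass) {
    case 386:           // MOVZWi_MOVZXi
      if (CPUID == 3) { // CycloneModel
        if (MI->IsZeroIdiom)
          return 930; // WriteX
        return 962;   // WriteImm
      }
      break;
    }
    // Don't know how to resolve this scheduling class.
    return 0;
  }

  bool isVariant(unsigned SchedClass) const override {
    return SchedClass == 386;
  }
};

// Keep resolving while the class is variant. A result of 0 tells the caller
// that the variant class could not be resolved.
unsigned resolveSchedClass(const ToySubtargetInfo &STI, unsigned SchedClass,
                           const ToyMCInst &MI, unsigned CPUID) {
  while (SchedClass != 0 && STI.isVariant(SchedClass))
    SchedClass = STI.resolveVariantSchedClass(SchedClass, &MI, CPUID);
  return SchedClass;
}
```

A caller would report an error when `resolveSchedClass` returns 0, and would
otherwise use the returned class to look up latency and resource information.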

----
Let's go back to the definition of IsGPRZero using a TIIPredicate.

```
def IsGPRZero : TIIPredicate<"AArch64", "isGPRZero",
  AnyOfMCPredicates<[CheckMOVZ, CheckANDW, CheckANDX, CheckCOPY]>>;
```

This is how the InstrInfoEmitter expands the method in the tablegen'd
class AArch64GenInstrInfo:

```
  static bool isGPRZero(const MachineInstr &MI) {
    return (
      (
        (
          MI.getOpcode() == AArch64::MOVZWi
          || MI.getOpcode() == AArch64::MOVZXi
        )
        && MI.getNumOperands() == 3
        && MI.getOperand(1).isImm()
        && MI.getOperand(1).getImm() == 0
        && MI.getOperand(2).isImm()
        && MI.getOperand(2).getImm() == 0
      )
      || (
        ( MI.getOpcode() == AArch64::ANDWri )
        && MI.getOperand(1).isReg()
        && MI.getOperand(1).getReg() == AArch64::WZR
      )
      || (
        ( MI.getOpcode() == AArch64::ANDXri )
        && MI.getOperand(1).isReg()
        && MI.getOperand(1).getReg() == AArch64::XZR
      )
      || (
        ( MI.getOpcode() == TargetOpcode::COPY )
        && MI.getOperand(1).isReg()
        && MI.getOperand(1).getReg() == AArch64::WZR
      )
    );
  }
```

Another variant of function `isGPRZero` is expanded in the AArch64_MC
namespace (see below):

```
#ifdef GET_GENINSTRINFO_MC_DECL
#undef GET_GENINSTRINFO_MC_DECL
namespace llvm {
class MCInst;

namespace AArch64_MC {

bool isGPRZero(const MCInst &MI);

} // end AArch64_MC namespace
} // end llvm namespace
#endif // GET_GENINSTRINFO_MC_DECL

#ifdef GET_GENINSTRINFO_MC_HELPERS
#undef GET_GENINSTRINFO_MC_HELPERS
namespace llvm {
namespace AArch64_MC {

bool isGPRZero(const MCInst &MI) {
  return (
    (
      (
        MI.getOpcode() == AArch64::MOVZWi
        || MI.getOpcode() == AArch64::MOVZXi
      )
      && <...snip...>
    )
  );
}
} // end AArch64_MC namespace
} // end llvm namespace
#endif // GET_GENINSTRINFO_MC_HELPERS
```

Function isGPRZero would live in namespace AArch64_MC.
The declaration of AArch64_MC::isGPRZero has to be made visible to
AArch64MCTargetDesc.h, so that it becomes known to the new
`resolveVariantSchedClass()` method.

As a side note: all this code is guarded by macro definitions. This allows us
to control its expansion (if we decide that we don't want it).


What to do next
---------------
I have a series of three patches ready to be sent upstream for review.

The first patch is mostly NFC (no functional change). It introduces the new
scheduling predicate class in tablegen, and it teaches the InstrInfoEmitter
and the SubtargetEmitter how to expand MCSchedPredicate definitions.
The first patch is up for review here: https://reviews.llvm.org/D46695.

The second patch would teach the SubtargetEmitter how to generate method
resolveVariantSchedClass().

The last patch of the sequence will teach llvm-mca how to use method
`resolveVariantSchedClass()` to resolve variant classes. llvm-mca will
generate an error if the variant scheduling class cannot be resolved.

Review https://reviews.llvm.org/D46697 is the union of patch1 and patch2 only.
It is not meant to be reviewed at this stage, since it contains the code
changes related to patch1.

The third patch is available here: https://reviews.llvm.org/D46698.
D46698 requires patch1 and patch2.

Bonus (optional) patches:
 1) [X86] Teach scheduling models how to recognize zero-idioms.
    This would make it easier to review the llvm-mca change.
 2) [X86] Add variant scheduling classes for LEA instructions.
 3) [AArch64] Rewrite the predicates mentioned by this RFC.

People that are interested in seeing how to implement "optional" patch 3 can
have a look at the review here: https://reviews.llvm.org/D46701

Please let me know what you think.

Thanks,
Andrea