[PATCH] D117451: [MCA] Proposing the InstrPostProcess:modifyInstrDesc() method.

Sun Jan 16 21:07:58 PST 2022

holland11 created this revision.
holland11 added reviewers: qcolombet, andreadb.
Herald added subscribers: pengfei, gbedwell, hiraditya.
holland11 requested review of this revision.
Herald added a project: LLVM.

**TLDR**: Currently, by the time `CustomBehaviour` and `InstrPostProcess` get involved, some of an instruction's attributes have been made const and cannot be modified. This patch is a proposal to inject a new method (`InstrPostProcess::modifyInstrDesc()`) after these specific attributes have been set, but before they have been made const.

`CustomBehaviour` and `InstrPostProcess` were designed for the purpose of giving developers the ability to modify an instruction's behaviour within mca without having to modify the tablegen. Because of where `CustomBehaviour` is injected, it is only capable of adding hazards that aren't detected automatically. It is **not** capable of //removing// hazards.

`InstrPostProcess` is injected earlier and gives developers a chance to make modifications to the instruction before the simulation even begins. However, some of an instruction's important attributes are stored within its `InstrDesc` object. After this `InstrDesc` object is created (but before `InstrPostProcess` gets a chance to make any modifications), the object is made `const`. This limits what `InstrPostProcess` is capable of modifying.

My original motivation for this patch came because I was working on a group of downstream targets where we wanted to set the `RetireOOO` flag for all of our targets, but we didn't want to have to modify the tablegen for every single one of those targets. At first, I was thinking that the simplest solution might be to just add a `-retireooo` command line argument to mca. However, when I was looking at where the `RetireOOO` flag was being saved (within `InstrDesc`) I realized that this `const` object contains some other attributes that developers may want to modify in the future.

Because of this, I felt that by making //this// patch instead of just adding the `-retireooo` flag, I'd be solving my problem while also making `CustomBehaviour` and `InstrPostProcess` more flexible / capable. If you disagree with the overall method and/or motivation of this patch, I can scrap it and instead just make a new proposal for a `-retireooo` flag. If you feel like this patch is appropriate, then I'd like some input on some issues that I'll discuss below.

(Skip reading this block of code for now.)

  /// An instruction descriptor
  struct InstrDesc {
    SmallVector<WriteDescriptor, 2> Writes; // Implicit writes are at the end.
    SmallVector<ReadDescriptor, 4> Reads;   // Implicit reads are at the end.

    // For every resource used by an instruction of this kind, this vector
    // reports the number of "consumed cycles".
    SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources;

    // A bitmask of used hardware buffers.
    uint64_t UsedBuffers;

    // A bitmask of used processor resource units.
    uint64_t UsedProcResUnits;

    // A bitmask of implicit uses of processor resource units.
    uint64_t ImplicitlyUsedProcResUnits;

    // A bitmask of used processor resource groups.
    uint64_t UsedProcResGroups;

    unsigned MaxLatency;
    // Number of MicroOps for this instruction.
    unsigned NumMicroOps;
    // SchedClassID used to construct this InstrDesc.
    // This information is currently used by views to do fast queries on the
    // subtarget when computing the reciprocal throughput.
    unsigned SchedClassID;

    unsigned MayLoad : 1;
    unsigned MayStore : 1;
    unsigned HasSideEffects : 1;
    unsigned BeginGroup : 1;
    unsigned EndGroup : 1;
    unsigned RetireOOO : 1;

    // True if all buffered resources are in-order, and there is at least one
    // buffer which is a dispatch hazard (BufferSize = 0).
    unsigned MustIssueImmediately : 1;

    // A zero latency instruction doesn't consume any scheduler resources.
    bool isZeroLatency() const { return !MaxLatency && Resources.empty(); }

    InstrDesc() = default;
    InstrDesc(const InstrDesc &Other) = delete;
    InstrDesc &operator=(const InstrDesc &Other) = delete;
  };

The code block above is the `InstrDesc` definition. As mentioned earlier, this object is made `const` soon after its creation and so `InstrPostProcess` currently has no way to modify any of these attributes. It's reasonable to assume that if a developer wants to modify the mca-specific behaviour of an instruction, they'll want / need to modify some of these attributes.

One thing that I find potentially unsettling about allowing developers to modify this object is that the `Writes` and `Reads` vectors are built in a specific way. For example, in the `Writes` vector, the explicit defs come first, then the implicit defs, then optional defs, then variadic defs. I'm not positive whether or not this order is important during mca's simulations, but I did add a comment within the generic `modifyInstrDesc` definition that says to maintain that order.

Another issue with this patch is that there are no test cases associated with it. This is obviously not ideal and not best practice, but the purpose of the patch is for a downstream target and I'm not personally aware of any upstream targets that can immediately benefit from something in the `InstrDesc` object being modified for one of their instructions. If anybody knows of a test case that would make sense (or an instruction from an upstream target that could have one of these attributes modified to make it //more// correct within mca), I'd be happy to add it.

One of the more complex parts of this patch had to do with debug output. When mca is run with the `-debug` flag, some instruction information is printed while the `InstrDesc` objects are created. An example of this output is shown below:

  		Opcode Name= ADDPDrr
  		SchedClassID=24
  		Opcode=408
  		Resource Mask=0x00000000000008, Reserved=0, #Units=1, cy=1
  		Buffer Mask=0x00000000000800
  		 Used Units=0x00000000000008
  		Implicitly Used Units=0x00000000000000
  		Used Groups=0x00000000000e80
  		[Def]    OpIdx=0, Latency=3, WriteResourceID=0
  		[Use]    OpIdx=1, UseIndex=0
  		[Use]    OpIdx=2, UseIndex=1
  		[Use][I] OpIdx=0, UseIndex=2, RegisterID=MXCSR
  		MaxLatency=3
  		NumMicroOps=1

>From the perspective of the user, this output all happens at once. However, within the code, this output is printed in slightly different places. For example, the  instruction's defs are printed within the `InstrBuilder::populateWrites()` method while the instruction's resource information is printed within the `initializeUsedResources()` static function (also in `InstrBuilder.cpp`).

All of this information is printed before the `InstrDesc` has been fully created. The `IPP::modifyInstrDesc` method isn't injected until after the `InstrDesc` has been fully created, therefore, if `modifyInstrDesc` actually modifies an instruction, the printed information may no longer be accurate.

To account for this, `modifyInstrDesc` returns a bool which is `true` if and only if the instruction has been modified. If this method returns true, then we print a message stating that the instruction has been modified and then we re-print all of this information. Because re-printing this information would require the same `LLVM_DEBUG` code, I took this code and made functions for each of them that are then called from their respective places. Also, for these functions to be able to tell what type of def/use a particular write/read is, I had to add the `IsVariadicDef` and `IsVariadicUse` attributes to the `WriteDescriptor` and `ReadDescriptor` structs.

This above paragraph is something that I'm not sure I did properly (best practice wise). As far as I understand, the `LLVM_DEBUG()` blocks are code that will be omitted when building llvm in `Release` mode. I surrounded the bodies of these new functions with the `LLVM_DEBUG` blocks and I surrounded the calls to these new functions with the `LLVM_DEBUG` blocks, but I did not surround the actual definitions of these functions. Should I?

For example, should:

  static void debugPrintExplicitWrite(const WriteDescriptor &Write) {
    assert(!Write.isImplicitWrite() && !Write.IsVariadicDef &&
           "This function should only be called for non-variadic explicit "
           "writes."); // Negative integer for implicit writes

    LLVM_DEBUG({
      dbgs() << "\t\t[Def]    OpIdx=" << Write.OpIndex
             << ", Latency=" << Write.Latency
             << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';
    });
  }

be:

  LLVM_DEBUG({
    static void debugPrintExplicitWrite(const WriteDescriptor &Write) {
      assert(!Write.isImplicitWrite() && !Write.IsVariadicDef &&
             "This function should only be called for non-variadic explicit "
             "writes."); // Negative integer for implicit writes

      dbgs() << "\t\t[Def]    OpIdx=" << Write.OpIndex
             << ", Latency=" << Write.Latency
             << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';
    }
  });

or something else entirely?

Here is an example of one of these instruction's being called:

  Write.IsOptionalDef = false;
  Write.IsVariadicDef = false;

  LLVM_DEBUG(debugPrintExplicitWrite(Write));

  CurrentDef++;

As always, thank you so much for your time and interest. I would appreciate any and all feedback / questions / suggestions / criticisms.

PS. While I was moving the debug output into their own functions, I added a line for printing the opcode number of the instruction (below the OpcodeName and SchedClassID). I often find myself needing to know the opcode of an instruction and it can be a bit of a hassle to find it out in other ways so this additional line seems useful, intuitive, and not too disruptive. If this is at all controversial, I can take it out of this patch and maybe make a post for it by itself to be discussed specifically. Let me know if you think it should be removed.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D117451

Files:
  llvm/include/llvm/MCA/CustomBehaviour.h
  llvm/include/llvm/MCA/InstrBuilder.h
  llvm/include/llvm/MCA/Instruction.h
  llvm/lib/MCA/InstrBuilder.cpp
  llvm/lib/Target/X86/MCA/X86CustomBehaviour.cpp
  llvm/lib/Target/X86/MCA/X86CustomBehaviour.h
  llvm/tools/llvm-mca/llvm-mca.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D117451.400429.patch
Type: text/x-patch
Size: 18764 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220117/4a14ab46/attachment.bin>