[PATCH] D131930: [mlgo] Add in-development instruction based features for regalloc advisor

Mon Sep 12 17:10:46 PDT 2022

mtrofin added inline comments.

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:149
+// fed an opcode it hasn't seen before. This constant sets the current cutoff.
+static const int OpcodeCountCutoff = 17716;
+
----------------
rename this to `OpcodeValueCutoff` - `Count` to me refers at this point to how many opcodes in the tensor, but IIUC you actually mean values larger than that are unknown to us. Same for the comment.

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:265
+#define _RESET_DEV(TYPE, NAME, SHAPE, __)                                      \
+  std::memset(Runner.getTensorUntyped(FeatureIDs::NAME - 1), 0,                \
+              getTotalSize<TYPE>(SHAPE));
----------------
you can avoid the -1 thing by splitting your RA_EVICT_FEATURES_UNDER_DEVELOPMENT_LIST in 2, the "first" and the "rest", and then forcing the index of the first to be == `FeatureCount`

```
#ifdef LLVM_HAVE_TF_API
#define RA_FIRST_EXTRA_FEATURE(M)  \
    M(int64_t, instructions, InstructionsShape,                                  \
    "Opcodes of the instructions covered by the eviction problem") 
#define RA_REST_EXTRA_FEATURES(M)                            \
  M(int64_t, instructions_mapping, InstructionsMappingShape,                   \
    "A binary matrix mapping LRs to instruction opcodes")
#define RA_EVICT_FEATURES_UNDER_DEVELOPMENT(M) \
  RA_FIRST_EXTRA_FEATURE(M) \
  RA_REST_EXTRA_FEATURES(M)
#else
#define RA_FIRST_EXTRA_FEATURE(M)
#define   RA_REST_EXTRA_FEATURES(M)
#endif

#define RA_EVICT_FEATURES_UNDER_DEVELOPMENT(M) \
  RA_FIRST_EXTRA_FEATURE(M) \
  RA_REST_EXTRA_FEATURES(M)
enum FeatureIDs {
#define _FEATURE_IDX_SIMPLE(_, name, __, ___) name
#define _FEATURE_IDX(A,B,C,D) _FEATURE_IDX_SIMPLE(A,B,C,D),
  RA_EVICT_FEATURES_LIST(_FEATURE_IDX) FeatureCount,
  RA_FIRST_EXTRA_FEATURE(_FEATURE_IDX_SIMPLE) = FeatureCount,
  RA_REST_EXTRA_FEATURES(_FEATURE_IDX)
      FeaturesWithDevelopmentCount
#undef _FEATURE_IDX
};

```

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:286
+struct LRStartEndInfo {
+  SlotIndex Begin;
+  SlotIndex End;
----------------
Initialize things that don't have default ctors easy to do at def, and avoids nondeterministic bugs. Like Pos, at minimum, set it to 0.

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:288
+  SlotIndex End;
+  size_t Pos;
+};
----------------
add a comment as to what `Pos` is - it's the index of the column corresponding to the physical register of the live interval segment captured by this LRStartEndInfo, right? (or the candidate)

also a comment for LRStartEndInfo overall

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:932
+
+void MLEvictAdvisor::extractInstructionFeatures(
+    SmallVectorImpl<LRStartEndInfo> &LRPosInfo) const {
----------------
This feels like it'd benefit from a unittest. The core logic is about intervals that cover instruction opcodes, so the only LLVM-ness of it is in `LIS->getInstructionFromIndex(CurrentIndex)` (because `LIS->getSlotIndexes()->getLastIndex())` is basically a constant you can pass in)

You can test it by making a small change: make this into a utility that takes LRPosInfo and a std::function (or a llvm::function_ref, whatever) that gives the opcode for a SlotIndex. The utility doesn't need to know about mlevictadvisor or anything - it's just dealing with intervals. You can also pass a Runner in, and in the test case, just use the NoInferenceModelRunner.

So then you can set up all sorts of interesting interval overlapping cases, and you can just populate your opcodes with incrementing numbers or something.

The nice thing is that the unittest doesn't need any #define specific stuff - it's generic.

The rest can stay the same - meaning, let `MLEvictAdvisor::extractInstructionFeatures` call that utility, etc.

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:937
+  // tensors.
+  // 1 - A vector of opcodes in sequential order of size (max instruction count)
+  // 2 - A binary mapping matrix of size (LR count * max instruction count)
----------------
(nit - it took me a few reads to get it) "A vector of size "max instruction count". It contains the instruction opcodes of instructions covered by all intervals in LRPosInfo"

wdyt?

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:946
+      [](LRStartEndInfo A, LRStartEndInfo B) { return A.Begin < B.Begin; });
+  size_t InstructionCount = 0;
+  size_t CurrentSegment = 0;
----------------
This is rather the Instruction "Index" or something like that, right? Count to me means total.

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:947
+  size_t InstructionCount = 0;
+  size_t CurrentSegment = 0;
+  SlotIndex CurrentIndex = LRPosInfo[0].Begin;
----------------
Nit: CurrentSegmentIdx

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:953
+  // overlaps with segments that start at greater slot indices). After hitting
+  // that end index, the current segment being process gets bumped until they
+  // are all processed or the max instruction count is hit, where everything is
----------------
typo: process"ed"

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:976
+      Runner->getTensor<int64_t>(FeatureIDs::instructions -
+                                 1)[InstructionCount] =
+          CurrentOpcode < OpcodeCountCutoff ? CurrentOpcode : 0;
----------------
you don't need the -1 thing here if you split the extra list in 2

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:991
+      size_t OverlapCheckCurrentSegment = CurrentSegment + 1;
+      while (OverlapCheckCurrentSegment < LRPosInfo.size()) {
+        // If the beginning of the segment being currently checked is greater
----------------
the exit condition could include LRPosInfo[OverlapCheckCurrentSegment].Begin > CurrentIndex, for more clarity?

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:1001
+            LRPosInfo[OverlapCheckCurrentSegment].Pos;
+        Runner->getTensor<int64_t>(FeatureIDs::instructions_mapping -
+                                   1)[OverlapCurrentSegmentPosition *
----------------
can't `LRPosInfo[OverlapCheckCurrentSegment].End < CurrentIndex`? So we know the CurrentIndex is below Begin, but it could also be below this segment's end?

Current instruction: 5
CurrentSegment: [1,7)
NextSegment: [2,4)
NextNextSegment: [2, 8)

================
Comment at: llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp:1017
+    // just finished processing the current segment, transition to the next one
+    if (LRPosInfo[CurrentSegment + 1].Begin <= LRPosInfo[CurrentSegment].End) {
+      // segments are overlapping.
----------------
you just need to go to the next segment and potentially set CurrentIndex to that one's beginning if it's after CurrentIndex (simpler code)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131930/new/

https://reviews.llvm.org/D131930