[PATCH] D141393: [llvm][ir] Purge MD_prof custom accessors

Tue Jan 10 18:22:24 PST 2023

paulkirth requested changes to this revision.
paulkirth added a comment.
This revision now requires changes to proceed.

Thanks for picking this up. When I implemented `ProfDataUtils` there were still several things I wanted to address but havent' had sufficient bandwidth to pick up again. Looks like you've got most of the ones that I recall so far.

I left some feedback inline. Overall I think this is a good change. I have a few nits, and concerns that we may be missing some important checks, but most are pretty minor.

The one thing I'd say that definitely needs to change IMO is the addition of the SwitchInst to the True/False `extractBranchWeights` API.  It seems error prone and wrong to use in almost all situations. But I'm not opposed to the change if there is a compelling reason, but I still don't see much benefit to it w/in this patch, except for one place in `Local.cpp`.

================
Comment at: llvm/include/llvm/IR/ProfDataUtils.h:62

-/// Extract branch weights from a conditional branch or select Instruction.
+/// Extract branch weights from a conditional branch, select, or switch
+/// Instruction.
----------------
This is incorrect. This overload is specifically for branch/select instructions where the number of successors is 2. There is a more general version that uses weights vector for all instruction types, including switches.

If this API isn't needed anywhere/in the code base anymore, then it can be removed IMO, but this comment shouldn't change.

================
Comment at: llvm/lib/IR/Instructions.cpp:4606-4609
-  if (ProfileData->getNumOperands() != SI.getNumSuccessors() + 1) {
-    llvm_unreachable("number of prof branch_weights metadata operands does "
-                     "not correspond to number of succesors");
-  }
----------------
I don't think we do this check in ProfDatUtils, so we probably shouldn't remove it, should we? I know I've tripped this assert before when modifying some passes, so I think it's good one to keep.

================
Comment at: llvm/lib/IR/ProfDataUtils.cpp:46
 // the minimum number of operands for MD_prof nodes with branch weights
-constexpr unsigned MinBWOps = 3;
+constexpr unsigned MinBWOps = 2;

----------------
This constant is used to check the validity of branch weight metadata. Changing this to 2  implies that a MD_prof node w/ only the "branch_weight" moniker and a single weight are valid. I don't think that's true, so I don't think this should change unless we change how MD_prof is laid out as a whole.

================
Comment at: llvm/lib/IR/ProfDataUtils.cpp:128
+          I.getOpcode() == Instruction::Select ||
+          I.getOpcode() == Instruction::Switch) &&
+         "Looking for branch weights on something besides branch, select, or "
----------------
This is wrong. you can't include a switch here. The overload above takes a `SmallVectorImpl` is the only appropriate API when dealing with a switch, since it can have an arbitrary number of arms.

The assertion here is to prevent API misuse. If used on a switch, this will always return false, so I don't think thats the right choice here.  You get some utility by being able to use this in  more places, but I think this will be easy to introduce subtle bugs. For instance, if a switch happens to only have 2 arms this will work when it arguably shouldn't. 

My preference would be that everything settle on the `SmallVectorImpl` API, since that can correctly handle all cases. It isn't' as ergonomic in some of the code uses though, IIRC.

================
Comment at: llvm/lib/Transforms/IPO/PartialInlining.cpp:719
       continue;
-    uint64_t T, F;
-    if (extractBranchWeights(*BR, T, F))
+    if (getBranchWeightMDNode(*BR))
       return true;
----------------

================
Comment at: llvm/lib/Transforms/Instrumentation/ControlHeightReduction.cpp:581
 // MD_prof metadata if it's well-formed.
-static bool checkMDProf(MDNode *MD, BranchProbability &TrueProb,
+static bool checkMDProf(Instruction *I, BranchProbability &TrueProb,
                         BranchProbability &FalseProb) {
----------------
Oh, this is a good change. The API use is much better at call sites. 

Since the validation is basically done in ProfDataUtils, maybe rename this API to something more self descriptive?

================
Comment at: llvm/lib/Transforms/Instrumentation/ControlHeightReduction.cpp:587
     return false;
-  ConstantInt *TrueWeight = mdconst::extract<ConstantInt>(MD->getOperand(1));
-  ConstantInt *FalseWeight = mdconst::extract<ConstantInt>(MD->getOperand(2));
-  if (!TrueWeight || !FalseWeight)
-    return false;
-  uint64_t TrueWt = TrueWeight->getValue().getZExtValue();
-  uint64_t FalseWt = FalseWeight->getValue().getZExtValue();
-  uint64_t SumWt = TrueWt + FalseWt;
+  uint64_t SumWt = TrueWeight + FalseWeight;

----------------
nit: if we're changing the variable names, let's be consistent. I'd leave them as they were to limit the change, but if we're changing some, we should change them uniformly.

================
Comment at: llvm/lib/Transforms/Scalar/JumpThreading.cpp:2526

-  MDNode *WeightsNode = TI->getMetadata(LLVMContext::MD_prof);
+  auto *WeightsNode = getBranchWeightMDNode(*TI);
   if (!WeightsNode)
----------------
I think we discourage `auto` in cases like this, don't we?

See: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

================
Comment at: llvm/lib/Transforms/Scalar/LoopPredication.cpp:978-981
   auto IsValidProfileData = [](MDNode *ProfileData, const Instruction *Term) {
-    if (!ProfileData || !ProfileData->getOperand(0))
-      return false;
-    if (MDString *MDS = dyn_cast<MDString>(ProfileData->getOperand(0)))
-      if (!MDS->getString().equals("branch_weights"))
-        return false;
-    if (ProfileData->getNumOperands() != 1 + Term->getNumSuccessors())
-      return false;
-    return true;
+    return ProfileData &&
+           ProfileData->getNumOperands() == 1 + Term->getNumSuccessors();
   };
----------------
Should we also provide this API in `ProfDataUtils`? It's a validity check for branch weight nodes, and we already encapsulate most of it ...

================
Comment at: llvm/lib/Transforms/Utils/Local.cpp:221
           SmallVector<uint32_t, 8> Weights;
-          for (unsigned MD_i = 1, MD_e = MD->getNumOperands(); MD_i < MD_e;
-               ++MD_i) {
-            auto *CI = mdconst::extract<ConstantInt>(MD->getOperand(MD_i));
-            Weights.push_back(CI->getValue().getZExtValue());
-          }
+          extractBranchWeights(*SI, Weights);
+
----------------

================
Comment at: llvm/lib/Transforms/Utils/Local.cpp:315-317
+      uint64_t DefWeight;
+      uint64_t CaseWeight;
+      if (extractBranchWeights(*SI, DefWeight, CaseWeight)) {
----------------
The overload will just get the MD node again anyway before calling the other API, so I'd suggest just using the MD node directly. Would you mind changing these and any others in this patch that use `getBranchWeightMDNode` prior to `extractBranchWeights`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141393/new/

https://reviews.llvm.org/D141393