[PATCH] D37170: [TargetTransformInfo] Add a new public interface getInstructionCost

Wed Sep 6 18:10:52 PDT 2017

hfinkel accepted this revision.
hfinkel added a comment.
This revision is now accepted and ready to land.

LGTM

See below, however, because we might want to document that the "code size" model could mean number of uops. On Xeon x86, for example, that's what the unroller wants to measure (at least for small loops). We may at some point want to split this into a uops and an actual code size, but we don't have an in-tree use case for that right now.

I should mention, however, that there will be some issues with the "code size" model, because right now, it's not strictly measuring code size. Right now, it's mostly setup to return things based TCC_Free/TCC_Basic/TCC_Expensive. Right now this is going to be fine, essentially, because TCC_Free is 0 and TCC_Basic is 1. TCC_Expensive is 4, however, which does not make sense as a value. Some auditing reveals:

1. The default code in include/llvm/Analysis/TargetTransformInfoImpl.h has:

  case Instruction::FDiv:
  case Instruction::FRem:
  case Instruction::SDiv:
  case Instruction::SRem:
  case Instruction::UDiv:
  case Instruction::URem:
    return TTI::TCC_Expensive;

and that does not make sense for code size. Moreover, as a "expense value", it's too low for division nearly everywhere (the reciprocal throughput and latency for division is generally closer to 10-20, not 4).  However, we can't decrease this value until we stop using these values in the loop unroller to decide on unrolling factors (on x86, we're unrolling in many cases to fill the LSD's uop cache, so we don't want to unroll too much there). Unless, we say that code size might be in terms of number of uops, not just the size of the encoding. The other places we return TCC_Expensive is for function calls (or think we might turn into function calls). For these, I suppose 4 is an okay code-size estimate, if we needed to pick a random number.

2. This is code in lib/Analysis/InlineCost.cpp that does this:

  // If the instruction is floating point, and the target says this operation
  // is expensive or the function has the "use-soft-float" attribute, this may
  // eventually become a library call. Treat the cost as such.
  if (I->getType()->isFloatingPointTy()) {
    // If the function has the "use-soft-float" attribute, mark it as
    // expensive.
    if (TTI.getFPOpCost(I->getType()) == TargetTransformInfo::TCC_Expensive ||
        (F.getFnAttribute("use-soft-float").getValueAsString() == "true"))
      Cost += InlineConstants::CallPenalty;
  }

This code that does an equality check against TCC_Expensive should probably have a >= instead.

And there are a number of other minor things (i.e. the speculation thresholds in SimplifyCFG)  we'll run into as we clean this up.

However, just adding this interface won't change the current behavior of anything, so I'm fine with moving ahead with this for now.

================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:881
+  /// \brief Estimate the latency of specified instruction.
+  int getInstructionLatency(const Instruction *I) const;
+
----------------
What does this return if the result is unknown? This should also be documented.

https://reviews.llvm.org/D37170