[PATCH] D43769: [TTI] rename getArithmeticInstructionCost() to getUnitThroughput(); NFC

Tue Feb 27 19:07:10 PST 2018

nhaehnle added inline comments.

================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:723
+  /// cases or optimizations based on those values.
+  int getUnitThroughput(
       unsigned Opcode, Type *Ty, OperandValueKind Opd1Info = OK_AnyValue,
----------------
eastig wrote:
> fhahn wrote:
> > RKSimon wrote:
> > > ABataev wrote:
> > > > spatel wrote:
> > > > > ABataev wrote:
> > > > > > spatel wrote:
> > > > > > > ABataev wrote:
> > > > > > > > Seems to me very general, the name does not show that this is the throughput for arithmetic instructions only. I think it is going to be enough just to clarify the comment.
> > > > > > > getXXXCost is the most ambiguous. getALUUnitThroughput is better?
> > > > > > > 
> > > > > > > I'd like to make it so we don't have to read the comments to distinguish this from getUserCost and getOperationCost. So if we don't change this name, then change those? 
> > > > > > > 
> > > > > > > I suppose we should change all of the throughput APIs in one shot if we're going to do this. But if there's no support, then I'll just update the code comment.
> > > > > > Still not sure. We're estimating the cost of the arithmetic LLVM instructions here in terms of throughput, not the ALU Unit throughput.
> > > > > Maybe I'm still confused. As I understand D43733, we're not estimating the LLVM instruction throughput for the target as a whole. If we were, then fmul/fadd/fsub should have a higher cost than integer add/sub because all recent x86 have greater overall throughput for scalar integer ops than FP ops because there are more scalar ALUs.
> > > > > 
> > > > > IMO, the more important part is that we change Cost -> Throughput for everything (I think) that the vectorizers are using, so we're at least different than 'user cost' or 'operation cost'. I guess people will have to read the comment anyway to distinguish the subtlety between 'getArithmeticThroughput' / 'getALUThroughput'.
> > > > > 
> > > > As I understand, we estimate throughout for the target as a whole. And because of that we assume that the cost of the integer scalar instructions is just 1 (though it may be lower). This just needs to be fixed somehow, e.g by introducing some multiplier.
> > > If we did have access to the target's scheduling model, maybe something like:
> > > 
> > > int Cost = (int) std::max(Inst->getThroughput() / Model->IssueWidth, 1.0f) ?
> > > 
> > > Although I don't think we're even close to being able to use scheduling models here yet - x86 at least has very few CPUs correctly modelled, although there does seem to be some interest from @andreadb @courbet et al. to use the more aggressively,
> > Maybe AArch64 would be a good target to test using the scheduling model info? The available and required CPU units should be modelled quite well there.
> > Still not sure. We're estimating the cost of the arithmetic LLVM instructions here in terms of throughput, not the ALU Unit throughput.
> 
> Agree with Alexey getUnitThroughput looks confusing whilst the information is requested for an operation. My understanding of 'throughput', when it is used for operations, is:
> 
> > Throughput is the number of cycles after issue that another instruction can begin execution.
> 
> In the function comments I see it is the reciprocal throughput which means:
> 
> > The reciprocal  throughput is the maximum number of instructions of the same kind that can be executed per clock cycle when the operands of each instruction are independent of the preceding instructions. The reciprocal throughput is also called issue latency.
> 
> I don't know how many people are familiar with the difference. As a result there will be incorrect uses. Another point is that reciprocal throughput can be less than 1. I think such cases can be represented by a pair of integers: number of operations, number of clocks. 
> 
> Is it important for the function to be ALU specific? Can the function be used for memory operations? 
The name of the function should really be `getUnitReciprocalThroughput`. Throughput means higher numbers = better, whereas here higher numbers = worse.

https://reviews.llvm.org/D43769