[PATCH] D79483: [CostModel] Replace getUserCost with getInstructionCost. WIP

Wed Aug 17 07:30:31 PDT 2022

RKSimon added inline comments.

================
Comment at: llvm/test/CodeGen/X86/select-optimize.ll:395
+; CHECK:       select.end:
+; CHECK-NEXT:    [[SUB:%.*]] = phi double [ [[TMP1]], [[FOR_BODY]] ], [ 0.000000e+00, [[SELECT_FALSE]] ]
 ; CHECK-NEXT:    [[X_ADDR_1]] = fsub double [[X_ADDR_010]], [[SUB]]
----------------
apostolakis wrote:
> RKSimon wrote:
> > @apostolakis Please would take a look at this - we're about to start work on improving the cost numbers for latency/size and just this initial cleanup is causing this test to fail - which suggests you're relying on some very inaccurate costs.
> I will remove this test. This particular test was a bit flaky to begin with given that it depended on the instruction latency of the instructions. At least it served the purpose of notifying me of the changes in the underlying cost modeling. 
> 
> Just to clarify the use in the select-optimize pass. The goal was to approximate TargetSchedModel::computeInstrLatency and compute the length of dependence chains. getInstructionCost seemed the best approximation based on the documentation. getUserCost had an unclear purpose to me, did not account for latency queries (I guess this will change now) and takes the operands as input which is worrisome since I just wanted the cost of the instructions on their own regardless of their operands (in practice the operands are mostly ignored so that might not be an issue). The biggest change in cost modeling at least from switching from getInstructionCost to getUserCost seem to be the function calls which might be for the better.
> 
> 
Thanks - if you're wanting to perform this with IR then getUserCost (-> getInstructionCost) will be your best bet - but it does rely on us getting all the costs to be more accurate, which will take time.

I'm intending to fix the bitrot (we changed the way cost-model values are reported) on the script from D103695 which allows us to compare TTI costs vs various scheduler models (via llvm-mca), so eventually most instructions/intrinsics should have values similar to TargetSchedModel::computeInstrLatency. The script just helped check for throughput mismatches, but I did have the other cost kinds in mind as well.

I'd recommend that you do take operands into account where possible, as at least on X86 there are some attempts to use them to match likely codegen (e.g. sign/zero-extended integers that can use smaller arithmetic ops).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79483/new/

https://reviews.llvm.org/D79483