[PATCH] D13417: [MachineCombiner] make slack optional in critical path cost calculation (PR25016)

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 6 11:32:47 PDT 2015


spatel added a comment.

In http://reviews.llvm.org/D13417#260860, @Gerolf wrote:

> We should avoid scenarios where two types of machine combines suddenly would require a different heuristic within the same block, eg assume a madd combine and a reassoc in the same block.


Do you have a suggestion for how to avoid that? I'm still mentally blocked on the need for the more liberal MADD cost calc because I haven't seen an example where that provides a win.

> Also, on a minor note I don’t think I truly understand your underpinning performance issue yet.


Let's try to fix that then - without that understanding, this must look like a useless exercise. :)
First, ignore any information about spilling - that's not what we're concerned about in this patch.

The 2nd test case in this patch shows how things go wrong. It's similar to the example I posted on the dev list:

  define double @foo4_reassociated() {
    %a = call double @bar()
    %b = call double @bar()
    %c = call double @bar()
    %d = call double @bar()
    %t0 = fadd double %a, %b   <--- 1st independent add
    %t1 = fadd double %c, %d   <--- 2nd independent add
    %t2 = fadd double %t0, %t1 <--- 3rd add depends on previous adds, so critical path is 2 fadds
    ret double %t2
  }

The MachineCombiner is *increasing* the critical path by reassociating the operands:

  callq     bar
  vmovsd    %xmm0, 16(%rsp)    
  callq     bar
  vmovsd    %xmm0, 8(%rsp)       
  callq     bar
  vmovsd    %xmm0, (%rsp)         
  callq     bar
  vaddsd    (%rsp), %xmm0, %xmm0    <--- independent add
  vaddsd    8(%rsp), %xmm0, %xmm0   <--- dependent add
  vaddsd    16(%rsp), %xmm0, %xmm0  <--- another dependent add, so critical path is 3 fadds

(a + b) + (c + d) --> ((d + c) + b) + a

This happens because the slack on the last add in the original trace is computed as 10 cycles (!):

  DEPENDENCE DATA FOR 0x7feecc800dc0
    NewRootDepth: 7 NewRootLatency: 3
    RootDepth: 4 RootLatency: 3 RootSlack: 10
    NewRootDepth + NewRootLatency = 10
    RootDepth + RootLatency + RootSlack = 17

On 2nd thought, that slack looks really wrong, so maybe the right fix for this problem actually is in MachineTraceMetrics?


http://reviews.llvm.org/D13417





More information about the llvm-commits mailing list