[PATCH] D13417: [MachineCombiner] make slack optional in critical path cost calculation (PR25016)
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 6 11:32:47 PDT 2015
spatel added a comment.
In http://reviews.llvm.org/D13417#260860, @Gerolf wrote:
> We should avoid scenarios where two types of machine combines suddenly would require a different heuristic within the same block, eg assume a madd combine and a reassoc in the same block.
Do you have a suggestion for how to avoid that? I'm still mentally blocked on the need for the more liberal MADD cost calc because I haven't seen an example where that provides a win.
> Also, on a minor note I don’t think I truly understand your underpinning performance issue yet.
Let's try to fix that then - without that understanding, this must look like a useless exercise. :)
First, ignore any information about spilling - that's not what we're concerned about in this patch.
The 2nd test case in this patch shows how things go wrong. It's similar to the example I posted on the dev list:
define double @foo4_reassociated() {
%a = call double @bar()
%b = call double @bar()
%c = call double @bar()
%d = call double @bar()
%t0 = fadd double %a, %b <--- 1st independent add
%t1 = fadd double %c, %d <--- 2nd independent add
%t2 = fadd double %t0, %t1 <--- 3rd add depends on previous adds, so critical path is 2 fadds
ret double %t2
}
The MachineCombiner is *increasing* the critical path by reassociating the operands:
callq bar
vmovsd %xmm0, 16(%rsp)
callq bar
vmovsd %xmm0, 8(%rsp)
callq bar
vmovsd %xmm0, (%rsp)
callq bar
vaddsd (%rsp), %xmm0, %xmm0 <--- independent add
vaddsd 8(%rsp), %xmm0, %xmm0 <--- dependent add
vaddsd 16(%rsp), %xmm0, %xmm0 <--- another dependent add, so critical path is 3 fadds
(a + b) + (c + d) --> ((d + c) + b) + a
This happens because the slack on the last add in the original trace is computed as 10 cycles (!):
DEPENDENCE DATA FOR 0x7feecc800dc0
NewRootDepth: 7 NewRootLatency: 3
RootDepth: 4 RootLatency: 3 RootSlack: 10
NewRootDepth + NewRootLatency = 10
RootDepth + RootLatency + RootSlack = 17
On 2nd thought, that slack looks really wrong, so maybe the right fix for this problem actually is in MachineTraceMetrics?
http://reviews.llvm.org/D13417
More information about the llvm-commits
mailing list