[PATCH] D13417: [MachineCombiner] make slack optional in critical path cost calculation (PR25016)

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 4 09:58:05 PST 2015


spatel added a comment.

In http://reviews.llvm.org/D13417#280784, @Gerolf wrote:

> I think we need your patch. It would be nice to have also a test case that shows an actual increase 
>  in critical path length and a measurable performance difference. In the test cases so far the vadd 
>  sequences take longer w/o actually increasing the critical path.


Hmm...I'm not seeing a way to make that happen. Any suggestions on how to expose this?

We should never increase the calculated critical path (which includes slack). We know from the example that uses 'call' instructions that our CPU model may be incorrect, and this may lead to a real-world increase in CPL. That is the scenario I'm hoping to avoid because we know that a CPU model in the compiler will never be a 100% accurate simulator for any sufficiently complex CPU.

> So just by looking at a reassociation case in isolation I wouldn’t worry about modifying the heuristic. 

>  However, in a scenario eg. with reassoc first and then a mul+add -> madd the bigger depth of the 

>  re-associated instruction could result in not performing the mul+ add combine later. It could be that

>  the extra depth may have eaten the slack the madd needs to fit within the critical path.


I'm sorry, but I'm still not following this argument. Let's say that we don't even have a reassoc transform here, but the instructions themselves are already reassociated optimally. In that case, if a muladd transform that would be beneficial does not fire, then it must be a deficiency in the muladd logic or MTM, right? Ie, the cost calc for doing that transform should be independent of whether the code was transformed here or earlier by some other mechanism. We guarantee this by invalidating the MTM data between transforms here in MachineCombiner.

> Also, since reassoc attempts to reduce depth shouldn’t the equation ignore latencies and 

>  the code finally look something like if (MustReduceDepth) return NewDepth < OldDepth else 

>  if (MustReduceCPL) return NewCycleCount < OldCycleCount else return 

>  NewCycleCount <= OldCycleCount?


Yes, this looks correct. I think this is almost identical to the logic implemented in this patch, but we can eliminate the extra calculation of 'RootLatency' and 'NewRootLatency' because we know they are always identical for reassociation. Let me fix this and update the patch.

> Thanks for effort and being persistent.


Thank you! I think I have a much better understanding of MTM now. :)


http://reviews.llvm.org/D13417





More information about the llvm-commits mailing list