[PATCH] D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.

Balaram Makam via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 31 08:37:34 PDT 2017


bmakam added inline comments.


================
Comment at: test/Transforms/CodeGenPrepare/merge-empty-latch-block.ll:91
+while.body.backedge.loopexit:                     ; preds = %while.cond5
+  br label %while.body.backedge
+
----------------
efriedma wrote:
> If I'm following correctly, the problem is this block: you want it to go away, but cgp isn't folding it.
> 
> It looks like isMergingEmptyBlockProfitable is specifically trying to detect cases like this: folding away this BB involves inserting an extra COPY into the while.cond5, and while.cond5 is hotter than while.body.backedge.loopexit, so in theory you could lose performance.
> 
> In this particular situation, though, you want to fold it anyway?  What distinguishes this testcase from the testcase in r289988?
Yes, your understanding is correct. This block was created by LoopSimplify during the LSR pass to canonicalize the loop so that it has dedicated exit blocks, with the understanding that SimplifyCFG would later clean up blocks that are split out but turn out to be unnecessary. Since we do not run SimplifyCFG after LSR, we rely on CGP to fold this block so that the generated code is not pessimized.
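As a rough illustration of that canonicalization (a toy Python model, not LLVM's actual implementation; block names mirror the test case but the data structures are invented), LoopSimplify splits a fresh, empty block onto the in-loop edges of any exit block that also has predecessors outside the loop:

```python
# Toy sketch of LoopSimplify's dedicated-exit canonicalization
# (illustrative only, not LLVM's code). An exit block whose
# predecessors are not all inside the loop gets a fresh empty block
# split onto its in-loop edges; a later cleanup pass is expected to
# fold these blocks away when they remain empty.
def form_dedicated_exits(loop_blocks, exits, preds):
    """preds maps block name -> list of predecessor names.
    Mutates preds and returns the set of newly created empty blocks."""
    new_blocks = set()
    for ex in exits:
        if any(p not in loop_blocks for p in preds[ex]):
            split = ex + ".loopexit"  # mirrors the '.loopexit' suffix
            # In-loop predecessors are rerouted through the new block.
            preds[split] = [p for p in preds[ex] if p in loop_blocks]
            preds[ex] = [p for p in preds[ex]
                         if p not in loop_blocks] + [split]
            new_blocks.add(split)
    return new_blocks
```

In the test case above, the only content of the split block is an unconditional branch, which is why we want CGP to fold it back.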

If I understand correctly, isMergingEmptyBlockProfitable is trying to work around the underlying problem, i.e. that new critical edges cannot be split properly in PHIElimination, which results in COPY instructions being inserted into blocks with higher frequency. I'm not sure whether GlobalISel can handle this issue, but the temporary solution in r289988 cannot be applied in general: I observed that after r308422, not folding away empty exit blocks pessimizes the generated code and caused a 3% regression in the same benchmark that r289988 originally targeted.
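The frequency reasoning behind that workaround can be sketched as a toy model (names and structure are hypothetical; LLVM's actual check in isMergingEmptyBlockProfitable consults BlockFrequencyInfo and handles more cases):

```python
# Toy sketch of the profitability check (illustrative only, not
# LLVM's code). Folding an empty block BB into its successor forces
# the PHI-induced COPY into each of BB's predecessors, so the fold is
# rejected when any predecessor runs hotter than BB itself.
def merging_empty_block_profitable(bb_freq, pred_freqs):
    """Return True if folding cannot move a COPY into a hotter block."""
    return all(pred_freq <= bb_freq for pred_freq in pred_freqs)
```

In the test case, while.cond5 is hotter than the empty while.body.backedge.loopexit, so the heuristic refuses the fold, which is exactly the case this patch wants to treat differently for LoopSimplify-created exit blocks.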

My first solution was to fold the empty block only if it was a latch block; this avoided the regression caused by r308422 and kept the gains from r289988. However, I felt this was papering over the real issue, so I am now folding all the empty exit blocks, because they were likely added by LoopSimplify and need to be cleaned up. This recovered 0.7% of the regression. I am looking for feedback on what a reasonable approach would be and am open to suggestions.


https://reviews.llvm.org/D35584
