[llvm] r228265 - Implement new heuristic for complete loop unrolling.

Chandler Carruth chandlerc at google.com
Thu Feb 12 15:49:34 PST 2015


On Wed, Feb 4, 2015 at 6:34 PM, Michael Zolotukhin <mzolotukhin at apple.com>
wrote:

> Author: mzolotukhin
> Date: Wed Feb  4 20:34:00 2015
> New Revision: 228265
>
> URL: http://llvm.org/viewvc/llvm-project?rev=228265&view=rev
> Log:
> Implement new heuristic for complete loop unrolling.
>
> Complete loop unrolling can make some loads constant, thus enabling a
> lot of other optimizations. To catch such cases, we look for loads that
> might become constants and estimate the number of instructions that
> would be simplified or become dead after substitution.
>

Sorry I couldn't follow the pre-commit review in detail, but I had to take a
close look at this code today and wanted to send my thoughts your way.
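
For context (and for list readers who don't have the patch open), the kind of
code this heuristic is after looks roughly like the toy example below. This is
my own illustration, not something taken from the patch or its tests: once the
loop is completely unrolled, every load has a constant index, so each load
folds to a constant and the whole sum becomes a compile-time value.

static const int Table[4] = {1, 2, 4, 8};

int sumTable(void) {
  int Sum = 0;
  for (int i = 0; i < 4; ++i)
    Sum += Table[i]; // becomes Table[0] + Table[1] + ... after unrolling
  return Sum;        // can fold to the constant 15
}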

First, it would be awesome to stick more consistently to the new coding
conventions when adding significant new code. Relatedly, please use signed
types for abstract numbers where you don't want 2's complement wrapping
behavior. For example, if we were to overflow one of these costs, we would
*not* want it to wrap around to a tiny cost!!! Much better to get a UBSan
failure that points straight at the bug.
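
To make that concrete, here's a minimal sketch (my names, not the patch's) of
why the signedness of a cost accumulator matters:

// With an unsigned accumulator, overflow silently wraps around, so an
// enormous cost can masquerade as a cheap one.
unsigned addCostUnsigned(unsigned Total, unsigned InstCost) {
  return Total + InstCost; // wraps to a small value on overflow
}

// With a signed accumulator, overflow is undefined behavior, and a UBSan
// build (-fsanitize=signed-integer-overflow) reports the exact addition
// that went wrong instead of silently producing a bogus cost.
int addCostSigned(int Total, int InstCost) {
  return Total + InstCost; // UBSan flags this if it overflows
}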

Second, the code as it stands has serious bugs in computing the "dead"
instruction cost. Let's look at EstimateNumberOfDeadInsns. This function
considers each operand of a folded instruction, and if all users of that
operand were folded, it adds the operand instruction to the cost. But it does
this for every operand of every folded instruction. What if a single
instruction occurs as an operand to many folded instructions? As far as I can
tell, we'll add its cost once per use. =[
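
Here is a sketch of the shape of the problem as I read it; the function and
variable names are invented for illustration and this is not the actual code:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Each folded instruction re-examines its own operands, so an operand shared
// by several folded users is counted as "dead" once per use rather than once
// overall.
unsigned countDeadPerOperand(ArrayRef<Instruction *> FoldedInsts,
                             const SmallPtrSetImpl<Instruction *> &Folded) {
  unsigned NumDead = 0;
  for (Instruction *I : FoldedInsts)
    for (Value *Op : I->operands()) {
      auto *OpI = dyn_cast<Instruction>(Op);
      if (!OpI)
        continue;
      // The "all users folded" check runs once per use of OpI, so OpI gets
      // counted again for every folded instruction that uses it.
      if (llvm::all_of(OpI->users(), [&](User *U) {
            return Folded.count(cast<Instruction>(U));
          }))
        ++NumDead;
    }
  return NumDead;
}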

Also, it doesn't seem to handle circular references. Some of these can occur
in perfectly 'normal'-looking phi nodes in loops, but in unreachable code they
could show up essentially anywhere. I think this causes the pass to infloop on
certain inputs, though it's hard to tell whether it's a genuine infinite loop
or just scaling so badly that it runs "forever" on those inputs.
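
Both problems go away with a worklist plus a visited set. Something along
these lines (again just a sketch with invented names, not the actual fix)
counts each instruction at most once and terminates even when operands form a
cycle through phi nodes:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// "Dead" is seeded with the folded instructions; anything that enters it is
// never revisited, which prevents double counting and guarantees termination.
unsigned countDeadWithWorklist(ArrayRef<Instruction *> FoldedInsts) {
  SmallPtrSet<Instruction *, 16> Dead(FoldedInsts.begin(), FoldedInsts.end());
  SmallVector<Instruction *, 16> Worklist(FoldedInsts.begin(),
                                          FoldedInsts.end());
  while (!Worklist.empty()) {
    Instruction *I = Worklist.pop_back_val();
    for (Value *Op : I->operands()) {
      auto *OpI = dyn_cast<Instruction>(Op);
      if (!OpI || Dead.count(OpI))
        continue; // already counted: handles shared operands and cycles
      // An operand only dies if every one of its users is folded or dead.
      if (llvm::all_of(OpI->users(), [&](User *U) {
            return Dead.count(cast<Instruction>(U));
          })) {
        Dead.insert(OpI);
        Worklist.push_back(OpI);
      }
    }
  }
  return Dead.size() - FoldedInsts.size(); // newly dead beyond the folded ones
}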

Generally, I am *extremely* concerned about the compile-time impact of this
change. Please benchmark it very carefully, and disable the heuristic until
any compile-time problems that turn up are fixed. There seem to be numerous
places where you could use better or smaller data structures. You call TTI
methods inside hot loops where it would be better to cache the cost of an
instruction, or better still, to ensure we only ever compute it once. Also,
walking the use-list is relatively slow (it's a linked list, so it causes
cache misses), so it would be good to minimize that wherever possible.
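
For the TTI calls, even a tiny memoizing wrapper would help. A sketch with
hypothetical names; I've deliberately left the actual TTI query as a callback
rather than guess at its exact signature:

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Instruction.h"
#include <functional>
#include <utility>
using namespace llvm;

// Whatever TTI query the heuristic needs is passed in as a callback, and its
// result is computed at most once per instruction instead of being re-queried
// from inside hot loops.
class CachedCostModel {
  std::function<int(const Instruction *)> ComputeCost;
  DenseMap<const Instruction *, int> Costs;

public:
  explicit CachedCostModel(std::function<int(const Instruction *)> Fn)
      : ComputeCost(std::move(Fn)) {}

  int getCost(const Instruction *I) {
    auto It = Costs.find(I);
    if (It != Costs.end())
      return It->second;            // cache hit: no TTI call
    int Cost = ComputeCost(I);      // computed exactly once per instruction
    Costs[I] = Cost;
    return Cost;
  }
};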


Because this is in tree after considerable review, I don't really want to
revert it, but the issues I've pointed out above are causing hard failures for
us. So I'm going to fix a lot of these problems myself; I just wanted to raise
them here so you're aware that they need to be considered going forward.