[llvm-commits] PATCH: Fix for PR13392 and many other slow-opt-compile bugs in LLVM

Chandler Carruth chandlerc at gmail.com
Sat Jul 21 20:18:03 PDT 2012


Hello folks,

Based on a few reports, I've been tracking down some extremely slow
compiles of small, reasonable code snippets, and it turns out that most of
them look exactly like PR13392. Not only does creating i1024 and i2048
variables everywhere in SROA confuse the daylights out of the ARM codegen,
it also makes lots of the IR and DAG optimizers slower because it slows
down ComputeDemandedBits and other APInt operations on these values.

To re-cap from the bug, SROA sees something like:

void f(...) {
  double data[16];
  ... lots of math ....
}

And it turns this 16 x 8-byte alloca into a single i1024 value. =[
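
To make the arithmetic concrete, here is a minimal sketch of where the i1024 comes from. The helper name is hypothetical (it is not part of SROA) and it assumes 8-bit bytes:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not an LLVM API: bit width of the single integer
// that would cover an array of `count` elements of `elemBytes` bytes each.
// Assumes 8 bits per byte.
std::size_t coveringIntegerBits(std::size_t count, std::size_t elemBytes) {
  assert(count > 0 && elemBytes > 0 && "expects a non-empty aggregate");
  return count * elemBytes * 8;
}
```

For `double data[16]` with 8-byte doubles this gives 16 * 8 * 8 = 1024 bits, and a 32-element array of the same type gives the i2048 mentioned earlier.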

There is a very direct solution to this: we can restrict the part of SROA
that converts an aggregate alloca into an alloca of a single integer (or a
vector) so that it only does so when that integer type is a valid type for
the target. Unfortunately, this essentially turns off SROA for most arrays.
The performance implications are quite bad.


The reason why normal SROA doesn't kick in here is pretty straightforward
as well -- we have a set of thresholds that limit how large an entity
SROA will process. These take the form of an element-wise limit and a size
limit. Many of the cases which the above SROA is "handling" are large
enough to exceed any of these limits, so I dug into the fundamental reason
why the limits existed: http://llvm.org/PR1446

It turns out that if you take that test case, update it a bit and run it
through today's LLVM pass, it is optimized very efficiently. But it does
explode the IR from 1 instruction (memcpy) to 1k instructions when it hits
a large array in use with a memcpy.

This is the fundamental thing that seems important to preserve in limiting
SROA: we don't want the growth of IR due to SROA to be a function of the
size of the aggregate, we want it to be a function of the size of the
existing IR *using* that aggregate. This is a much more targeted threshold.
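
As a rough illustration of that distinction, here is a toy cost model. It is not the logic in the patch, and every name in it is hypothetical. It assumes an existing element access stays a single instruction after splitting, while a memcpy-style whole-aggregate copy turns into one load plus one store per element:

```cpp
#include <cstddef>

// Hypothetical summary of how an aggregate alloca is used.
struct AggregateUse {
  std::size_t numElements;      // scalar elements in the aggregate
  std::size_t elementAccesses;  // existing loads/stores touching one element
  std::size_t wholeCopies;      // memcpy-like uses covering the whole aggregate
};

// Instructions that use the aggregate before it is split.
std::size_t instsBefore(const AggregateUse &U) {
  return U.elementAccesses + U.wholeCopies;
}

// Assumed cost after splitting: element accesses are unchanged, each
// whole-aggregate copy becomes a load and a store per element.
std::size_t instsAfter(const AggregateUse &U) {
  return U.elementAccesses + U.wholeCopies * 2 * U.numElements;
}

// True when the split IR stays within `maxFactor` times the size of the
// IR that already used the aggregate.
bool growthTracksExistingUses(const AggregateUse &U, std::size_t maxFactor) {
  return instsAfter(U) <= maxFactor * instsBefore(U);
}
```

Under this model a 512-element array touched only by one memcpy goes from 1 instruction to 1024 and is rejected. A 16-element array with 64 element-wise accesses does not grow at all and is accepted, regardless of how large the aggregate is.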

I've attached a very rough patch that seems to make this switch. It does
three things:

1) Remove the default thresholds. They remain available although I question
their utility....
2) Add a requirement to the logic that converts an aggregate alloca to a
single integer alloca that unless there is a vector load/store involved,
the bitwidth must fit in a legal integer for the target.
3) Add a new check to normal SROA which checks whether an element-wise
access stems from a memcpy and touches an element that could not itself be
converted to a vector or integer alloca.

With #3, we should still allow forming vector loads & stores, and large
sub-aggregate object loads & stores, as those don't bloat the IR (slowing
down compiles) and should already be lowered efficiently in the backend.
This helps ensure we still decompose aggregates as aggressively as possible
in SROA.
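
The requirement in #2 can be sketched roughly as follows. This is a standalone toy, not the code in the attached patch: `ToyTarget` and both function names are made up for illustration, and "fits" is taken to mean "no wider than some legal integer width", which is my reading of the wording above.

```cpp
#include <vector>

// Hypothetical stand-in for a target description: the integer widths
// (in bits) that the target handles natively.
struct ToyTarget {
  std::vector<unsigned> legalIntBits;
};

// True if a value of `bits` bits fits in at least one legal integer width.
bool fitsInLegalInteger(const ToyTarget &T, unsigned bits) {
  for (unsigned legal : T.legalIntBits)
    if (bits <= legal)
      return true;
  return false;
}

// Sketch of the rule in #2: convert the aggregate alloca to a single
// integer alloca only if a vector load/store is involved or the bit
// width fits in a legal integer for the target.
bool mayConvertToSingleInteger(const ToyTarget &T, unsigned allocaBits,
                               bool hasVectorAccess) {
  return hasVectorAccess || fitsInLegalInteger(T, allocaBits);
}
```

With legal widths of 8, 16, 32 and 64 bits, the 1024-bit alloca from PR13392 is no longer turned into an i1024, while a 64-bit aggregate, or one accessed through vector loads and stores, still converts.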


I've run these changes through the nightly test suite, and so far the
numbers are good. My machine is sadly quite noisy, but investigating the
swings in execution times, the only test which slowed down significantly
(more than 5%) seems to be the very test case in PR13392. The generated IR
is *much* better now, but the register allocator makes some bad decisions
and we end up with about 4x the amount of time stalled in the frontend of
the CPU.

The compile times for the test case in PR13392 and for a sha1
implementation reported on the mailing list are *greatly* improved -- 2x
faster in some cases.


I'm still updating the regression tests which were lacking legal integer
sizes and/or were relying on illegal integer sizes in the output, but I
think the code essentially "works". I'll also be doing more extensive
performance testing. =]

-Chandler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sroa-large-int-fix.patch
Type: application/octet-stream
Size: 5258 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120721/bd3b8dbf/attachment.obj>

