[LLVMdev] scalarrepl tuning

Wed Feb 3 10:05:27 PST 2010

In svn r95224 I modified the scalar replacement (SROA) pass to use different criteria to decide when it is likely to be profitable to split up an aggregate into its separate elements.  The commit message has a pretty decent explanation, but I wanted to give some further detail here.  I am hoping that the llvm-dev list allows messages with attachments.  If the graphs get stripped off, this won't be very interesting.

I was inspired by Jakob's approach to using SCIENCE to adjust our inlining heuristics, so I did a few experiments for SROA.  For these experiments, I ran the llvm nightly test benchmarks, including 3 different versions of SPEC and various other applications.  The first thing measured was the total size and number of elements for each aggregate encountered by SROA.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sroa-stats.png
Type: image/png
Size: 6939 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100203/23fb1e8a/attachment.png>
-------------- next part --------------

The old SROA code would only consider the points in the lower left quadrant: size <= 128 bytes and elements <= 32.  The obvious diagonal lines made me realize that most of the large aggregates are arrays (duh!) and that it would be a good idea to look at structs and arrays separately.  So, I repeated the experiment, looking only at struct types:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sroa-structs.png
Type: image/png
Size: 5427 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100203/23fb1e8a/attachment-0001.png>
-------------- next part --------------

This confirmed my intuition that there are a reasonable number of large (in bytes) structs with a relatively small number of elements.  Those structs should be candidates for SROA.  I have not done the experiment to see how many of them we actually succeed in transforming.  The current limit of 32 elements looks to me like a reasonable cutoff point.  There are a few  points above that green line, and chances are that structs with so many elements will not be safe to SROA anyway.

The last experiment was to test my intuition that SROA will rarely apply to arrays, since it requires that they be accessed with only constant indices.  I was expecting that we could get away with setting a very low limit on the number of array elements for SROA candidates.  To test this, I compiled all the benchmarks and accumulated counts of array sizes in cases where we succeeded with SROA.  I don't have a graph for this, but I was surprised in several ways.  First there were 8377 arrays transformed by SROA.  That's a lot more than I expected.  Second, I was guessing that we would almost never split up an array with more than 4 elements, but there were 350 cases where that happened.  I decided to go with a limit of 8 elements, since that caught all but 16 cases.