[llvm-commits] [test-suite] r160413 - in /test-suite/trunk/SingleSource/Benchmarks/Misc: matmul_f64_4x4.c matmul_f64_4x4.reference_output

Wed Jul 18 14:26:07 PDT 2012

On Jul 18, 2012, at 1:59 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Wed, Jul 18, 2012 at 1:47 PM, Andrew Trick <atrick at apple.com> wrote:
> 
> On Jul 18, 2012, at 1:40 PM, Chandler Carruth <chandlerc at google.com> wrote:
> 
>> On Wed, Jul 18, 2012 at 1:20 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>> 
>> On Jul 18, 2012, at 1:12 PM, Andrew Trick <atrick at apple.com> wrote:
>> 
>>> On Jul 17, 2012, at 5:23 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>>>> +/* Allow mul4 to be inlined into wrap_mul4. This actually enables further
>>>> + * optimizations. */
>>>> +__attribute__((__noinline__))
>>>> +void wrap_mul4(double *Out, const double A[4][4], const double B[4][4])
>>>> +{
>>>> +  mul4(Out, A, B);
>>>> +}
>>> 
>>> This is not obvious to me. Can you explain?
>> 
>> First mul4() is optimized. Then it is inlined into wrap_mul4 and optimized again.
>> 
>> The second pass somehow tickles SROA in a way that causes it to turn the whole double[16] array into an i1024.
>> 
>> That doesn't happen without the extra wrapper function. See also http://llvm.org/pr13392
>> 
>> Ok, I got confused by the comment and your explanation at first, but reading the bug: with this wrapper, an extra optimization occurs that actually turns out to hurt ARM codegen. That's what the PR is about: making this "extra" optimization not actually hurt ARM codegen, right?
>> 
>> So what is the comment in the code about? Is this extra optimization actually helping on other platforms, making the wrapper a positive effect on performance? Or is the comment just inaccurate?
>> 
>> If adding this wrapper actually improves performance (with a fixed PR13392 or on a platform where that doesn't happen), then *that* is the inliner bug I'd like to know about. =]
> 
> FWIW, I ran into exactly the same problem a few weeks back, totally different code. SROA has some interesting pass order problems. Subsequent optimizations may do store-load forwarding exposing more SROA next time 'round.
> 
> We're supposed to be handling this pretty aggressively in the CGSCC pass manager... I'd love to have some test cases that demonstrate these limitations, especially as SROA is extremely intertwined with the inliner... 

Inlining can accidentally save the day. But the problem exists even if there's no opportunity to inline. It's just that SROA wants to rerun after loop unrolling in my case. Actually, I'm not sure we want SROA, or just better GVN. I'll file a PR and copy you as soon as I get a chance to extract a test case, then let people argue about it.

-Andy
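Below is a minimal, self-contained C sketch of the pattern discussed in this thread. Only `wrap_mul4` matches the quoted patch. The body of `mul4` is a hypothetical naive 4x4 product that writes into a local `double[16]` temporary, because the real body in matmul_f64_4x4.c is not quoted here. The sketch also spells out where the i1024 comes from: 16 doubles of 64 bits each, assuming a 64-bit `double` and 8-bit `char`. Whether SROA actually promotes the array this way depends on the LLVM version and pass pipeline; see PR13392.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-in for the benchmark's mul4(); the real body is not
 * shown in this thread. It computes A * B into a local double[16]
 * temporary, the kind of aggregate SROA may split up or promote. */
static void mul4(double *Out, const double A[4][4], const double B[4][4])
{
  double Res[16];
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j) {
      double Sum = 0.0;
      for (int k = 0; k < 4; ++k)
        Sum += A[i][k] * B[k][j];
      Res[i * 4 + j] = Sum;
    }
  memcpy(Out, Res, sizeof(Res));
}

/* The wrapper from the patch: the wrapper itself is never inlined, but
 * mul4 may be inlined into it, so mul4's already-optimized body is
 * optimized a second time in this new context. */
__attribute__((__noinline__))
void wrap_mul4(double *Out, const double A[4][4], const double B[4][4])
{
  mul4(Out, A, B);
}

/* 16 doubles x 64 bits = 1024 bits, which is why a fully promoted
 * double[16] shows up as an i1024 (assumes 64-bit double, 8-bit char). */
static_assert(sizeof(double[16]) * CHAR_BIT == 1024,
              "double[16] is expected to occupy 1024 bits");
```

One way to look for the effect is to compile with `clang -O2 -S -emit-llvm` and compare the IR emitted for `mul4` and `wrap_mul4`. Whether it reproduces depends on the compiler version.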