[llvm-commits] [test-suite] r160413 - in /test-suite/trunk/SingleSource/Benchmarks/Misc: matmul_f64_4x4.c matmul_f64_4x4.reference_output

Wed Jul 18 13:40:34 PDT 2012

On Wed, Jul 18, 2012 at 1:20 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:

>
> On Jul 18, 2012, at 1:12 PM, Andrew Trick <atrick at apple.com> wrote:
>
> On Jul 17, 2012, at 5:23 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk>
> wrote:
>
> +/* Allow mul4 to be inlined into wrap_mul4. This actually enables further
> + * optimizations. */
> +__attribute__((__noinline__))
> +void wrap_mul4(double *Out, const double A[4][4], const double B[4][4])
> +{
> +  mul4(Out, A, B);
> +}
>
>
> This is not obvious to me. Can you explain?
>
>
> First mul4() is optimized. Then it is inlined into wrap_mul4 and optimized
> again.
>
> The second pass somehow tickles SROA in a way that causes it to turn the
> whole double[16] array into an i1024.
>
> That doesn't happen without the extra wrapper function. See also
> http://llvm.org/pr13392
>

Ok, I got confused by the comment and your explanation at first, but
reading the bug: with this wrapper, an extra optimization occurs that
actually turns out to hurt ARM codegen. That's what the PR is about: making
this "extra" optimization not actually hurt ARM codegen, right?

So what is the comment in the code about? Is this extra optimization
actually helping on other platforms, so that the wrapper has a positive
effect on performance? Or is the comment just inaccurate?

If adding this wrapper actually improves performance (with a fixed PR13392
or on a platform where that doesn't happen), then *that* is the inliner bug
I'd like to know about. =]
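
To make sure we're talking about the same thing, here is a rough sketch of
the arrangement as I understand it. The body of mul4 below is my own
stand-in (a plain triple loop with a row-major Out), not the benchmark's
actual code; only the wrap_mul4 signature and the noinline attribute come
from the patch. The comparison I have in mind is between calling mul4
directly and calling it through the wrapper.

```c
#include <stddef.h>

/* Hypothetical stand-in for the benchmark's mul4: Out = A * B, with Out
 * stored row-major as a flat array of 16 doubles. Being static and small,
 * it is a candidate for inlining into its caller. */
static void mul4(double *Out, const double A[4][4], const double B[4][4])
{
  for (size_t i = 0; i < 4; ++i) {
    for (size_t j = 0; j < 4; ++j) {
      double sum = 0.0;
      for (size_t k = 0; k < 4; ++k)
        sum += A[i][k] * B[k][j];
      Out[i * 4 + j] = sum;
    }
  }
}

/* The wrapper from the patch: it is marked noinline itself, so mul4 can be
 * inlined into it and then optimized a second time in that context. */
__attribute__((__noinline__))
void wrap_mul4(double *Out, const double A[4][4], const double B[4][4])
{
  mul4(Out, A, B);
}
```

If the wrapped version is measurably faster than the direct call on some
target, that difference is the thing I'm asking about.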