[LLVMdev] LLVM ARM VMLA instruction

Thu Dec 19 02:32:41 PST 2013

On Thu, Dec 19, 2013 at 9:28 AM, suyog sarda <sardask01 at gmail.com> wrote:

>
> I wasn't speculating. Let's take an example of a simple 3*3 matrix
> multiplication (no loops, all multiplications and additions are hard coded -
> basically all the operations are expanded,
> e.g. Result[0][0] = A[0][0]*B[0][0] + A[0][1]*B[1][0] + A[0][2]*B[2][0]
> and so on for all 9 elements of the result).
>
> If I compile the above code with "clang -O3 -mcpu=cortex-a8 -mfpu=vfpv3-d16"
> (only 16 floating point registers are present on my ARM, hence
> vfpv3-d16), there are 27 vmul, 18 vadd, 23 store and 30 load ops in total.
> If the same code is compiled with gcc with the same options, there are 9 vmul,
> 18 vmla, 9 store and 20 load ops. So it's clear that extra load/store ops get
> added with clang as it is not emitting the vmla instruction. Won't this lead
> to performance degradation?
>

I think what Tim is gently suggesting is that it would be informative to
actually run the code that clang produces against the code that gcc produces
on some real hardware, and see whether there is a performance difference and
whether it is significant. Direct experimentation is often quicker than trying
to figure out how some code ought to perform. (In almost every experiment
I've performed when trying out optimizations, the actual performance on
hardware has differed from the expectations I had before running the code.)
Granted, testing doesn't always reveal the difference: microbenchmarks are
sometimes so simple that the compiler can hide the deficiencies of
inefficient code in a way it can't in more complex real-world code. But it's
still a good first thing to try.
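
As a rough starting point, a timing harness along the following lines could
be built once with clang and once with gcc, using the same options as above,
and run on the target board. This is only a sketch under assumptions: the
original test source wasn't posted, so the names matmul3x3 and
time_matmul3x3, the input values and the use of clock_gettime are all
hypothetical choices of mine, not the code that was measured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std= modes */
#endif
#include <time.h>

/* Hypothetical reconstruction of the test case: a fully expanded 3x3
 * product with no loops, every multiply and add written out by hand. */
void matmul3x3(float a[3][3], float b[3][3], float r[3][3])
{
    r[0][0] = a[0][0]*b[0][0] + a[0][1]*b[1][0] + a[0][2]*b[2][0];
    r[0][1] = a[0][0]*b[0][1] + a[0][1]*b[1][1] + a[0][2]*b[2][1];
    r[0][2] = a[0][0]*b[0][2] + a[0][1]*b[1][2] + a[0][2]*b[2][2];
    r[1][0] = a[1][0]*b[0][0] + a[1][1]*b[1][0] + a[1][2]*b[2][0];
    r[1][1] = a[1][0]*b[0][1] + a[1][1]*b[1][1] + a[1][2]*b[2][1];
    r[1][2] = a[1][0]*b[0][2] + a[1][1]*b[1][2] + a[1][2]*b[2][2];
    r[2][0] = a[2][0]*b[0][0] + a[2][1]*b[1][0] + a[2][2]*b[2][0];
    r[2][1] = a[2][0]*b[0][1] + a[2][1]*b[1][1] + a[2][2]*b[2][1];
    r[2][2] = a[2][0]*b[0][2] + a[2][1]*b[1][2] + a[2][2]*b[2][2];
}

/* Returns the wall-clock seconds taken by `iterations` products.
 * Input values are arbitrary (an assumption, not from the original post). */
double time_matmul3x3(long iterations)
{
    float a[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    float b[3][3] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    float r[3][3];
    volatile float sink = 0.0f;   /* consume results so they are not optimized away */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; ++i) {
        a[0][0] = (float)(i & 7); /* vary one input so the work cannot be hoisted */
        matmul3x3(a, b, r);
        float sum = 0.0f;
        for (int row = 0; row < 3; ++row)
            for (int col = 0; col < 3; ++col)
                sum += r[row][col];
        sink += sum;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_matmul3x3 from a small main with a large iteration count and
printing the result, once per compiler, gives numbers to compare directly
rather than instruction counts. It is exactly the kind of simple
microbenchmark mentioned above, so the timings should be read with that
caveat in mind.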

Cheers,
Dave

-- 
cheers, dave tweed__________________________
high-performance computing and machine vision expert: david.tweed at gmail.com
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot