[llvm-bugs] [Bug 26106] Loop Vectorizer on ARM - why is the relative speedup so much worse?

Mon Jan 11 01:22:40 PST 2016

https://llvm.org/bugs/show_bug.cgi?id=26106

James Molloy <james.molloy at arm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from James Molloy <james.molloy at arm.com> ---
Hi,

I can't comment on microarchitectural details of ARM CPUs. However, the
Cortex-A5 is a part that strongly favours efficiency over performance. It would
be interesting to see this run on other ARM parts (for example a Cortex-A15).

It is clear from the numbers that performance hits a bottleneck (saturates)
once the loop is vectorized: unrolling doesn't give much extra gain. This
could be due to a number of factors that I'm not keen to speculate on! It
is just the theoretical peak performance of that system on that benchmark
(the system, because it's not just the CPU, it's the memory bandwidth too).
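
To illustrate the saturation argument, here is a minimal roofline-style
sketch. It is not taken from the bug report: the function name and every
number in it are hypothetical, chosen only to show how a fixed memory
ceiling can cap the speedup no matter how much wider the compute gets.

```python
def attainable_rate(peak_compute, mem_bandwidth, ops_per_byte):
    """Roofline-style bound: the lower of the compute and memory ceilings."""
    return min(peak_compute, mem_bandwidth * ops_per_byte)


# All values are made up for illustration; they do not describe any real CPU.
MEM_BANDWIDTH = 1.0e9  # bytes/s the memory system can deliver
OPS_PER_BYTE = 0.25    # e.g. one operation per 4-byte element loaded
SCALAR_PEAK = 0.1e9    # ops/s the scalar loop could issue

# Relative compute width of each loop variant (hypothetical factors).
widths = {"scalar": 1, "vectorized": 4, "vectorized + unrolled": 8}

rates = {
    name: attainable_rate(SCALAR_PEAK * width, MEM_BANDWIDTH, OPS_PER_BYTE)
    for name, width in widths.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate / rates['scalar']:.1f}x over scalar")
```

With these assumed values the vectorized loop already hits the memory
ceiling, so the extra width from unrolling changes nothing. Real hardware
has more limits than this two-term model captures.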

I'm going to mark this resolved, but feel free to reopen it.

James
