[llvm-bugs] [Bug 27103] New: Improve NEON autovectorization?

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Mar 28 16:49:46 PDT 2016


https://llvm.org/bugs/show_bug.cgi?id=27103

            Bug ID: 27103
           Summary: Improve NEON autovectorization?
           Product: libraries
           Version: 3.8
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: ARM
          Assignee: unassignedbugs at nondot.org
          Reporter: tulipawn at gmail.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

Created attachment 16107
  --> https://llvm.org/bugs/attachment.cgi?id=16107&action=edit
VFP assembly

Benchmarking the Rust matrix multiplication code in matrixmultiply
(https://github.com/bluss/matrixmultiply/) on a Cortex-A5 gives the following
results:


Using VFP:

test mat_mul_f32::m004 ... bench:       1,632 ns/iter (+/- 51)
test mat_mul_f32::m007 ... bench:       3,767 ns/iter (+/- 56)
test mat_mul_f32::m008 ... bench:       4,151 ns/iter (+/- 96)
test mat_mul_f32::m012 ... bench:       8,712 ns/iter (+/- 408)

Using NEON:

test mat_mul_f32::m004 ... bench:       1,588 ns/iter (+/- 89)
test mat_mul_f32::m007 ... bench:       3,307 ns/iter (+/- 94)
test mat_mul_f32::m008 ... bench:       3,056 ns/iter (+/- 62)
test mat_mul_f32::m012 ... bench:       6,197 ns/iter (+/- 181)


The NEON speedup over VFP only reaches 2x from m >= 16 onward. Is there room
for improvement here?
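
For reference, the per-size speedup implied by the numbers above can be worked out with a few lines of Rust. This is only a sketch: the timings are copied from the benchmark output in this report, while the names RESULTS, speedup and report are made up for illustration and are not part of matrixmultiply.

```rust
/// (benchmark name, VFP ns/iter, NEON ns/iter), copied from the results above.
const RESULTS: [(&str, f64, f64); 4] = [
    ("m004", 1632.0, 1588.0),
    ("m007", 3767.0, 3307.0),
    ("m008", 4151.0, 3056.0),
    ("m012", 8712.0, 6197.0),
];

/// Speedup of NEON over VFP; values above 1.0 mean NEON is faster.
fn speedup(vfp_ns: f64, neon_ns: f64) -> f64 {
    vfp_ns / neon_ns
}

/// One formatted line per benchmark, e.g. "m004: 1.03x".
fn report() -> Vec<String> {
    RESULTS
        .iter()
        .map(|&(name, vfp, neon)| format!("{}: {:.2}x", name, speedup(vfp, neon)))
        .collect()
}
```

Printing each line of report() gives roughly 1.03x, 1.14x, 1.36x and 1.41x for m004, m007, m008 and m012, so all four sizes listed here stay well below the 2x seen at m >= 16.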
