[llvm-bugs] [Bug 27103] New: Improve NEON autovectorization?
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Mar 28 16:49:46 PDT 2016
https://llvm.org/bugs/show_bug.cgi?id=27103
Bug ID: 27103
Summary: Improve NEON autovectorization?
Product: libraries
Version: 3.8
Hardware: Other
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: ARM
Assignee: unassignedbugs at nondot.org
Reporter: tulipawn at gmail.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
Created attachment 16107
--> https://llvm.org/bugs/attachment.cgi?id=16107&action=edit
VFP assembly
Benchmarking the Rust matrix multiplication crate matrixmultiply
(https://github.com/bluss/matrixmultiply/) on a Cortex-A5 gives the following results:
Using VFP:
test mat_mul_f32::m004 ... bench: 1,632 ns/iter (+/- 51)
test mat_mul_f32::m007 ... bench: 3,767 ns/iter (+/- 56)
test mat_mul_f32::m008 ... bench: 4,151 ns/iter (+/- 96)
test mat_mul_f32::m012 ... bench: 8,712 ns/iter (+/- 408)
Using NEON:
test mat_mul_f32::m004 ... bench: 1,588 ns/iter (+/- 89)
test mat_mul_f32::m007 ... bench: 3,307 ns/iter (+/- 94)
test mat_mul_f32::m008 ... bench: 3,056 ns/iter (+/- 62)
test mat_mul_f32::m012 ... bench: 6,197 ns/iter (+/- 181)
The NEON speedup over VFP only reaches 2x at m >= 16. Is there room for
improvement here?
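
For reference, the speedups implied by the figures above can be worked out with a short sketch like the one below. The timings are copied from the benchmark output in this report; the names `speedup` and `RESULTS` are made up for illustration and are not part of matrixmultiply.

```rust
/// Speedup of NEON over VFP, as the ratio of VFP time to NEON time (ns/iter).
/// Hypothetical helper, not part of the matrixmultiply crate.
fn speedup(vfp_ns: f64, neon_ns: f64) -> f64 {
    vfp_ns / neon_ns
}

/// (benchmark name, VFP ns/iter, NEON ns/iter), copied from the report above.
const RESULTS: [(&str, f64, f64); 4] = [
    ("m004", 1632.0, 1588.0),
    ("m007", 3767.0, 3307.0),
    ("m008", 4151.0, 3056.0),
    ("m012", 8712.0, 6197.0),
];

fn main() {
    // Print one line per benchmark with the NEON-over-VFP ratio.
    for &(name, vfp, neon) in RESULTS.iter() {
        println!("{}: {:.2}x", name, speedup(vfp, neon));
    }
}
```

With these inputs the ratios come out to roughly 1.03x, 1.14x, 1.36x and 1.41x for m004 through m012, all well short of 2x.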