[LLVMbugs] [Bug 14659] New: LoopVectorizer: inefficient scatter/gather on gcc-loops Ex11.
bugzilla-daemon at llvm.org
Wed Dec 19 14:41:27 PST 2012
http://llvm.org/bugs/show_bug.cgi?id=14659
Bug #: 14659
Summary: LoopVectorizer: inefficient scatter/gather on
gcc-loops Ex11.
Product: libraries
Version: trunk
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
AssignedTo: unassignedbugs at nondot.org
ReportedBy: nrotem at apple.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
The code below is from the gcc-loops benchmark (taken from the GCC webpage). It
loads pairs of consecutive elements from each array, i.e. two interleaved
accesses with a stride of 2. The Loop Vectorizer detects that these are strided
loads and emits 4 gathers, and 4 gathers are 4 * 8 = 32 scalar loads! GCC and
ICC both perform two wide loads instead.
I am not sure how to implement this. Maybe the vectorizer could generate a
target-independent gather intrinsic, which could be optimized later? Maybe the
codegen should merge these loads? At the moment the codegen can't merge the
loads because the i*2+1 index arithmetic is performed on vectors.
-------------
example11:
/* feature: support strided accesses - the data elements
that are to be operated upon in parallel are not consecutive - they
are accessed with a stride > 1 (in the example, the stride is 2): */
for (i = 0; i < N/2; i++){
  a[i] = b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i];
  d[i] = b[2*i] * c[2*i+1] + b[2*i+1] * c[2*i];
}
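To make the two strategies concrete, here is a minimal portable C sketch (not actual LLVM, GCC or ICC output). It assumes int elements, a hypothetical vector width VF = 4 and N = 16, and emulates vectors with small arrays, so it only illustrates the data movement, not performance. The names `vec`, `gather`, `load_wide`, `deinterleave`, `compute` and the `example11_*` functions are hypothetical. The "gather" strategy computes a separate index per lane (the vector i*2+1 arithmetic) and loads each element individually; the "wide" strategy does two consecutive wide loads per array and then splits them into even and odd lanes.

```c
#include <assert.h>

/* Hypothetical sizes for this sketch: N must be a multiple of 2 * VF. */
#define N 16
#define VF 4

/* A "vector" here is just VF ints; real code would use SIMD registers. */
typedef struct { int v[VF]; } vec;

/* Reference: the scalar loop from example11. */
static void example11_scalar(int *a, int *d, const int *b, const int *c) {
  for (int i = 0; i < N / 2; i++) {
    a[i] = b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i];
    d[i] = b[2*i] * c[2*i+1] + b[2*i+1] * c[2*i];
  }
}

/* Gather-style: each lane computes its own index 2*(base+k)+offset and
   does a separate scalar load, which is what a gather amounts to. */
static vec gather(const int *p, int base, int offset) {
  vec r;
  for (int k = 0; k < VF; k++) r.v[k] = p[2 * (base + k) + offset];
  return r;
}

/* One wide load of VF consecutive elements starting at p. */
static vec load_wide(const int *p) {
  vec r;
  for (int k = 0; k < VF; k++) r.v[k] = p[k];
  return r;
}

/* Split lo = p[0..VF-1] and hi = p[VF..2VF-1] into the even-indexed and
   odd-indexed elements of p[0..2VF-1]. */
static void deinterleave(vec lo, vec hi, vec *even, vec *odd) {
  for (int k = 0; k < VF / 2; k++) {
    even->v[k] = lo.v[2 * k];
    odd->v[k] = lo.v[2 * k + 1];
    even->v[k + VF / 2] = hi.v[2 * k];
    odd->v[k + VF / 2] = hi.v[2 * k + 1];
  }
}

/* Vector body shared by both strategies (lanes i .. i+VF-1). */
static void compute(int *a, int *d, int i, vec be, vec bo, vec ce, vec co) {
  for (int k = 0; k < VF; k++) {
    a[i + k] = bo.v[k] * co.v[k] - be.v[k] * ce.v[k];
    d[i + k] = be.v[k] * co.v[k] + bo.v[k] * ce.v[k];
  }
}

/* Strategy 1: four gathers per vector iteration (VF scalar loads each). */
static void example11_gather(int *a, int *d, const int *b, const int *c) {
  for (int i = 0; i < N / 2; i += VF)
    compute(a, d, i, gather(b, i, 0), gather(b, i, 1),
            gather(c, i, 0), gather(c, i, 1));
}

/* Strategy 2: two consecutive wide loads per array, then deinterleave. */
static void example11_wide(int *a, int *d, const int *b, const int *c) {
  assert(N % (2 * VF) == 0);
  for (int i = 0; i < N / 2; i += VF) {
    vec be, bo, ce, co;
    deinterleave(load_wide(&b[2 * i]), load_wide(&b[2 * i + VF]), &be, &bo);
    deinterleave(load_wide(&c[2 * i]), load_wide(&c[2 * i + VF]), &ce, &co);
    compute(a, d, i, be, bo, ce, co);
  }
}

/* Returns 1 if both strategies match the scalar loop on sample data. */
static int example11_matches(void) {
  int b[N], c[N];
  int a0[N / 2], d0[N / 2], a1[N / 2], d1[N / 2], a2[N / 2], d2[N / 2];
  for (int j = 0; j < N; j++) { b[j] = j; c[j] = 3 * j + 1; }
  example11_scalar(a0, d0, b, c);
  example11_gather(a1, d1, b, c);
  example11_wide(a2, d2, b, c);
  for (int i = 0; i < N / 2; i++)
    if (a1[i] != a0[i] || d1[i] != d0[i] || a2[i] != a0[i] || d2[i] != d0[i])
      return 0;
  return 1;
}
```

In real SIMD code each `deinterleave` call would roughly correspond to a pair of shuffles (in LLVM IR, `shufflevector` with even-lane and odd-lane masks), so per array the wide strategy costs two consecutive loads plus two shuffles, whereas the gather strategy costs two gathers of VF scalar loads each.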
More information about the llvm-bugs
mailing list