[LLVMbugs] [Bug 14659] New: LoopVectorizer: inefficient scatter/gather on gcc-loops Ex11.

Wed Dec 19 14:41:27 PST 2012

http://llvm.org/bugs/show_bug.cgi?id=14659

             Bug #: 14659
           Summary: LoopVectorizer: inefficient scatter/gather on
                    gcc-loops Ex11.
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: nrotem at apple.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

The code below is from the gcc-loops benchmark (from the gcc webpage). Each
iteration reads two consecutive elements, b[2*i] and b[2*i+1] (and likewise
for c), so every access has a stride of 2.  The Loop Vectorizer detects that
these are strided loads and performs 4 gathers. 4 gathers amount to
4 * 8 = 32 loads!  GCC and ICC both perform two wide loads.
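
To make the cost difference concrete, here is a rough scalar C model of the
two strategies. It is only a sketch: the vector width VF = 8 is an assumption
inferred from the "4 * 8" figure above, and the names gather_stride2 and
wide_load_stride2 are hypothetical, not anything in LLVM.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 8 };  /* assumed vector width in elements; not stated in the report */

/* Gather model: one scalar load per lane of a stride-2 access.
   Returns the number of load operations issued. */
static int gather_stride2(const float *src, size_t first, float out[VF]) {
    assert(src != NULL && out != NULL);
    int loads = 0;
    for (int lane = 0; lane < VF; lane++) {
        out[lane] = src[first + 2 * (size_t)lane];
        loads++;
    }
    return loads;
}

/* Wide-load model: two contiguous VF-wide loads cover 2*VF elements, then an
   in-register deinterleave splits the even and odd positions.
   Returns the number of (wide) load operations issued. */
static int wide_load_stride2(const float *src, size_t base,
                             float even[VF], float odd[VF]) {
    assert(src != NULL && even != NULL && odd != NULL);
    float lo[VF], hi[VF];
    int loads = 0;
    for (int k = 0; k < VF; k++) lo[k] = src[base + k];       /* wide load 1 */
    loads++;
    for (int k = 0; k < VF; k++) hi[k] = src[base + VF + k];  /* wide load 2 */
    loads++;
    for (int lane = 0; lane < VF; lane++) {  /* shuffle only, no memory access */
        int e = 2 * lane, o = 2 * lane + 1;
        even[lane] = e < VF ? lo[e] : hi[e - VF];
        odd[lane]  = o < VF ? lo[o] : hi[o - VF];
    }
    return loads;
}
```

In this model, producing the even and odd vectors of one array takes
2 * 8 scalar loads with gathers, but only 2 wide loads plus shuffles with
the contiguous approach.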

I am not sure how to implement this. Maybe the vectorizer should generate a
target-independent gather intrinsic, which can later be optimized? Or maybe
the codegen should merge these loads? At the moment the codegen can't merge
the loads because the i*2+1 arithmetic is performed on vectors.
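
The index problem can be illustrated with a small C sketch. It shows that the
per-lane indices computed in vector form are the same as one scalar base plus
a constant offset per lane, which is the shape a load-merging step would need
to see. The width VF = 8 and both function names are assumptions made for
illustration; this is not how LLVM represents the indices.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 8 };  /* assumed vector width; not stated in the report */

/* Vector form: every lane computes 2*(i+lane)+1 independently, so nothing
   ties the lanes to a single scalar base address. */
static void odd_indices_vector_form(size_t i, size_t idx[VF]) {
    for (int lane = 0; lane < VF; lane++)
        idx[lane] = 2 * (i + (size_t)lane) + 1;
}

/* Scalar-base form: one scalar base (2*i) plus a compile-time constant
   offset per lane. Every offset is below 2*VF, so all lanes lie inside the
   contiguous window [2*i, 2*i + 2*VF). */
static void odd_indices_base_form(size_t i, size_t idx[VF]) {
    const size_t base = 2 * i;
    for (int lane = 0; lane < VF; lane++) {
        const size_t offset = 2 * (size_t)lane + 1;
        assert(offset < 2 * VF);
        idx[lane] = base + offset;
    }
}
```

Both functions yield identical indices; only the second form makes the
contiguous window explicit.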

-------------

example11:
/* feature: support strided accesses - the data elements
   that are to be operated upon in parallel are not consecutive - they
   are accessed with a stride > 1 (in the example, the stride is 2):  */
for (i = 0; i < N/2; i++){
  a[i] = b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i];
  d[i] = b[2*i] * c[2*i+1] + b[2*i+1] * c[2*i];
}
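
As a sanity check on the wide-load idea, the loop above can be restated in
blocks that read b and c contiguously and split even from odd elements before
doing the arithmetic. This is a scalar sketch under stated assumptions: float
arrays, a block width VF = 8, N/2 a multiple of VF, and hypothetical function
names. It models the access pattern only and is not the code GCC or ICC emit.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 8 };  /* assumed vector width; not stated in the report */

/* Reference: the loop as written in the report. */
static void example11_scalar(float *a, float *d, const float *b,
                             const float *c, size_t n) {
    for (size_t i = 0; i < n / 2; i++) {
        a[i] = b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i];
        d[i] = b[2*i] * c[2*i+1] + b[2*i+1] * c[2*i];
    }
}

/* Split 2*VF contiguous elements into even and odd lanes. */
static void deinterleave(const float *src, float even[VF], float odd[VF]) {
    for (int lane = 0; lane < VF; lane++) {
        even[lane] = src[2 * lane];
        odd[lane]  = src[2 * lane + 1];
    }
}

/* Blocked form: per block, b and c are each read as one contiguous run of
   2*VF elements, then deinterleaved. Requires (n/2) % VF == 0. */
static void example11_blocked(float *a, float *d, const float *b,
                              const float *c, size_t n) {
    assert((n / 2) % VF == 0);
    for (size_t i = 0; i < n / 2; i += VF) {
        float be[VF], bo[VF], ce[VF], co[VF];
        deinterleave(b + 2 * i, be, bo);
        deinterleave(c + 2 * i, ce, co);
        for (int lane = 0; lane < VF; lane++) {
            a[i + lane] = bo[lane] * co[lane] - be[lane] * ce[lane];
            d[i + lane] = be[lane] * co[lane] + bo[lane] * ce[lane];
        }
    }
}
```

Calling both functions on the same inputs should give matching a and d
arrays, since each lane performs the same multiplications and additions as
the corresponding scalar iteration.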
