[LLVMbugs] [Bug 14522] New: Reload from load invariant address is not hoisted out of the loop

Wed Dec 5 18:27:03 PST 2012

http://llvm.org/bugs/show_bug.cgi?id=14522

             Bug #: 14522
           Summary: Reload from load invariant address is not hoisted out
                    of the loop
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Scalar Optimizations
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: tscheller at apple.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

In the attached test case we have the following loop:

  wideptr = (uint64_t*)ptr;
  while (len >= sizeof(wideval)) {
    *wideptr++ = wideval;
    len -= sizeof(wideval);
  }

(note that wideval is loop invariant)

When compiled with clang -O2 we get the following code:

LBB0_5:                                 ## %while.body18
                                        ## =>This Inner Loop Header: Depth=1
        movq    -8(%rbp), %rcx  <-- load of wideval we want to get rid of
        movq    %rcx, (%rax)
        addq    $8, %rax
        addq    $-8, %rdx
        cmpq    $7, %rdx
        ja      LBB0_5

wideval is initialized in the following loop:

  /* Get to 8-byte aligned start */
  tmp = (unsigned char*)&wideval;
  for (i = 0; i < sizeof(wideval); i++, tmp++) {
    *tmp = val;
  }

When this loop gets unrolled there's an opportunity for SROA to combine the
eight 8-bit stores into a single 64-bit store and the corresponding i64 value
for wideval can be used directly in the copy loop without going through memory.
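
As a rough illustration of what the combined store amounts to, the sketch below
compares the byte-wise initialization with a single 64-bit value computed
arithmetically. The function names splat_bytewise and splat_register are
hypothetical and are not taken from the attached test case; the multiplication
is only one way to express the merged value.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise initialization through memory, mirroring the loop above. */
static uint64_t splat_bytewise(unsigned char val) {
    uint64_t wideval;
    unsigned char *tmp = (unsigned char *)&wideval;
    for (size_t i = 0; i < sizeof(wideval); i++, tmp++) {
        *tmp = val;
    }
    return wideval;
}

/* The same value computed without touching memory: val is replicated
 * into each of the eight bytes (hypothetical helper, for comparison). */
static uint64_t splat_register(unsigned char val) {
    return (uint64_t)val * UINT64_C(0x0101010101010101);
}
```

Because all eight bytes are identical, the two functions agree regardless of
byte order.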

When compiling the test case with clang -O2 and then feeding the generated LLVM
IR into opt -O2, SROA kicks in and as expected we get the following code:

LBB0_6:                                 ## %while.body18
                                        ## =>This Inner Loop Header: Depth=1
        movq    %rcx, (%rax)
        addq    $8, %rax
        addq    $-8, %rdx
        cmpq    $7, %rdx
        ja      LBB0_6

With a single invocation of clang -O2, the optimization does not kick in because
SROA runs before the loop is unrolled.
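
Until the pass ordering handles this case, a source-level variant that never
takes the address of wideval may sidestep the problem. The following is a
minimal sketch, not the attached test case: fill_wide is a hypothetical name,
ptr is assumed to be 8-byte aligned and to point at storage that may be
accessed as uint64_t, and the resulting code has not been checked against this
trunk revision.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill len bytes starting at ptr with val.
 * Assumes ptr is 8-byte aligned and refers to uint64_t-accessible storage. */
static void fill_wide(void *ptr, unsigned char val, size_t len) {
    /* wideval is built arithmetically and its address is never taken,
     * so nothing in the source forces it into memory. */
    const uint64_t wideval = (uint64_t)val * UINT64_C(0x0101010101010101);
    uint64_t *wideptr = (uint64_t *)ptr;

    while (len >= sizeof(wideval)) {
        *wideptr++ = wideval;
        len -= sizeof(wideval);
    }

    /* Store any remaining tail bytes one at a time. */
    unsigned char *tail = (unsigned char *)wideptr;
    while (len > 0) {
        *tail++ = val;
        len--;
    }
}
```

For example, calling fill_wide on a three-element uint64_t array with len = 19
performs two 64-bit stores followed by three single-byte stores.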

Tested with trunk 169456.
