[LLVMbugs] [Bug 8942] New: Not aggressively optimizing std::fill loop

Sun Jan 9 15:38:45 PST 2011

http://llvm.org/bugs/show_bug.cgi?id=8942

           Summary: Not aggressively optimizing std::fill loop
           Product: libraries
           Version: 1.0
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Scalar Optimizations
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: clattner at apple.com
                CC: llvmbugs at cs.uiuc.edu

Chandler observed that we don't optimize this function into a memset:

void f1(int* begin, int* end) {
  std::fill(begin, end, 0);
}
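
For reference, here is a minimal sketch of the two forms in question: roughly
the loop that the std::fill call inlines to, and the memset we would like to
get instead.  The names fill_loop and fill_memset are hypothetical, and the
real library implementation of std::fill differs in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical name: roughly the loop that std::fill(begin, end, 0) inlines to.
void fill_loop(int* first, int* last) {
  for (; first != last; ++first)
    *first = 0;
}

// Hypothetical name: the form we would like the optimizer to produce.
void fill_memset(int* first, int* last) {
  assert(first <= last && "range must be valid");
  if (first == last)
    return;  // mirrors the begin == end guard in front of the loop
  std::size_t bytes = static_cast<std::size_t>(last - first) * sizeof(int);
  std::memset(first, 0, bytes);
}
```

Both functions leave the same contents in [first, last) for any valid range.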

With recent changes, we now compute a backedge-taken count for this loop:
"((-4 + (-1 * %__first) + %__last) /u 4)".  The problem is that indvars has
some (very reasonable in general) code that Dan added at the top of
IndVarSimplify::LinearFunctionTestReplace:

  // Special case: If the backedge-taken count is a UDiv, it's very likely a
  // UDiv that ScalarEvolution produced in order to compute a precise
  // expression, rather than a UDiv from the user's code. If we can't find a
  // UDiv in the code with some simple searching, assume the former and forego
  // rewriting the loop.
  if (isa<SCEVUDivExpr>(BackedgeTakenCount)) {

If I hack out that code, we form a memset and delete the loop.  There are two
problems here.  First, if I disable the code, I get some pretty gross IR:

define void @_Z1fPiS_(i32* %begin, i32* %end) nounwind {
entry:
  %cmp7.i.i = icmp eq i32* %begin, %end
  br i1 %cmp7.i.i, label %_ZSt4fillIPiiEvT_S1_RKT0_.exit, label %for.body.lr.ph.i.i

for.body.lr.ph.i.i:                               ; preds = %entry
  %begin2 = bitcast i32* %begin to i8*
  %__first10.i.i = ptrtoint i32* %begin to i64
  %scevgep.i.i = getelementptr i32* %end, i64 -1
  %scevgep9.i.i = bitcast i32* %scevgep.i.i to i8*
  %tmp.i.i = sub i64 0, %__first10.i.i
  %uglygep.i.i = getelementptr i8* %scevgep9.i.i, i64 %tmp.i.i
  %uglygep11.i.i = ptrtoint i8* %uglygep.i.i to i64
  %tmp12.i.i4 = add i64 %uglygep11.i.i, 4
  %tmp3 = and i64 %tmp12.i.i4, -4
  call void @llvm.memset.p0i8.i64(i8* %begin2, i8 0, i64 %tmp3, i32 4, i1 false)
  ret void

_ZSt4fillIPiiEvT_S1_RKT0_.exit:                   ; preds = %entry
  ret void
}

1. The "gep -1" and "add 4" should be merged together (they cancel out).  It is
unclear whether SCEV or instcombine should be doing this.

2. The "and x, -4" is because we don't have an "isexact" bit on the udiv
generated by the trip count.  We have no way to represent this in SCEV or IR
(PR8862).
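
To make the arithmetic behind these two points concrete, here is a small
sketch that mirrors the computations on plain 64-bit unsigned integers, which
wrap like i64 does in the IR.  The function names are hypothetical, and the
addresses are treated as integers purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Backedge-taken count as reported: ((-4 + (-1 * first) + last) /u 4).
std::uint64_t backedge_taken_count(std::uint64_t first, std::uint64_t last) {
  assert(first != last && "the loop is only entered when begin != end");
  return (last - first - 4) / 4;
}

// memset size as computed in the IR above.
std::uint64_t memset_size_as_emitted(std::uint64_t first, std::uint64_t last) {
  std::uint64_t uglygep = (last - 4) - first;  // gep -1 on %end, then minus %begin
  std::uint64_t tmp12 = uglygep + 4;           // the add of 4 undoes the gep -1
  return tmp12 & ~std::uint64_t{3};            // and x, -4
}

// What we would like to see once both issues are addressed.
std::uint64_t memset_size_simplified(std::uint64_t first, std::uint64_t last) {
  return last - first;
}
```

For a valid int* range the distance is a multiple of 4, so both size
functions agree and equal 4 * (backedge-taken count + 1).  The mask only
changes the result when the distance is not a multiple of 4, which is the
case an exact udiv would rule out.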

The second issue is that just hacking out the code isn't the right thing to do.
We should use the "isexact" bit on the udiv scev to decide if the end result
will be simple enough to make it profitable.
