[LLVMbugs] [Bug 20425] New: SROA fails to promote seemingly simple alloca's to registers

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Wed Jul 23 17:23:42 PDT 2014


http://llvm.org/bugs/show_bug.cgi?id=20425

            Bug ID: 20425
           Summary: SROA fails to promote seemingly simple alloca's to
                    registers
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: wujingyue at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 12818
  --> http://llvm.org/bugs/attachment.cgi?id=12818&action=edit
failed test

The test case (also attached as sroa-phi.ll) is as follows:

define float @foo(i1 %cond, float* %temp) {
entry:
  %arr = alloca [4 x float], align 4
  br i1 %cond, label %then, label %else
then:
  %0 = getelementptr inbounds [4 x float]* %arr, i64 0, i64 3
  store float 1.000000e+00, float* %0, align 4
  br label %merge
else:
  %1 = getelementptr inbounds [4 x float]* %arr, i64 0, i64 3
  store float 2.000000e+00, float* %1, align 4
  br label %merge
merge:
  %2 = phi float* [ %0, %then ], [ %1, %else ]
  store float 0.000000e+00, float* %temp, align 4
  %3 = load float* %2, align 4
  ret float %3
}

SROA transforms this code to:

define float @foo(i1 %cond, float* %temp) {
entry:
  %arr.sroa.0 = alloca float
  br i1 %cond, label %then, label %else
then:                                             ; preds = %entry
  store float 1.000000e+00, float* %arr.sroa.0, align 4
  br label %merge
else:                                             ; preds = %entry
  store float 1.000000e+00, float* %arr.sroa.0, align 4
  br label %merge
merge:                                            ; preds = %else, %then
  %0 = phi float* [ %arr.sroa.0, %then ], [ %arr.sroa.0, %else ]
  store float 0.000000e+00, float* %temp
  %1 = load float* %0, align 4
  ret float %1
}

leaving the alloca unpromoted to a register. 

Root cause: 

SROA aims to promote stack variables (scalar and aggregates/arrays) to
registers. It works in two steps:

1. Split an array alloca into scalar allocas or "slices". In the above example,
all accesses to %arr have a known offset. Therefore, SROA splits %arr into four
separate allocas representing %arr[0], %arr[1], %arr[2], and %arr[3], and
replacing %0 and %1 with the slice that represents %arr[3]. 

2. Promote slices into registers. Before promoting a slice to a register, we
need to check all the uses of the slice can be transformed to use the virtual
register. For example, we cannot promote a slice that is used as a function
call argument. Another tricky case is where the slice is used by a PHI node. 

In our example, both incoming values of PHI node %0 are exactly the slice, so
replacing "load %0" with the register that represents the slice should be
totally safe. However, in general, a PHI node that uses the slice can also use
other incoming values. For example,

%P2 = phi [i32* %Slice, i32* %Other]
...
%V = load i32* %P2

we cannot replace %V entirely with the register representing the slice, because
%P2 takes %Other which may or may not equal the slice.  

Fortunately, SROA may still do partial promotion in this case: only promote
%slice and leave %other as a pointer. That would give us:

%V1 = load i32* %Slice      -> will be mem2reg'd
br ...
%V2 = load i32* %Other
br ...
%V = phi [i32 %V1, i32 %V2]

However, one caveat is, since we are moving the load up to the predecessors, we
need to make sure no instructions between %P2 and %V write to %P2. This is why
SROA fails on our test. The store
  store float 0.000000e+00, float* %temp                                        
between
  %2 = phi float* [ %0, %then ], [ %1, %else ]                                  
and
  %3 = load float* %2, align 4                                                  
prevents SROA from even partially promoting %3. SROA could run alias analysis
to tell %2 and %temp do not alias, but it's not done probably because alias
analysis is too expensive. 

Proposed solution:

To make SROA work for the particular test and hopefully other similar
situations, we can replace all PHI nodes that are trivially equivalent to the
slice with the slice itself. After that, SROA needn't worry about these PHI
users any more. This approach is similar to but much cheaper than running a
pass of instcombine in the middle of SROA.

I'll send out a patch soon.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20140724/0e5a98f4/attachment.html>


More information about the llvm-bugs mailing list