[LLVMdev] [CodeGenPrepare] Sinking incoming values of a PHI Node

Tue May 20 18:05:17 PDT 2014

Hi,

I want to improve the way CGP sinks the incoming values of a PHI node
towards memory accesses. Improving it means a lot to some of our key
benchmarks, and I believe can benefit many general cases as well.

CGP's OptimizeMemoryInst function handles PHI nodes by running
AddressingModeMatcher on all incoming values to see whether they reach
consensus on the addressing mode. It does a reasonably good job in sinking
a PHI node if its incoming values are used only once (by the PHI node) and
share the same base register and offset. For example, CGP transforms

define void @foo(i1 %cond, float* %a) {
entry:
  br i1 %cond, label %if.then, label %if.else

if.then:
  %p1 = getelementptr inbounds float* %a, i64 4
  br label %merge

if.else:
  %q1 = getelementptr inbounds float* %a, i64 4
  br label %merge

merge:
  %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
  %w1 = load float* %r1
  ret void
}

to

define void @foo(i1 %cond, float* %a) {
entry:
  br i1 %cond, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  br label %merge

if.else:                                          ; preds = %entry
  br label %merge

merge:                                            ; preds = %if.else,
%if.then
  %0 = bitcast float* %a to i8*
  %sunkaddr = getelementptr i8* %0, i64 16
  %1 = bitcast i8* %sunkaddr to float*
  %w1 = load float* %1
  ret void
}

saving registers on %p1 and %q1.

However, this approach misses optimization opportunities in two cases.
First, if an incoming value of a PHI node is used in multiple memory
instructions, CGP may not be able to sink it.

For example, if we slightly complicate the above example by adding a load
after %p1,

define void @foo(i1 %cond, float* %a) {
entry:
  br i1 %cond, label %if.then, label %if.else

if.then:
  %p1 = getelementptr inbounds float* %a, i64 4
  %u1 = load float* %p1
  br label %merge

if.else:
  %q1 = getelementptr inbounds float* %a, i64 4
  br label %merge

merge:
  %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
  %w1 = load float* %r1
  ret void
}

CGP is unable to sink %p1 any more, because, given %p1 is used multiple
times, CGP checks whether %p1 can be folded into every memory uses (in this
case, %u1 = load %p1 and %w1 = load %r1) with the same addressing mode. It
does that by dry-running MatchOperationAddr on both loads, however
MatchOperationAddr bails out on PHI nodes.

The second complication is the incoming values of a PHI node may have
different base registers. For example, we slightly modify the original
example by changing %q1's base to %b.

define void @foo(i1 %cond, float* %a, float* %b) {
entry:
  br i1 %cond, label %if.then, label %if.else

if.then:
  %p1 = getelementptr inbounds float* %a, i64 4
  br label %merge

if.else:
  %q1 = getelementptr inbounds float* %b, i64 4
  br label %merge

merge:
  %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
  %w1 = load float* %r1
  ret void
}

Ideally we still want to fold %p1 and %q1 into the load by introducing a
new base phi'ing %a and %b, such as

%r = phi float* [ %a, %if.then ], [ %b, %if.else ]
%r1 = getelementptr inbounds float* %r, i64 4
%w1 = load float* %r1

saving registers on %p1 and %q1.

However, the consensus CGP requires all incoming values to reach includes
not only the offset but also the base register. (a + 4) and (b + 4)
unfortunately do not satisfy this condition. Therefore, CGP won't sink %p1
and %q1 toward BB %merge.

Our key benchmarks have both cases I described above. A simplified code
looks like:

define void @foo(i1 %cond, float* %a, float* %b) {
entry:
  br i1 %cond, label %if.then, label %if.else

if.then:
  %p1 = getelementptr inbounds float* %a, i64 4
  %u1 = load float* %p1
  br label %merge

if.else:
  %q1 = getelementptr inbounds float* %b, i64 4
  br label %merge

merge:
  %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
  %w1 = load float* %r1
  ret void
}

so we need a solution to attack both issues. While we can solve the second
issue by introducing a new base register leveraging SSAUpdater, I am not
sure how to fold this complicated logic in MatchOperationAddr which updates
one single target addressing mode. One possibility is to extend ExtAddrMode
to support multiple base registers, which may complicate a lot of things.

While I am thinking about an appropriate solution, I would like to bring
these issues up to get more feedback. I really appreciate your thoughts on
this.

Thanks!
Jingyue
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140520/6197edaa/attachment.html>