[LLVMdev] [CodeGenPrepare] Sinking incoming values of a PHI Node

Jingyue Wu jingyue at google.com
Wed May 21 15:25:59 PDT 2014


Hi Louis,

Thanks for posting this! After reading your patch and double-checking my
benchmark, I think logic similar to that in your patch would definitely
help.

Some complications for my case:
1. As Philip mentioned, we want to do this for PHI nodes used in
load/store.
2. The incoming values of the PHI node may be bitcasted/addrspacecasted
GEPs and even uglygeps (ptrtoint, add, inttoptr). But I think these are
easy to fix.
3. We may have to run this GEP-combining transformation in the NVPTX
backend specifically after SeparateConstOffsetFromGEP (a pass I recently
wrote), because I am mostly dealing with the GEPs generated by that pass.

Your patch suggests to me the possibility of writing another pass that makes
CGP's tasks easier. Based on the current implementation, I may end up
copying some logic from your patch. Is that acceptable at all?

Jingyue


On Wed, May 21, 2014 at 12:40 PM, Louis Gerbarg <lgg at apple.com> wrote:

> It appears to me that most of the optimization cases you are attempting to
> do here are handled by an InstCombine patch I wrote last week to handle
> address folding across GEPs (r209049 and r209065). Unfortunately there were
> some complications, so we reverted it. I believe I have a fix for the issues
> that were causing failures on the buildbots, but I have not sent it out yet
> because it also blocks the specific optimization case I was targeting.
>
> I am particularly interested in knowing what benchmarks this helps you
> with. I have some more thoughts on how to deal with some related
> optimization opportunities; I’ll try to write them up for comments today or
> tomorrow.
>
> Attached is a combined patch that includes the original commit, the first
> set of fixes from Friday night, and a final fix that allowed me to pass the
> reduction cases sent to me.
>
> Louis
>
>
>
>
> > On May 20, 2014, at 6:05 PM, Jingyue Wu <jingyue at google.com> wrote:
> >
> > Hi,
> >
> > I want to improve the way CGP sinks the incoming values of a PHI node
> towards memory accesses. Improving it means a lot to some of our key
> benchmarks, and I believe it can benefit many general cases as well.
> >
> > CGP's OptimizeMemoryInst function handles PHI nodes by running
> AddressingModeMatcher on all incoming values to see whether they reach
> consensus on the addressing mode. It does a reasonably good job of sinking
> a PHI node if its incoming values are used only once (by the PHI node) and
> share the same base register and offset. For example, CGP transforms
> >
> > define void @foo(i1 %cond, float* %a) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > to
> >
> > define void @foo(i1 %cond, float* %a) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:                                          ; preds = %entry
> >   br label %merge
> >
> > if.else:                                          ; preds = %entry
> >   br label %merge
> >
> > merge:                                            ; preds = %if.else, %if.then
> >   %0 = bitcast float* %a to i8*
> >   %sunkaddr = getelementptr i8* %0, i64 16
> >   %1 = bitcast i8* %sunkaddr to float*
> >   %w1 = load float* %1
> >   ret void
> > }
> >
> > saving registers on %p1 and %q1.
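To make the consensus requirement concrete, here is a small Python sketch of the check described above. This is my own toy encoding of addresses as (base, offset) pairs, not CGP's actual C++ implementation, which matches full target addressing modes.

```python
# Toy model of the consensus check OptimizeMemoryInst performs on a PHI
# node's incoming values: each value is matched into a simplified
# addressing mode (base, offset), and sinking is allowed only when every
# incoming value produces the same mode.

def match_addressing_mode(value):
    """Match a symbolic pointer expression into (base, offset).

    `value` is either a tuple ("gep", base, offset) standing in for a
    getelementptr with a constant offset, or a plain base register name.
    """
    if isinstance(value, tuple) and value[0] == "gep":
        _, base, offset = value
        return (base, offset)
    return (value, 0)

def can_sink_phi(incoming_values):
    """Return the common addressing mode if all incoming values agree,
    else None (mirroring CGP's requirement of full consensus)."""
    modes = {match_addressing_mode(v) for v in incoming_values}
    return modes.pop() if len(modes) == 1 else None

# The first example: %p1 = gep %a, 4 and %q1 = gep %a, 4 agree, so the
# PHI can be replaced by a single sunk address computation.
print(can_sink_phi([("gep", "%a", 4), ("gep", "%a", 4)]))  # ('%a', 4)
# Different bases (%a vs. %b) break the consensus:
print(can_sink_phi([("gep", "%a", 4), ("gep", "%b", 4)]))  # None
```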
> >
> > However, this approach misses optimization opportunities in two cases.
> First, if an incoming value of a PHI node is used in multiple memory
> instructions, CGP may not be able to sink it.
> >
> > For example, if we slightly complicate the above example by adding a
> load after %p1,
> >
> > define void @foo(i1 %cond, float* %a) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   %u1 = load float* %p1
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > CGP is unable to sink %p1 any more because, given that %p1 is used
> multiple times, CGP checks whether %p1 can be folded into every memory use
> (in this case, %u1 = load %p1 and %w1 = load %r1) with the same addressing
> mode. It does that by dry-running MatchOperationAddr on both loads;
> however, MatchOperationAddr bails out on PHI nodes.
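The multiple-use check can be sketched the same way. This is again a toy Python model under my own encoding; the real MatchOperationAddr handles scales, multiple operand kinds, and target-specific legality, but the PHI bail-out is the point being modeled.

```python
# Toy model of why a multiply-used incoming value blocks sinking: before
# sinking a GEP that has other users, CGP dry-runs the matcher on every
# memory use of it, and the matcher gives up as soon as it encounters a
# PHI node in the address chain.

def match_operation_addr(addr):
    """Return True if `addr` can be folded into a memory access.
    `addr` is a nested tuple like ("gep", base, off) or ("phi", ...)."""
    if not isinstance(addr, tuple):
        return True            # a plain base register always folds
    if addr[0] == "phi":
        return False           # the matcher bails out on PHI nodes
    if addr[0] == "gep":
        return match_operation_addr(addr[1])  # recurse into the base
    return False

def profitable_to_sink(addresses_of_uses):
    """All memory uses must fold with the same addressing mode."""
    return all(match_operation_addr(a) for a in addresses_of_uses)

# %p1 feeds both `load %p1` (address: the GEP itself) and `load %r1`
# (address: a PHI of %p1 and %q1) -- the PHI use makes the check fail.
gep_p1 = ("gep", "%a", 4)
phi_r1 = ("phi", gep_p1, ("gep", "%a", 4))
print(profitable_to_sink([gep_p1, phi_r1]))  # False
```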
> >
> > The second complication is that the incoming values of a PHI node may
> have different base registers. For example, we slightly modify the original
> example by changing %q1's base to %b.
> >
> > define void @foo(i1 %cond, float* %a, float* %b) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %b, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > Ideally, we still want to fold %p1 and %q1 into the load by introducing a
> new base that phi's %a and %b, for example:
> >
> > %r = phi float* [ %a, %if.then ], [ %b, %if.else ]
> > %r1 = getelementptr inbounds float* %r, i64 4
> > %w1 = load float* %r1
> >
> > saving registers on %p1 and %q1.
> >
> > However, the consensus CGP requires all incoming values to reach covers
> not only the offset but also the base register, and (%a + 4) and (%b + 4)
> unfortunately do not satisfy this condition. Therefore, CGP won't sink
> %p1 and %q1 toward BB %merge.
> >
> > Our key benchmarks have both of the cases described above. Simplified
> code looks like:
> >
> > define void @foo(i1 %cond, float* %a, float* %b) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   %u1 = load float* %p1
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %b, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > so we need a solution that attacks both issues. While we can solve the
> second issue by introducing a new base register leveraging SSAUpdater, I am
> not sure how to fold this complicated logic into MatchOperationAddr, which
> updates one single target addressing mode. One possibility is to extend
> ExtAddrMode to support multiple base registers, but that may complicate a
> lot of things.
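The rewrite for the different-base case might look roughly like the following toy Python sketch. It uses my own simplified tuple encoding of IR, not LLVM's classes; an actual implementation would build the new PHI with SSAUpdater (or IRBuilder) as suggested above.

```python
# Sketch of the proposed rewrite: when every incoming value of the PHI
# is a GEP with the same constant offset, phi the *bases* instead and
# emit one shared GEP in the merge block.

def rewrite_phi_of_geps(incoming):
    """incoming: list of (pred_block, ("gep", base, offset)) pairs.
    Returns (base_phi, new_gep) if all offsets agree, else None."""
    offsets = {gep[2] for _, gep in incoming}
    if len(offsets) != 1:
        return None            # offsets differ; no common mode exists
    offset = offsets.pop()
    # New PHI over the bases, one entry per predecessor block.
    base_phi = ("phi", [(pred, gep[1]) for pred, gep in incoming])
    # Single GEP off the new PHI replaces both original GEPs.
    return base_phi, ("gep", base_phi, offset)

phi, gep = rewrite_phi_of_geps([("%if.then", ("gep", "%a", 4)),
                                ("%if.else", ("gep", "%b", 4))])
# phi == ("phi", [("%if.then", "%a"), ("%if.else", "%b")])
# gep == ("gep", phi, 4)
```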
> >
> > While I am thinking about an appropriate solution, I would like to bring
> these issues up to get more feedback. I really appreciate your thoughts on
> this.
> >
> > Thanks!
> > Jingyue
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>