[LLVMdev] [CodeGenPrepare] Sinking incoming values of a PHI Node

Louis Gerbarg lgg at apple.com
Wed May 21 16:22:04 PDT 2014


No problem.

I think Philip is correct that adding the load/store case should not be terribly difficult, though it would probably require moving the instruction in a way that is technically not an InstCombine in the purest sense. Personally, though, I am more of a pragmatist on these matters: I could factor out the code and call it from visitLoadInst() and visitStoreInst() to see if it measurably impacts generated code, and then worry about where the code should actually live.

I haven’t really thought through the implications of casted GEPs, but I suspect adding support for them should not be terribly difficult. Feel free to use anything in the patch. I’ll probably be resubmitting it in the next day or two.

Louis

> On May 21, 2014, at 3:25 PM, Jingyue Wu <jingyue at google.com> wrote:
> 
> Hi Louis, 
> 
> Thanks for posting this! After reading your patch and double-checking my benchmark, I think logic similar to that in your patch would definitely help. 
> 
> Some complications for my case: 
> 1. As Philip mentioned, we want to do this for PHI nodes used by loads and stores. 
> 2. The incoming values of the PHI node may be bitcasted/addrspacecasted GEPs or even uglygeps (ptrtoint, add, inttoptr; see the sketch after this list). But I think these cases are easy to handle. 
> 3. We may have to run this GEP-combining transformation specifically in the NVPTX backend, after SeparateConstOffsetFromGEP (a pass I recently wrote), because I am mostly dealing with the GEPs that pass generates. 
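> 
> For concreteness, here is a minimal sketch of the kind of uglygep incoming value I mean in point 2 (the value names are made up for illustration):
> 
> if.then:
>   %p.int = ptrtoint float* %a to i64
>   %p.add = add i64 %p.int, 16       ; byte offset, equivalent to a GEP by 4 floats
>   %p1 = inttoptr i64 %p.add to float*
>   br label %merge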
> 
> Your patch suggests the possibility of writing another pass that makes CGP's job easier. Given the current implementation, I may end up copying some of the logic from your patch. Is that acceptable? 
> 
> Jingyue
> 
> 
> On Wed, May 21, 2014 at 12:40 PM, Louis Gerbarg <lgg at apple.com> wrote:
> It appears to me that most of the optimization cases you are attempting here are handled by an InstCombine patch I wrote last week to handle address folding across GEPs (r209049 and r209065). Unfortunately there were some complications, so we reverted it. I believe I have a fix for the issues that were causing failures on the buildbots, but I have not sent it yet because it also prevents the specific optimization case I had.
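> 
> Roughly, the shape of the transform is to turn a PHI of GEPs into a GEP of a PHI (a simplified sketch; the actual patch handles more cases than this):
> 
>   %p1 = getelementptr inbounds float* %a, i64 4
>   %q1 = getelementptr inbounds float* %b, i64 4
>   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> 
> becomes
> 
>   %r = phi float* [ %a, %if.then ], [ %b, %if.else ]
>   %r1 = getelementptr inbounds float* %r, i64 4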
> 
> I am particularly interested in knowing which benchmarks this helps you with. I have some more thoughts on how to deal with related optimization opportunities; I’ll try to write them up for comments today or tomorrow.
> 
> Attached is a combined patch that includes the original commit, the first set of fixes from Friday night, and a final fix that allowed me to pass the reduction cases sent to me.
> 
> Louis
> 
> > On May 20, 2014, at 6:05 PM, Jingyue Wu <jingyue at google.com> wrote:
> >
> > Hi,
> >
> > I want to improve the way CGP sinks the incoming values of a PHI node towards memory accesses. Improving it means a lot to some of our key benchmarks, and I believe it can benefit many general cases as well.
> >
> > CGP's OptimizeMemoryInst function handles PHI nodes by running AddressingModeMatcher on all incoming values to see whether they reach a consensus on the addressing mode. It does a reasonably good job of sinking a PHI node if its incoming values are used only once (by the PHI node) and share the same base register and offset. For example, CGP transforms
> >
> > define void @foo(i1 %cond, float* %a) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > to
> >
> > define void @foo(i1 %cond, float* %a) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:                                          ; preds = %entry
> >   br label %merge
> >
> > if.else:                                          ; preds = %entry
> >   br label %merge
> >
> > merge:                                            ; preds = %if.else, %if.then
> >   %0 = bitcast float* %a to i8*
> >   %sunkaddr = getelementptr i8* %0, i64 16
> >   %1 = bitcast i8* %sunkaddr to float*
> >   %w1 = load float* %1
> >   ret void
> > }
> >
> > saving the registers that would otherwise hold %p1 and %q1.
> >
> > However, this approach misses optimization opportunities in two cases. First, if an incoming value of a PHI node is used in multiple memory instructions, CGP may not be able to sink it.
> >
> > For example, if we slightly complicate the above example by adding a load after %p1,
> >
> > define void @foo(i1 %cond, float* %a) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   %u1 = load float* %p1
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > CGP is no longer able to sink %p1 because, since %p1 is used multiple times, CGP checks whether %p1 can be folded into every memory use (in this case, %u1 = load %p1 and %w1 = load %r1) with the same addressing mode. It does this by dry-running MatchOperationAddr on both loads; however, MatchOperationAddr bails out on PHI nodes.
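> >
> > Ideally, the address computation would be folded into both loads. A hand-written sketch of the desired result (not actual CGP output):
> >
> > define void @foo(i1 %cond, float* %a) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   %u1 = load float* %p1
> >   br label %merge
> >
> > if.else:
> >   br label %merge
> >
> > merge:
> >   %sunkaddr = getelementptr float* %a, i64 4
> >   %w1 = load float* %sunkaddr
> >   ret void
> > }
> >
> > Here %p1 stays local to %if.then (and folds into %u1's addressing mode during ISel), so no register is live across the edges into %merge.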
> >
> > The second complication is that the incoming values of a PHI node may have different base registers. For example, we can slightly modify the original example by changing %q1's base to %b.
> >
> > define void @foo(i1 %cond, float* %a, float* %b) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %b, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > Ideally we would still fold %p1 and %q1 into the load by introducing a new base that PHIs together %a and %b, such as
> >
> > %r = phi float* [ %a, %if.then ], [ %b, %if.else ]
> > %r1 = getelementptr inbounds float* %r, i64 4
> > %w1 = load float* %r1
> >
> > saving the registers that would otherwise hold %p1 and %q1.
> >
> > However, the consensus that CGP requires all incoming values to reach covers not only the offset but also the base register. (%a + 4) and (%b + 4) unfortunately do not satisfy this condition, so CGP won't sink %p1 and %q1 toward BB %merge.
> >
> > Our key benchmarks exhibit both of the cases described above. A simplified version of the code looks like:
> >
> > define void @foo(i1 %cond, float* %a, float* %b) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   %u1 = load float* %p1
> >   br label %merge
> >
> > if.else:
> >   %q1 = getelementptr inbounds float* %b, i64 4
> >   br label %merge
> >
> > merge:
> >   %r1 = phi float* [ %p1, %if.then ], [ %q1, %if.else ]
> >   %w1 = load float* %r1
> >   ret void
> > }
> >
> > so we need a solution that attacks both issues. While we could solve the second issue by introducing a new base register with the help of SSAUpdater, I am not sure how to fold this complicated logic into MatchOperationAddr, which updates a single target addressing mode. One possibility is to extend ExtAddrMode to support multiple base registers, but that may complicate a lot of things.
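> >
> > Concretely, for the combined example above, the desired end state would be something like the following hand-written sketch (not the output of any existing pass), where %p1 stays local to %if.then and can be folded into %u1's addressing mode:
> >
> > define void @foo(i1 %cond, float* %a, float* %b) {
> > entry:
> >   br i1 %cond, label %if.then, label %if.else
> >
> > if.then:
> >   %p1 = getelementptr inbounds float* %a, i64 4
> >   %u1 = load float* %p1
> >   br label %merge
> >
> > if.else:
> >   br label %merge
> >
> > merge:
> >   %r = phi float* [ %a, %if.then ], [ %b, %if.else ]
> >   %r1 = getelementptr inbounds float* %r, i64 4
> >   %w1 = load float* %r1
> >   ret void
> > }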
> >
> > While I am thinking about an appropriate solution, I would like to bring these issues up to get more feedback. I really appreciate your thoughts on this.
> >
> > Thanks!
> > Jingyue
> 
