[PATCH] D15768: WIP: Introduce a combine that tries to simplify or hoist a single user of a phi node through the incoming values.

Thu Dec 24 09:42:27 PST 2015

On 12/24/2015 01:13 AM, Chandler Carruth via llvm-commits wrote:
> chandlerc created this revision.
> chandlerc added reviewers: craig.topper, majnemer.
> chandlerc added a subscriber: llvm-commits.
>
> The original movitavion was the following code pattern coming out of
> Clang itself:
>
>    define i64 @_ZNK5clang4Sema12getExprRangeEPNS_4ExprE(%"class.clang::Sema"* nocapture readnone %this, %"class.clang::Expr"* %E) {
>      %1 = icmp eq %"class.clang::Expr"* %E, null
>      br i1 %1, label %6, label %2
>
>    ; <label>:2                                       ; preds = %0
>      %3 = bitcast %"class.clang::Expr"* %E to %"class.clang::Stmt"*
>      %4 = tail call i64 @_ZNK5clang4Stmt14getSourceRangeEv(%"class.clang::Stmt"* %3)
>      %5 = and i64 %4, -4294967296
>      %phitmp = and i64 %4, 4294967295
>      br label %6
>
>    ; <label>:6                                       ; preds = %0, %2
>      %.sroa.3.0 = phi i64 [ %5, %2 ], [ 0, %0 ]
>      %.sroa.0.0 = phi i64 [ %phitmp, %2 ], [ 0, %0 ]
>      %7 = or i64 %.sroa.0.0, %.sroa.3.0
>      ret i64 %7
>    }
>
> This really should simplify away, but it doesn't.
Hm, this is an interesting example.  It's worth noting that the OR can 
constant fold on the %0,%6 edge.  This really looks to me like a partial 
redundancy elimination case.  The value of %7 is known along one edge 
(via constant folding), so we can push it back into the other incoming 
edge.  The current performScalarPRE code in GVN.cpp doesn't handle this, 
but that's the framing that immediately jumps to mind.

Another possibility is to frame this as tail duplication.  In fact, why 
doesn't our existing tail duplication logic get this specific case?  It 
definitely should.

Another approach is not to hoist the or, but to sink the two ands. In 
this particular case, sinking the and past the phi doesn't change the 
result at all (since we know the other input is a constant zero.)  This 
would produce inferior code if the %0,%6 case is the dominate one, but 
maybe this would work either as a PGO or a canonical form we later 
reverse?  It reminds me a lot of PHI to Select canonicalization actually.

I also noticed a couple of spiritually similar optimizations in 
InstCombine around Selects.  For instance:
  // add (select X 0 (sub n A)) A  -->  select X A n
and foldOpIntoSelect in general..
> My first attempt to
> help teach LLVM about these kinds of patterns is this patch. It doesn't
> handle the above case (where two separate PHI nodes would both need to
> be simplified at the same time), but it handles many other cases.
Haven't looked at the actual patch yet.  Won't get a chance to do that 
until next week at the earliest.
>
> When using a Clang with this patch to build the LLVM test suite, we
> fully simplify approximately 2000 phi nodes, and we hoist a user into
> one predecessor (simplifying the other predecessors) 57 times.
>
> I'm still not sure this is the right approach. It is fairly heavy weight
> (as you'll see) and I don't see a good way to extend its current
> strategy to handle hoisting around multiple phi nodes. =/
>
> I'm thinking of trying a different approach which would be a generic
> combine firing on users of phi nodes, and try to hoist that user into
> a predecessor to expose more simplifications. This would for example be
> able to handle cases where both operands were phi nodes. But this
> wouldn't completely obviate the patch I have here -- there are cases
> that this will fire on that the other might not.
>
> The other overall concern I have about this is that it really needs to
> be "late" in instcombine. We only want to hoist around phi nodes after
> every user of the instruction has already been combined and the current
> instruction's other combines have already been exhausted. Unfortunately
> this is the exact opposite order of combining from what instcombine
> usually does. I'm not really sure how to address this.
I'm not sure I really follow your ordering concern.  At least in 
principal, any of the transforms that would have triggered should be 
able to hoist themselves into the predecessor and then trigger.  (At 
least when the other input is constant.)  Do you have particular 
examples in mind here?  I may be missing the obvious.
>
> Thoughts and suggestions on the issue at large very welcome!
>
> http://reviews.llvm.org/D15768
>
> Files:
>    lib/Transforms/InstCombine/InstCombinePHI.cpp
>    test/Transforms/InstCombine/select.ll
>    test/Transforms/InstCombine/shift-sra.ll
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151224/e94c93f6/attachment.html>