[llvm-dev] Basic block merging

Wed May 29 13:03:24 PDT 2019

SimplifyCFG has some code for doing some merging. For example
SinkCommonCodeFromPredecessors

~Craig

On Wed, May 29, 2019 at 11:58 AM Michael Kruse via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Am Mi., 29. Mai 2019 um 13:31 Uhr schrieb Shawn Landden via llvm-dev
> <llvm-dev at lists.llvm.org>:
> >
> > On Wed, May 29, 2019 at 10:49 AM David Jones via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > >
> > > Under certain circumstances, my compiler outputs basic blocks having
> the same function:
> > >
> > > bb_97:                                            ; preds = %bb_1
> > >   %476 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType*
> %0, i64 0, i32 6
> > >   %477 = bitcast i8** %476 to %LBstd.Cprocess.CRType**
> > >   %478 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %477,
> align 8
> > >   %479 = getelementptr inbounds %LBstd.Cprocess.CRType,
> %LBstd.Cprocess.CRType* %478, i64 0, i32 3
> > >   %480 = bitcast i8** %479 to %LMtop.I0.ARType**
> > >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %480, align 8
> > >   br label %bb_106
> > >
> > > bb_98:                                            ; preds = %bb_1
> > >   %481 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType*
> %0, i64 0, i32 6
> > >   %482 = bitcast i8** %481 to %LBstd.Cprocess.CRType**
> > >   %483 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %482,
> align 8
> > >   %484 = getelementptr inbounds %LBstd.Cprocess.CRType,
> %LBstd.Cprocess.CRType* %483, i64 0, i32 3
> > >   %485 = bitcast i8** %484 to %LMtop.I0.ARType**
> > >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %485, align 8
> > >   br label %bb_106
> > >
> > > These blocks have provably the same function - the instructions of one
> correspond perfectly to instructions in the other.
> > >
> > > I would have expected some optimization to detect this, and merge the
> blocks by forwarding all jumps to one, to jump to the other.
> > >
> > > I notice a "merge functions" pass, which does this if two functions
> can be proven to have identical effect.
> > >
> > > Is there a pass that merges blocks? Is it enabled by default?  If so,
> then is there something non-obvious in the code that would be defeating
> this optimization?
> > I see two problems here (neither unsolvable): 1) I believe the
> > register allocator is working over a full function, and for this merge
> > the registers and stack/frame offsets have to be the same. 2) if the
> > block very small the icache might not work as well, and on some arches
> > long jumps are more expensive than short jumps and require trunks.
>
> I understand this request differently. This is on LLVM-IR, so register
> allocation would happen on the merged block. I-Cache utilization
> should improve since there is less code in the merged blocked compared
> to the two original blocks.
>
> I could imagine an algorithm that checks whether the two blocks are
> isomorphic and have the same jump targets. Then merge the two blocks
> (including adding PHIs if the blocks have different predecessors) by
> RAUW one block with this other.
>
> In the example, both blocks have the same predecessor, i.e. it will be
> a conditional branch. In this case the transformation (after having
> determined that the blocks are isomorphic) would change it to an
> unconditional branch to one of the blocks. The other one becomes dead
> code.
>
> The question is, whether the improvement justifies the additional cost
> of identifying isomorphic blocks.
>
> Michael
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190529/021fae9f/attachment.html>