[llvm-dev] Basic block merging

Wed May 29 13:02:19 PDT 2019

On Wed, May 29, 2019 at 1:58 PM Michael Kruse <llvmdev at meinersbur.de> wrote:
>
> Am Mi., 29. Mai 2019 um 13:31 Uhr schrieb Shawn Landden via llvm-dev
> <llvm-dev at lists.llvm.org>:
> >
> > On Wed, May 29, 2019 at 10:49 AM David Jones via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > >
> > > Under certain circumstances, my compiler outputs basic blocks having the same function:
> > >
> > > bb_97:                                            ; preds = %bb_1
> > >   %476 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> > >   %477 = bitcast i8** %476 to %LBstd.Cprocess.CRType**
> > >   %478 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %477, align 8
> > >   %479 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %478, i64 0, i32 3
> > >   %480 = bitcast i8** %479 to %LMtop.I0.ARType**
> > >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %480, align 8
> > >   br label %bb_106
> > >
> > > bb_98:                                            ; preds = %bb_1
> > >   %481 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> > >   %482 = bitcast i8** %481 to %LBstd.Cprocess.CRType**
> > >   %483 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %482, align 8
> > >   %484 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %483, i64 0, i32 3
> > >   %485 = bitcast i8** %484 to %LMtop.I0.ARType**
> > >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %485, align 8
> > >   br label %bb_106
> > >
> > > These blocks have provably the same function - the instructions of one correspond perfectly to instructions in the other.
> > >
> > > I would have expected some optimization to detect this, and merge the blocks by forwarding all jumps to one, to jump to the other.
> > >
> > > I notice a "merge functions" pass, which does this if two functions can be proven to have identical effect.
> > >
> > > Is there a pass that merges blocks? Is it enabled by default?  If so, then is there something non-obvious in the code that would be defeating this optimization?
> > I see two problems here (neither unsolvable): 1) I believe the
> > register allocator is working over a full function, and for this merge
> > the registers and stack/frame offsets have to be the same. 2) if the
> > block very small the icache might not work as well, and on some arches
> > long jumps are more expensive than short jumps and require trunks.
>
> I understand this request differently. This is on LLVM-IR, so register
> allocation would happen on the merged block. I-Cache utilization
> should improve since there is less code in the merged blocked compared
> to the two original blocks.
>
> I could imagine an algorithm that checks whether the two blocks are
> isomorphic and have the same jump targets. Then merge the two blocks
> (including adding PHIs if the blocks have different predecessors) by
> RAUW one block with this other.
>
> In the example, both blocks have the same predecessor, i.e. it will be
> a conditional branch. In this case the transformation (after having
> determined that the blocks are isomorphic) would change it to an
> unconditional branch to one of the blocks. The other one becomes dead
> code.
Oh I was thinking very differently: similar blocks in different
functions (which is reasonable, but more complicated).
That LLVM can't merge equivalent blocks in the same function is
certainly is a bug.
>
> The question is, whether the improvement justifies the additional cost
> of identifying isomorphic blocks.
Could the merge-functions pass be somehow modified to also do this? in
a way that would require code duplication, with the common code
refactored out.

-Shawn Landden