[LLVMdev] Interprocedural Block Placement algorithm, challenges and opportunities

Rahman Lavaee rlavaee at cs.rochester.edu
Wed Mar 19 12:50:24 PDT 2014


Thanks John.

Regarding

> First, if you can, try to use the mapping between BasicBlocks and
> MachineBasicBlocks after all LLVM IR optimizations have been done (if you
> are not doing that already).
>

Good point. I am not doing this at the moment, but I can and certainly
will. It seems to me that even some of the IR optimizations are target
dependent. For instance, I believe the compiler performs tail call
optimization only if the target supports it. Therefore, I probably
need to do the instrumentation at some point during code generation.
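
To make the bookkeeping concrete, here is a minimal Python sketch (a toy
model, not LLVM code) of a mapping from machine blocks to the IR blocks they
came from, kept up to date when a later pass merges or duplicates blocks. The
class name BlockMap and its methods are hypothetical; in LLVM the equivalent
updates would have to be made inside each pass that changes the CFG.

```python
class BlockMap:
    """Toy model: machine block name -> set of originating IR block names."""

    def __init__(self, ir_blocks):
        # Assumption: we start with exactly one machine block per IR block.
        self.origin = {f"mbb.{name}": {name} for name in ir_blocks}

    def merge(self, kept, removed):
        # E.g. tail merging: 'removed' is folded into 'kept', which now
        # carries code from both sets of IR blocks.
        self.origin[kept] |= self.origin.pop(removed)

    def duplicate(self, source, copy):
        # E.g. tail duplication: the copy stems from the same IR blocks.
        self.origin[copy] = set(self.origin[source])

    def machine_blocks_for(self, ir_block):
        # Reverse lookup: all machine blocks containing code from ir_block.
        return sorted(m for m, srcs in self.origin.items() if ir_block in srcs)


bmap = BlockMap(["entry", "then", "else", "exit"])
bmap.merge("mbb.then", "mbb.else")          # two blocks merged into one
bmap.duplicate("mbb.exit", "mbb.exit.dup")  # exit block duplicated
print(bmap.machine_blocks_for("else"))
print(bmap.machine_blocks_for("exit"))
```

The point of the sketch is that every transformation that merges or clones
blocks needs a matching update call, which is why doing the instrumentation
late in code generation looks simpler.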

Regarding


> Second, there are several ideas you might want to try:
>
> 1. The llvm.pcmarker() intrinsic seems close to what you need.  That said,
> it looks like the optimizers are free to move them around any way they
> like, but perhaps most optimizations will leave them within the basic block
> in which they were originally inserted.
>
> 2. A volatile load or an llvm.prefetch instruction might be a workable
> hack.  Alternatively, you could insert an inline assembly call which the
> optimizer is unlikely to move.  The key here is to provide a unique
> argument to each instruction you insert so that you can map it back to
> its original basic block.
>
> 3. You could insert an llvm.var.annotation or llvm.annotation intrinsic
> into each basic block and modify the code generator to recognize your
> annotation.
>
> I'm not sure which of these would be the best option.  I would try
> llvm.pcmarker first to see if that works and then move on to the other
> options as needed.
>

Thank you so much for the suggestions. I have not used intrinsics before, but
they look handy. I will read the LLVM documentation to learn more
about them and see what I can do.
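
The unique-argument idea in item 2 can be sketched without any LLVM API. The
toy model below tags each basic block with a marker carrying a unique ID, lets
blocks be merged or duplicated, and then recovers the block mapping by scanning
the markers that ended up in each final block. All names (insert_markers,
recover_mapping, the ("marker", id) tuple encoding) are made up for
illustration; a real implementation would emit an intrinsic, a volatile load,
or inline assembly with the ID as its argument.

```python
def insert_markers(blocks):
    """blocks: dict of block name -> list of instructions.
    Returns (instrumented blocks, marker id -> original block name)."""
    id_to_block = {}
    instrumented = {}
    for marker_id, (name, insts) in enumerate(blocks.items()):
        id_to_block[marker_id] = name
        # Hypothetical encoding: a marker is a ("marker", id) tuple.
        instrumented[name] = [("marker", marker_id)] + list(insts)
    return instrumented, id_to_block


def recover_mapping(final_blocks, id_to_block):
    """Map each final block to the original blocks whose markers it holds."""
    mapping = {}
    for name, insts in final_blocks.items():
        ids = [i[1] for i in insts if isinstance(i, tuple) and i[0] == "marker"]
        mapping[name] = [id_to_block[i] for i in ids]
    return mapping


ir = {"entry": ["cmp"], "body": ["add"], "exit": ["ret"]}
marked, ids = insert_markers(ir)

# Simulate later passes: 'body' is merged into 'entry', 'exit' is duplicated.
final = {
    "entry": marked["entry"] + marked["body"],
    "exit": marked["exit"],
    "exit.dup": list(marked["exit"]),
}
mapping = recover_mapping(final, ids)
print(mapping)
```

This only works as long as the markers stay inside the block they were
inserted in, which is exactly the caveat raised for llvm.pcmarker above.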


>
> On 3/19/14, 12:34 PM, Rahman Lavaee wrote:
>
>  Hi,
>
> I have written a feedback-directed code layout optimization pass, which
> currently supports basic block reordering and function reordering. It
> improves performance substantially (we could speed up Python by 30%). The
> profiling method is window-based and context-sensitive, and builds on
> reference affinity (
> https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=28368&itemFileId=143426
> )
>
>  The pass works at the IR level. Therefore, it may lose some information
> during the machine code optimization passes and reorder BBs
> imprecisely.
>
>  Eventually, I would like to see the improvement from an interprocedural basic
> block reordering pass. However, with the current system there are several
> challenges ahead. The most important is that the CFG is not preserved
> during several passes including code-gen-prepare, cfg-simplify,
> remove-unreachable-blocks, tail-merge, and tail-duplication. So in order to
> keep track of the mapping between MBBs and BBs, one needs to insert code in
> every function that modifies the structure of BBs and MBBs.
>
>  The current block placement pass (MachineBasicBlockPlacement) works at
> the machine code level and, with the new profiling structure
> (SampleProfileLoader), is effective as long as context-free profiling info
> is considered sufficient. However, the implementation of
> SampleProfileLoader itself encourages context-sensitive info, which cannot
> be provided efficiently with the current profiling structure
> (<func,lineNo>).
>
>  Is there any way to incorporate information into the emitted MBBs so
> that we can get IR basic block level info instead of lineNo info?
>
>  regards
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>