<div dir="ltr"><div>Thanks John.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Regarding<br></div><div class="gmail_extra"><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000"><div>

      First, if you can, try to use the mapping between BasicBlocks and

      MachineBasicBlocks after all LLVM IR optimizations have been done

      (if you are not doing that already).<br></div></div></blockquote><div><br></div><div>Good point. I am not doing this at the moment, but I can and certainly will. It seems to me that even some of the IR optimizations are target dependent. For instance, I believe the compiler performs tail call optimization only if the target supports it. Therefore, I probably need to do the instrumentation at some point during code generation.<br>

</div><div><br></div><div>Regarding<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div>

      <br>

      Second, there are several ideas you might want to try:<br>

      <br>

      1. The llvm.pcmarker() intrinsic seems close to what you need. 

      That said, it looks like the optimizers are free to move them

      around any way they like, but perhaps most optimizations will

      leave them within the basic block in which they were originally

      inserted.<br>

      <br>

      2. A volatile load or an llvm.prefetch instruction might be a

      workable hack.  Alternatively, you could insert an inline assembly

      call which the optimizer is unlikely to move.  The key here is to

      provide a unique argument to each instruction you insert so

      that you can map it back to its original basic block.<br>

      <br>

      3. You could insert an llvm.var.annotation or llvm.annotation

      intrinsic into each basic block and modify the code generator to

      recognize your annotation.<br>

      <br>

      I'm not sure which of these would be the best option.  I would try

      llvm.pcmarker first to see if that works and then move on to the

      other options as needed.<br></div></div></blockquote><div><br></div><div>Thank you so much for the suggestions. I have not used intrinsics before, but they look handy. I will read the LLVM documentation on them and see what I can do.<br>
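<br>
A rough sketch of the unique-argument idea, as I understand it: give every IR basic block its own integer ID, emit one llvm.pcmarker call carrying that ID, and later translate the IDs seen in each machine block back to IR block labels. This is only an illustration in Python. The helper names (make_markers, map_machine_blocks) and the markers_in_mbb input are hypothetical, and how the IDs are actually read back out of the generated code is not modeled here. It also assumes the markers stay in the block where they were inserted, which, as you point out, the optimizers do not guarantee.<br>
<pre>
```python
def make_markers(block_labels, first_id=0):
    """Assign each IR basic block a unique ID and build its marker call text."""
    id_to_block = {}
    marker_calls = {}
    for offset, label in enumerate(block_labels):
        marker_id = first_id + offset
        id_to_block[marker_id] = label
        # llvm.pcmarker takes a single i32 argument (the marker ID).
        marker_calls[label] = f"call void @llvm.pcmarker(i32 {marker_id})"
    return id_to_block, marker_calls


def map_machine_blocks(markers_in_mbb, id_to_block):
    """Map each machine block name to the IR block labels whose markers it holds.

    markers_in_mbb is a hypothetical input: machine block name mapped to the
    list of marker IDs found in that block. A merged block can hold several
    IDs, and a duplicated tail can show the same ID in more than one block.
    """
    return {
        mbb: [id_to_block[i] for i in ids if i in id_to_block]
        for mbb, ids in markers_in_mbb.items()
    }


if __name__ == "__main__":
    ids, calls = make_markers(["entry", "loop", "exit"])
    for label, call in calls.items():
        print(label, call)
    print(map_machine_blocks({"mbb0": [0], "mbb1": [1, 2]}, ids))
```
</pre>
A machine block that maps to more than one label would indicate merging, and a label that shows up under several machine blocks would indicate duplication, so the same table could also tell me which passes disturbed the CFG.<br>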

<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div><div><div class="h5">

      <br>

      <br>

      On 3/19/14, 12:34 PM, Rahman Lavaee wrote:<br>

    </div></div></div>

    <blockquote type="cite"><div><div class="h5">

      <div dir="ltr">

        <div>Hi,<br>

          <br>

          I have written a code layout feedback directed optimization

          pass, which currently works for basic block reordering and

          function reordering. It improves performance substantially

          (we could improve Python's performance by 30%). The profiling method is

          window-based and context-sensitive, building on reference

          affinity (<a href="https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=28368&itemFileId=143426" target="_blank">https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=28368&itemFileId=143426</a>)<br>

          <br>

        </div>

        <div>The pass works at the IR level. Therefore, it may lose some

          information during the machine code optimization passes and

          perform imprecisely for BB reordering.<br>

          <br>

        </div>

        <div>Eventually, I would like to see the improvement from an

          interprocedural basic block reordering pass. However, with the

          current system there are several challenges ahead. The most

          important is that the CFG is not preserved during several

          passes including code-gen-prepare, cfg-simplify,

          remove-unreachable-blocks, tail-merge, and tail-duplication.

          So in order to keep track of the mapping between MBBs and BBs,

          one needs to insert code in every function that modifies the

          structure of BBs and MBBs.<br>

          <br>

        </div>

        <div>The current block placement pass

          (MachineBasicBlockPlacement) works at the machine code level

          and with the new profiling structure (SampleProfileLoader), is

          effective as far as context-free profiling info is considered

          sufficient. However, the implementation of SampleProfileLoader

          itself encourages context sensitive info, which cannot

          efficiently be provided with the current profiling structure

          (&lt;func,lineNo&gt;).<br>

          <br>

        </div>

        <div>Is there any way to incorporate information into the

          emitted MBBs so that we can get IR basic block level info

          instead of lineNo info?<br>

          <br>

        </div>

        <div>regards<br>

        </div>

      </div>

      <br>


      <br>

      </div></div>

    </blockquote>

    <br>

  </div>

</blockquote></div><br></div></div>