[PATCH] D104060: Machine IR Profile

Ellis Hoag via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 16 18:04:52 PDT 2021


ellis added a comment.

In D104060#2818268 <https://reviews.llvm.org/D104060#2818268>, @MaskRay wrote:

> `__llvm_mip_call_counts_caller` is slow.
> It is a function with a custom call convention using RAX as the argument on x86-64.
> The impl detail function saves and restores many vector registers.
> I haven't studied why `__llvm_mip_call_counts_caller` is needed.

Yes, `__llvm_mip_call_counts_caller` is not optimal, but we wanted to get correctness first. Because we inject calls to the runtime at the very beginning of functions, `__llvm_mip_call_counts_caller` saves and restores the stack frame. Our return address instrumentation code also uses this helper function to pass the return address register to the runtime.

> `__llvm_mipmap` has these fields. I added an inline comment that -shared doesn't work.

Unfortunately, yes, it seems `-shared` does not work, and I don't know enough about it yet to propose a fix.

>           .section        __llvm_mipmap,"aw",@progbits
>           .globl  _Z3fooPiS_$MAP
>           .p2align        3
>   _Z3fooPiS_$MAP:
>   .Lref2:
>     ### not sure why this is needed
>           .long   __start___llvm_mipraw-.Lref2    # Raw Section Start PC Offset
>   
>     ##### this does not link in -fpic -shared mode
>           .long   _Z3fooPiS_$RAW-.Lref2           # Raw Profile Symbol PC Offset
>   
>           .long   _Z3fooPiS_-.Lref2               # Function PC Offset
>           .long   .Lmip_func_end0-_Z3fooPiS_      # Function Size
>           .long   0x0                             # CFG Signature
>           .long   0                               # Non-entry Block Count
>           .long   10                              # Function Name Length
>           .ascii  "_Z3fooPiS_"

I described these fields in detail in my previous comment.
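
As a rough illustration of the layout in the quoted listing, a record could be read back along these lines. This is only a sketch, not code from the patch: the field names, the little-endian byte order, the signedness of each field, and the absence of padding before the name are all assumptions made for the example.

```python
import struct

# Seven 32-bit fields in the order shown in the quoted listing.
# Assumptions (not confirmed by the patch): little-endian, the three
# PC offsets are signed, the remaining fields are unsigned.
HEADER = struct.Struct("<iiiIIII")


def parse_mipmap_record(data, offset=0):
    """Decode one record; every key name here is hypothetical."""
    (raw_section_start_offset, raw_symbol_offset, function_offset,
     function_size, cfg_signature, non_entry_block_count,
     name_length) = HEADER.unpack_from(data, offset)
    name_start = offset + HEADER.size
    name = data[name_start:name_start + name_length].decode("ascii")
    return {
        "raw_section_start_offset": raw_section_start_offset,
        "raw_symbol_offset": raw_symbol_offset,
        "function_offset": function_offset,
        "function_size": function_size,
        "cfg_signature": cfg_signature,
        "non_entry_block_count": non_entry_block_count,
        "function_name": name,
        # Bytes consumed, assuming no padding after the name.
        "record_size": HEADER.size + name_length,
    }
```

The three offsets are relative to the start of the record (`.Lref2` in the listing), so a reader would add them to the record's own address to recover absolute addresses.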

> Now this patch series adds machine basic blocks instrumentation.
> I wonder what it can do while the regular IR instrumentation cannot.
>
> Machine basic block instrumentation has some awkward points.
> Semantic information is lost. The loop optimization definitely cannot be applied.
> If an IR basic block maps to multiple machine basic blocks, you need multiple increments for each MBB while with IR BB you may just need one (e.g. dominator).
> Edge profiling is tricky. Edge profiling requires splitting critical edges - it is not clear how you can do this after the machine basic block layout is finalized.

The benefit of instrumenting machine basic blocks is that we can easily mark MBBs that were not executed as candidates for outlining. We can definitely apply the Kirchhoff's circuit law optimization to reduce the number of stores.
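
To make the Kirchhoff's law idea concrete: the count of a block equals the sum of its incoming edge counts and also the sum of its outgoing edge counts, so counts for some blocks can be derived instead of stored. The following is a minimal sketch of that inference, not the implementation in this patch; the function name and the choice of which blocks carry counters are assumptions for illustration.

```python
def infer_counts(edges, known_blocks):
    """Propagate flow conservation over a CFG.

    edges: list of (src, dst) block pairs.
    known_blocks: counts for the blocks that were actually instrumented.
    Returns (block_counts, edge_counts) with whatever could be derived.
    """
    blocks = dict(known_blocks)
    edge_counts = {}
    preds, succs = {}, {}
    for src, dst in edges:
        succs.setdefault(src, []).append((src, dst))
        preds.setdefault(dst, []).append((src, dst))

    changed = True
    while changed:
        changed = False
        # Conservation holds separately for incoming and outgoing edges.
        for adjacency in (preds, succs):
            for block, incident in adjacency.items():
                unknown = [e for e in incident if e not in edge_counts]
                total = sum(edge_counts[e] for e in incident
                            if e in edge_counts)
                if not unknown and block not in blocks:
                    # All edges on this side are known: sum them.
                    blocks[block] = total
                    changed = True
                elif len(unknown) == 1 and block in blocks:
                    # One edge missing: it carries the remainder.
                    edge_counts[unknown[0]] = blocks[block] - total
                    changed = True
    return blocks, edge_counts


# Diamond CFG: only "entry" and "then" carry counters.
diamond = [("entry", "then"), ("entry", "else"),
           ("then", "join"), ("else", "join")]
block_counts, _ = infer_counts(diamond, {"entry": 10, "then": 3})
```

In the diamond, two stored counters are enough to recover all four block counts. A block whose derived count is zero can still be marked as not executed.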


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104060/new/

https://reviews.llvm.org/D104060


