[llvm-dev] Is BlockAddress always correct ?

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 28 10:54:15 PST 2020


In general, no, there is no way in LLVM IR to get the return address of a
single function call, which appears to be what you want. BlockAddress is
always exact: it is the address of the basic block. However, the compiler is
free to insert instructions at the end of the block containing the call and
at the beginning of the next block, so that address need not be the call's
return address. Another thing that would break your invariant is the
register allocator deciding to spill the return value in RAX right after the
call, which is pretty typical.
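
As a sketch of a related but different mechanism: code compiled with Clang
or GCC can read its own return address at run time, from inside the callee,
with `__builtin_return_address(0)`. This does not make the address available
as a constant the way BlockAddress is. The names below
(`where_will_i_return`, `sample_two_call_sites`, `g_calls`) are hypothetical
and chosen only for illustration.

```cpp
#include <cstdint>

// Volatile counter: gives the helper a visible side effect so the two
// calls below cannot be merged into one by the optimizer.
volatile unsigned g_calls = 0;

// Hypothetical helper. noinline matters: if this were inlined,
// __builtin_return_address(0) would report the caller's own return address.
__attribute__((noinline)) std::uintptr_t where_will_i_return() {
  g_calls = g_calls + 1;
  return reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
}

struct ReturnSites {
  std::uintptr_t first;
  std::uintptr_t second;
};

// Two separate call instructions, so two separate return addresses.
__attribute__((noinline)) ReturnSites sample_two_call_sites() {
  ReturnSites sites{};
  sites.first = where_will_i_return();
  sites.second = where_will_i_return();
  return sites;
}
```

Calling `sample_two_call_sites()` should yield two different non-zero
values, one per call site. Each is the address of whatever instruction the
compiler placed right after that call, whether or not a new basic block
starts there.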

Several existing LLVM features record function return addresses, but this
cannot be implemented in LLVM IR. For example, CodeView debug info records
heap allocation call sites. You can see this in the assembly for the example
below: the labels .Ltmp1, .Ltmp3, .Ltmp5, and so on sit at the return
addresses:

$ cat t.cpp
struct Foo {
  int x, y;
};
__declspec(allocator) Foo *newFoo();
void bar(Foo **foos) {
  foos[0] = newFoo();
  foos[1] = newFoo();
  foos[2] = newFoo();
  foos[3] = newFoo();
}

$ clang -cc1 -gcodeview -masm-verbose -debug-info-kind=limited \
    -triple x86_64-windows-msvc -fms-extensions -S t.cpp -o - \
    | grep -A4 'S_HEAP\|call.*newFoo'
        callq   "?newFoo@@YAPEAUFoo@@XZ"
.Ltmp1:
        movq    32(%rsp), %rcx
        movq    %rax, (%rcx)
.Ltmp2:
--
        callq   "?newFoo@@YAPEAUFoo@@XZ"
.Ltmp3:
        movq    32(%rsp), %rcx
        movq    %rax, 8(%rcx)
.Ltmp4:
--
        callq   "?newFoo@@YAPEAUFoo@@XZ"
.Ltmp5:
        movq    32(%rsp), %rcx
        movq    %rax, 16(%rcx)
.Ltmp6:
--
        callq   "?newFoo@@YAPEAUFoo@@XZ"
.Ltmp7:
        movq    32(%rsp), %rcx
        movq    %rax, 24(%rcx)
        .cv_loc 0 1 10 0                # t.cpp:10:0
--
        .short  4446                    # Record kind: S_HEAPALLOCSITE
        .secrel32       .Ltmp0          # Call site offset
        .secidx .Ltmp0                  # Call site section index
        .short  .Ltmp1-.Ltmp0           # Call instruction length
        .long   4096                    # Type index
--
        .short  4446                    # Record kind: S_HEAPALLOCSITE
        .secrel32       .Ltmp2          # Call site offset
        .secidx .Ltmp2                  # Call site section index
        .short  .Ltmp3-.Ltmp2           # Call instruction length
        .long   4096                    # Type index
--
        .short  4446                    # Record kind: S_HEAPALLOCSITE
        .secrel32       .Ltmp4          # Call site offset
        .secidx .Ltmp4                  # Call site section index
        .short  .Ltmp5-.Ltmp4           # Call instruction length
        .long   4096                    # Type index
--
        .short  4446                    # Record kind: S_HEAPALLOCSITE
        .secrel32       .Ltmp6          # Call site offset
        .secidx .Ltmp6                  # Call site section index
        .short  .Ltmp7-.Ltmp6           # Call instruction length
        .long   4096                    # Type index
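
The relationship between the fields in these records is simple arithmetic:
the return address is the call site offset plus the call instruction length.
A minimal sketch, using the `call` from the quoted message below as input.
`return_address_after` and `rel32_call_target` are hypothetical helper names,
and the second helper assumes the 5-byte x86-64 `e8 rel32` encoding shown
there:

```cpp
#include <cstdint>

// Address of the instruction that follows a call: where the callee returns to.
constexpr std::uint64_t return_address_after(std::uint64_t call_address,
                                             std::uint16_t call_length) {
  return call_address + call_length;
}

// Target of an x86-64 `e8 rel32` call: the displacement is relative to the
// return address, not to the start of the call instruction.
constexpr std::uint64_t rel32_call_target(std::uint64_t call_address,
                                          std::int32_t rel32) {
  return return_address_after(call_address, 5) +
         static_cast<std::int64_t>(rel32);
}

// 0x42c37a: e8 c1 7a 00 00   call 433e40 <retrieve_url>
static_assert(return_address_after(0x42c37a, 5) == 0x42c37f, "return address");
static_assert(rel32_call_target(0x42c37a, 0x00007ac1) == 0x433e40, "call target");
```

Both checks agree with the disassembly in the quoted message: the call
returns to 0x42c37f, and the encoded displacement lands on retrieve_url.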

On Fri, Feb 28, 2020 at 5:57 AM PenYiWang via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi
>
> I use BlockAddress to get the address of a BasicBlock,
>
> and I use GlobalVariable's getInitializer()
>
> to pass the address of the BasicBlock to a global variable in my own program,
>
> and then I print it out.
>
> But I found that BlockAddress is not always correct.
>
> For example, some functions' rsp (stack pointer) or other registers are
> maintained by the caller,
>
> so it would be like:
> https://i.imgur.com/Rwuy5ju.png
>   0x42c37a: e8 c1 7a 00 00     call   433e40 <retrieve_url>
>   0x42c37f: 48 83 c4 20           add    rsp,0x20
>   0x42c383: eb 00                    jmp    42c385 <main+0x16b5>
>
> What I want is the basic block that is "exactly" after the function call,
> 0x42c37f.
>
> I want BlockAddress to give me 0x42c37f.
>
> But the address my program actually prints is 0x42c383.
>
> I guess "add rsp,0x20" is seen as being within the basic block of the
> function call.
>
> Maybe resetting rsp (the stack pointer) is part of the function call.
>
> Can I say there is a bug in BlockAddress?
>
> Or is there some bug in LLVM's backend?
>
> How can I solve this problem?
>
> Should I force clang/llvm not to use the caller-saved convention or
> something like that?
>
> Thanks