[llvm-dev] Is BlockAddress always correct ?

PenYiWang via llvm-dev llvm-dev at lists.llvm.org
Sat Feb 29 03:42:11 PST 2020


Thank you for your explanation!
I got it.

Besides the caller-saved register issue, I also found that some compiler
optimizations break BlockAddress in the LLVM backend: some test cases are
correct with -O0 but wrong with -O2.

Now I know that BlockAddress is not reliable for this purpose.
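
As a rough illustration of the distinction drawn in the quoted reply below,
the following sketch compares the return address of a call with the address
of a label placed right after that call. It assumes GCC or Clang, because it
relies on GNU extensions (labels as values and __builtin_return_address), and
the names whereWillIReturn, Addresses, sample and printSample are hypothetical
and not taken from the thread. The printed values depend on the target and
the optimization level, so no particular output is claimed.

```cpp
// Sketch only: needs GCC or Clang (GNU extensions: &&label,
// __builtin_return_address, __attribute__((noinline))).
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical callee: reports the address it will return to, i.e. the
// address of the instruction immediately after the call instruction.
__attribute__((noinline)) static void *whereWillIReturn() {
  return __builtin_return_address(0);
}

struct Addresses {
  std::uintptr_t returnAddress; // instruction right after the call
  std::uintptr_t blockAddress;  // start of the labelled block
};

// Hypothetical helper: performs one call and takes the address of the label
// that follows it in the source. Clang lowers &&afterCall to a blockaddress
// constant in LLVM IR.
__attribute__((noinline)) Addresses sample() {
  void *ret = whereWillIReturn();
afterCall:
  void *blk = &&afterCall;
  return {reinterpret_cast<std::uintptr_t>(ret),
          reinterpret_cast<std::uintptr_t>(blk)};
}

// Prints both addresses so they can be compared against a disassembly.
void printSample() {
  Addresses a = sample();
  std::printf("return address: 0x%" PRIxPTR "\n", a.returnAddress);
  std::printf("block address:  0x%" PRIxPTR "\n", a.blockAddress);
}
```

Calling printSample() from main and comparing the two numbers with the
output of objdump -d shows which instructions, if any, the compiler placed
between the call and the start of the labelled block.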

Thank you again.

On Sat, Feb 29, 2020 at 2:54 AM, Reid Kleckner <rnk at google.com> wrote:

> In general, no, there is no way in LLVM IR to get the return address of a
> single function call, which appears to be what you want. The compiler is
> free to insert instructions at the end of the basic block and into the
> beginning of the next block, so yes, the BlockAddress is always exact, but
> it doesn't seem to be quite what you want. Something else that would break
> your invariant, for example, is if the register allocator decided to spill
> the return value in RAX right after the call, which is pretty typical.
>
> There are several existing LLVM features that record function return
> addresses, but they are not implementable in LLVM IR. For example, CodeView
> debug info records heap allocation call sites. You can see this in the
> assembly in this example; see the labels Ltmp3, Ltmp5, etc. at the return
> addresses:
>
> $ cat t.cpp
> struct Foo {
>   int x, y;
> };
> __declspec(allocator) Foo *newFoo();
> void bar(Foo **foos) {
>   foos[0] = newFoo();
>   foos[1] = newFoo();
>   foos[2] = newFoo();
>   foos[3] = newFoo();
> }
>
> $ clang -cc1 -gcodeview -masm-verbose -debug-info-kind=limited \
>     -triple x86_64-windows-msvc -fms-extensions -S t.cpp -o - \
>     | grep -A4 'S_HEAP\|call.*newFoo'
>         callq   "?newFoo@@YAPEAUFoo@@XZ"
> .Ltmp1:
>         movq    32(%rsp), %rcx
>         movq    %rax, (%rcx)
> .Ltmp2:
> --
>         callq   "?newFoo@@YAPEAUFoo@@XZ"
> .Ltmp3:
>         movq    32(%rsp), %rcx
>         movq    %rax, 8(%rcx)
> .Ltmp4:
> --
>         callq   "?newFoo@@YAPEAUFoo@@XZ"
> .Ltmp5:
>         movq    32(%rsp), %rcx
>         movq    %rax, 16(%rcx)
> .Ltmp6:
> --
>         callq   "?newFoo@@YAPEAUFoo@@XZ"
> .Ltmp7:
>         movq    32(%rsp), %rcx
>         movq    %rax, 24(%rcx)
>         .cv_loc 0 1 10 0                # t.cpp:10:0
> --
>         .short  4446                    # Record kind: S_HEAPALLOCSITE
>         .secrel32       .Ltmp0          # Call site offset
>         .secidx .Ltmp0                  # Call site section index
>         .short  .Ltmp1-.Ltmp0           # Call instruction length
>         .long   4096                    # Type index
> --
>         .short  4446                    # Record kind: S_HEAPALLOCSITE
>         .secrel32       .Ltmp2          # Call site offset
>         .secidx .Ltmp2                  # Call site section index
>         .short  .Ltmp3-.Ltmp2           # Call instruction length
>         .long   4096                    # Type index
> --
>         .short  4446                    # Record kind: S_HEAPALLOCSITE
>         .secrel32       .Ltmp4          # Call site offset
>         .secidx .Ltmp4                  # Call site section index
>         .short  .Ltmp5-.Ltmp4           # Call instruction length
>         .long   4096                    # Type index
> --
>         .short  4446                    # Record kind: S_HEAPALLOCSITE
>         .secrel32       .Ltmp6          # Call site offset
>         .secidx .Ltmp6                  # Call site section index
>         .short  .Ltmp7-.Ltmp6           # Call instruction length
>         .long   4096                    # Type index
>
> On Fri, Feb 28, 2020 at 5:57 AM PenYiWang via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> I use BlockAddress to get the address of a BasicBlock, and I use a
>> GlobalVariable's getInitializer() to pass that address to a global
>> variable in my own program, which then prints it out.
>>
>> However, I found that BlockAddress is not always correct.
>>
>> For example, for some functions rsp (the stack pointer) or another
>> register is maintained by the caller, so the code looks like this:
>> https://i.imgur.com/Rwuy5ju.png
>>   0x42c37a: e8 c1 7a 00 00    call   433e40 <retrieve_url>
>>   0x42c37f: 48 83 c4 20       add    rsp,0x20
>>   0x42c383: eb 00             jmp    42c385 <main+0x16b5>
>>
>> What I want is the address of the basic block "exactly" after the
>> function call, 0x42c37f. I want BlockAddress to give me 0x42c37f,
>> but the address my program actually prints is 0x42c383.
>>
>> I guess "add rsp,0x20" is treated as part of the basic block that
>> contains the function call; maybe resetting rsp (the stack pointer) is
>> part of the function call.
>>
>> Can I say there is a bug in BlockAddress, or is there a bug in LLVM's
>> backend?
>>
>> How can I solve this problem? Should I force clang/llvm not to use the
>> caller-saved convention, or something like that?
>>
>> Thanks
>
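
For reference, the addresses in the disassembly quoted above can be checked
by adding up the instruction lengths. The sketch below only restates the
bytes shown in that listing; the constant names are hypothetical. It shows
that the wanted address (0x42c37f) and the printed address (0x42c383) differ
by exactly the 4 bytes of the "add rsp,0x20" instruction.

```cpp
// Worked arithmetic over the three instructions quoted in the question.
#include <cassert>
#include <cstdint>

// Address and byte counts copied from the disassembly listing.
constexpr std::uint64_t kCallAddr = 0x42c37a; // e8 c1 7a 00 00
constexpr std::uint64_t kCallLen = 5;         // opcode e8 + 4-byte rel32
constexpr std::uint64_t kAddLen = 4;          // 48 83 c4 20 (add rsp,0x20)
constexpr std::uint64_t kJmpLen = 2;          // eb 00 (jmp with rel8 = 0)

// rel32 operand of the call, little-endian bytes c1 7a 00 00.
constexpr std::uint64_t kCallRel32 = 0x00007ac1;

// The return address is the address of the instruction after the call.
constexpr std::uint64_t kReturnAddr = kCallAddr + kCallLen;
// The jmp follows the stack adjustment.
constexpr std::uint64_t kJmpAddr = kReturnAddr + kAddLen;
// A jmp with displacement 0 lands on the instruction right after itself.
constexpr std::uint64_t kJmpTarget = kJmpAddr + kJmpLen;
// A relative call targets (address of next instruction) + rel32.
constexpr std::uint64_t kCallTarget = kReturnAddr + kCallRel32;

static_assert(kReturnAddr == 0x42c37f, "address the poster wanted");
static_assert(kJmpAddr == 0x42c383, "address the program printed");
static_assert(kJmpTarget == 0x42c385, "matches jmp 42c385");
static_assert(kCallTarget == 0x433e40, "matches call 433e40");
static_assert(kJmpAddr - kReturnAddr == kAddLen, "gap is the add rsp,0x20");
```

Because every check is a static_assert, the file compiles only if the
arithmetic agrees with the listing.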