[llvm-dev] [LLVMdev] Adding a stack probe function attribute

Andy Ayers via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 17 10:46:21 PDT 2015


The stack pointer can't ever safely point beyond the valid range. I don't 100% know the exact reason but suspect the OS may verify this on context save/restore, which can happen unpredictably. This can be tricky to guarantee if you're modifying the stack pointer in flight without some extra math here and there, at which point you might as well use another register and defer the update like we do.

I'm also not sure what you mean by updating the stack pointer being "allowed" for x86-32 -- if you look at the implementation of __chkstk for x86-32, which we provide in sources with VS, it does not modify ESP in the loop -- it uses EAX to probe the stack.

[from] c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\intel\chkstk.asm

        mov     eax, esp                ; current TOS
        and     eax, not ( _PAGESIZE_ - 1) ; Round down to current page boundary

cs10:
        cmp     ecx, eax                ; Is new TOS
        jb      short cs20              ; in probed page?
        mov     eax, ecx                ; yes.
        pop     ecx
        xchg    esp, eax                ; update esp
        mov     eax, dword ptr [eax]    ; get return address
        mov     dword ptr [esp], eax    ; and put it at new TOS
        ret

; Find next lower page and probe
cs20:
        sub     eax, _PAGESIZE_         ; decrease by PAGESIZE
        test    dword ptr [eax],eax     ; probe page.
        jmp     short cs10

Another common pattern which we haven't optimized for yet is to probe and zero the newly grown stack region.

We have the same issue with epilogs that you saw. We've worked around it for now by generating epilogs first, but this seems a bit hacky. I'd like to see us fix this by delaying the inline expansion of a stack probe until after prolog/epilog generation. That also removes the need for us to update the prolog block in ::emitProlog, which is a bit clunky. But we haven't gotten around to this yet.

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of John Kåre Alsaker via llvm-dev
Sent: Sunday, August 16, 2015 11:26 AM
To: Reid Kleckner <rnk at google.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [LLVMdev] Adding a stack probe function attribute

I started to implement inlining of the stack probe function based on Microsoft's inlined stack probes in https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fMicrosoft%2fllvm%2ftree%2fMS.&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=pm63Kdg2E%2bprrwKZKpcGpQNEYjwBaY7%2beZrX8eYfyYo%3d

Do we know why the stack pointer cannot be updated in a loop (which results in ideal code)? I noticed that was commented in Microsoft's code.
I suspect this is due to debug or unwinding information, since it is allowed on Windows x86-32.

I ran into two issues while implementing this.

The epilog was inserted into the wrong basic block. This is because the basic blocks to insert epilogs into is calculated before emitPrologue is called.
I fixed this by adding the following code after the emitPrologue call in PEI::insertPrologEpilogCode:

  RestoreBlocks.clear();
  calculateSets(Fn);

This doesn't seem like a very nice solution. It's also unclear to me how Microsoft's branch handles this.

The other issue is with code generation. All the tests passed but the one were segmented stacks are used and stack probes are required.
I don't see what is actually wrong here, and would like some help.
Here is the output:

# After Prologue/Epilogue Insertion & Frame Finalization # Machine code for function test_large: Post SSA Frame Objects:
  fi#0: size=40000, align=4, at location [SP-40000]

BB#4:
        %R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg
        CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def>
        JA_1 <BB#0>, %EFLAGS<imp-use>
    Successors according to CFG: BB#3 BB#0

BB#3:
    Predecessors according to CFG: BB#4
        %R10<def> = MOV64ri 40040
        %R11<def> = MOV64ri 32
        CALL64pcrel32 <es:__morestack>, %RSP<imp-use>
        MORESTACK_RET
    Successors according to CFG: BB#0

BB#0: derived from LLVM BB %0
    Predecessors according to CFG: BB#3 BB#4
        %EAX<def> = MOV32ri 40040; flags: FrameSetup
        %RDX<def> = MOV64rr %RAX; flags: FrameSetup
        %RCX<def> = MOV64rr %RSP; flags: FrameSetup
    Successors according to CFG: BB#1

BB#1: derived from LLVM BB %0
    Predecessors according to CFG: BB#0 BB#1
        OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>;
flags: FrameSetup
        %RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup
        %RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup
        JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup
    Successors according to CFG: BB#2 BB#1

BB#2: derived from LLVM BB %0
    Predecessors according to CFG: BB#1
        %RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX, %EFLAGS<imp-def>;
flags: FrameSetup
        SEH_StackAlloc 40040; flags: FrameSetup
        SEH_EndPrologue; flags: FrameSetup
        %RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg
        %EDX<def> = MOV32r0 %EFLAGS<imp-def,dead>
        CALL64pcrel32 <ga:@dummy_use>, <regmask>, %RSP<imp-use>, %RCX<imp-use>, %EDX<imp-use,kill>, %RSP<imp-def>
        SEH_Epilogue
        %RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040, %EFLAGS<imp-def,dead>
        RETQ

# End machine code for function test_large.

*** Bad machine code: Using an undefined physical register ***
- function:    test_large
- basic block: BB#1  (0x40b1d70)
- instruction: OR64mi8- operand 0:   %RCX

*** Bad machine code: Using an undefined physical register ***
- function:    test_large
- basic block: BB#1  (0x40b1d70)
- instruction: %RCX<def,tied1> = SUB64ri32- operand 1:   %RCX<tied0>

*** Bad machine code: Using an undefined physical register ***
- function:    test_large
- basic block: BB#1  (0x40b1d70)
- instruction: %RDX<def,tied1> = SUB64ri32- operand 1:   %RDX<tied0>

*** Bad machine code: Using an undefined physical register ***
- function:    test_large
- basic block: BB#2  (0x40b1e20)
- instruction: %RSP<def,tied1> = SUB64rr- operand 2:   %RAX
LLVM ERROR: Found 4 machine code errors.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org         https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.cs.uiuc.edu&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=iQjl5g5NZHMOzJ4%2fhFEN6ECfAnFBR5Evud7KBAWO%2b7Y%3d
https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2flists.llvm.org%2fcgi-bin%2fmailman%2flistinfo%2fllvm-dev%0a&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=BiZb%2bhINDhjdc0YshtClHXKkOS10wY7Gjw6LWjzozP0%3d


More information about the llvm-dev mailing list