[PATCH] D53706: [RecursionStackElimination]: Pass to eliminate recursions

Martin Elshuber via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 25 08:26:38 PDT 2018


marels created this revision.
marels added reviewers: john.brawn, t.p.northover, asl, chandlerc.
Herald added subscribers: llvm-commits, kristof.beyls, tpr, javed.absar, mgorny, mehdi_amini.
Herald added a reviewer: deadalnix.

This pass converts multiple tail-recursive calls into a loop by explicitly
modeling the calls as a singly linked worklist on the stack.

  void f(a, b, const c) {
    ...
    if (...)
      return
  
    a1, b1, b2, b3, x2, x3 = ...
  
    f(a1, b1, c) // 1st recursion
    a2 = h(x2)
    f(a2, b2, c) // 2nd recursion
    a3 = h(x3)
    f(a3, b3, c) // 3rd recursion
  }

transforms to

  void f(a, b, const c) {
    struct { x, b, next } *worklist = null
  
  loop:
    ...
    if (...)
      goto check_return
  
    a1, b1, b2, b3, x2, x3 = ...
  
    /* Assign arguments for the first call */
  
    a = a1
    b = b1
  
    /* Put arguments of the remaining calls into the work list */
  
    queue = alloca(2 * sizeof(*worklist))
    queue[0] = {x2, b2, &queue[1]}
    queue[1] = {x3, b3, worklist}
    worklist = queue
  
    goto loop
  
  check_return:
    if (!worklist)
      return
    a = h(worklist->x)
    b = worklist->b
    worklist = worklist->next
  
    goto loop
  }

Such patterns occur, for example, when an application traverses full k-ary trees.
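To make the transformation above concrete, here is a small C sketch (not code from the patch) for one such case: a preorder visit of an implicit full ternary tree. The tree layout, the helper h(), the global g_sum and the fixed-size pool standing in for alloca are all illustrative assumptions. f_rec() follows the shape of the original f(), and f_loop() follows the shape of the transformed f(), including the block of N-1 = 2 worklist items that is pushed before jumping back to the loop head. Note that c is never stored in the worklist, because it is a pass-through argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: visit every node of an implicit full ternary tree
   with n nodes, where the children of node a are 3a+1, 3a+2 and 3a+3, and
   accumulate c * depth into g_sum. */
static long g_sum;

/* h() plays the role of the computation done between two recursions. */
static size_t h(size_t x) { return x + 1; }

/* Original form: three recursive calls; c is a pass-through argument. */
static void f_rec(size_t a, unsigned b, const long c, size_t n) {
  if (a >= n)
    return;
  g_sum += c * (long)b;
  size_t a1 = 3 * a + 1, x2 = 3 * a + 1, x3 = 3 * a + 2;
  unsigned b1 = b + 1, b2 = b + 1, b3 = b + 1;
  f_rec(a1, b1, c, n);              /* 1st recursion */
  size_t a2 = h(x2);
  f_rec(a2, b2, c, n);              /* 2nd recursion */
  size_t a3 = h(x3);
  f_rec(a3, b3, c, n);              /* 3rd recursion */
}

/* Worklist item: only x and b are stored, c is passed through unchanged. */
struct item { size_t x; unsigned b; struct item *next; };

/* A local array with a bump index stands in for repeated alloca calls:
   like alloca, blocks are only released when the function returns. */
#define POOL_ITEMS 1024

static void f_loop(size_t a, unsigned b, const long c, size_t n) {
  struct item pool[POOL_ITEMS];
  size_t used = 0;
  struct item *worklist = NULL;
  for (;;) {
    if (a < n) {                          /* loop: body of f */
      g_sum += c * (long)b;
      size_t a1 = 3 * a + 1, x2 = 3 * a + 1, x3 = 3 * a + 2;
      unsigned b1 = b + 1, b2 = b + 1, b3 = b + 1;
      assert(used + 2 <= POOL_ITEMS);
      struct item *queue = &pool[used];   /* queue = alloca(2 * sizeof *worklist) */
      used += 2;
      queue[0] = (struct item){x2, b2, &queue[1]};
      queue[1] = (struct item){x3, b3, worklist};
      worklist = queue;
      a = a1;                             /* arguments of the first call */
      b = b1;
      continue;
    }
    /* check_return: resume the next pending call from the worklist */
    if (!worklist)
      return;
    a = h(worklist->x);
    b = worklist->b;
    worklist = worklist->next;
  }
}
```

For example, with n = 40 (a full ternary tree of depth 3), c = 2 and a = b = 0, both functions add 204 to g_sum.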

The benefit of this transformation is that neither the frame pointer nor the
link address has to be stored on the stack. In addition, pass-through
arguments, like 'const c' in the example above, are less likely to be saved
onto the stack, and if they are, less often (only once, in the entry block).

The downsides are:

a) An additional store for the worklist is required.

b) The worst-case stack memory required after this transformation depends on
  the number of recursive calls instead of the recursion depth.

c) Additional conditional branches are required.

ad a) This point is compensated by not storing the link address in each
  call.

ad b) This pass can additionally add code (and does so by default) that
  manages unused blocks of allocas in a freelist. Because allocations are
  done in blocks, the freelist management can also be done in blocks. This
  requires the last item in each block to be marked so that the code in the
  return path can detect it.

  Freelist management stores a further variable, containing the current
  freelist, on the stack. On the other hand, because no recursion is
  executed, the frame pointer loads and stores are omitted.

  Currently the following marker algorithms are implemented:
  
  * TrueMarker: This algorithm marks each element and is applicable to binary
    recursions.
  
  * FalseMarker: This algorithm marks no element, and is applicable when
    freelist management is disabled.
  
  * FieldMarker: A separate bit is allocated in each worklist item. This
    marker requires additional instructions but can be used in the general case.
  
  * CompareMarker: Assuming allocas return memory addresses in a strictly
    monotonic order, the freelist can be modeled to return elements in the
    same order when pulling from it. Comparing each worklist pointer with
    worklist->next then reveals whether the element is marked.
  
  * TaggedMarker: (not yet implemented) Similar to the FieldMarker, this
    marker marks the last item with a bit. But instead of using a separate
    bit, it uses bit 0 of the worklist (next) pointer field. If the alignment
    of the worklist items is a power of 2 and at least 2, this marker can also
    cover the general case. It requires some additional bit masking but no
    additional memory operations.
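The freelist management described under (b) can be sketched in C as follows, using the FieldMarker. This is an illustrative model for the same kind of ternary traversal, not the code the pass emits; the struct layout, the use_freelist switch and the pool that stands in for alloca are assumptions. When an item marked as last is pulled, its whole block is pushed onto the freelist, and the next block allocation pops from the freelist before taking fresh memory. With N = 2 every block holds a single item, so every item is the last one, which is why the TrueMarker suffices for binary recursions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FieldMarker sketch: each block holds N-1 = 2 items, and a
   separate `last` field marks the final item of a block. */
struct fl_item {
  size_t x;
  unsigned b;
  unsigned last;          /* FieldMarker: 1 on the last item of a block */
  struct fl_item *next;   /* worklist link, or freelist link for free blocks */
};

#define FL_POOL_ITEMS 1024
#define BLOCK_ITEMS 2     /* N - 1 for N = 3 recursive calls */

/* Visits an implicit full ternary tree with n nodes (children of a are
   3a+1, 3a+2, 3a+3), adds c * depth to *sum, and returns how many pool
   items were ever taken, i.e. the stack memory the worklist needed. */
static size_t f_freelist(size_t a, unsigned b, long c, size_t n, long *sum,
                         int use_freelist) {
  struct fl_item pool[FL_POOL_ITEMS];   /* stands in for alloca */
  size_t used = 0;
  struct fl_item *worklist = NULL;
  struct fl_item *freelist = NULL;      /* the extra stack variable */
  for (;;) {
    if (a < n) {
      *sum += c * (long)b;
      struct fl_item *block;
      if (freelist) {                   /* reuse a freed block */
        block = freelist;
        freelist = freelist->next;
      } else {                          /* otherwise take fresh memory */
        assert(used + BLOCK_ITEMS <= FL_POOL_ITEMS);
        block = &pool[used];
        used += BLOCK_ITEMS;
      }
      block[0] = (struct fl_item){3 * a + 1, b + 1, 0, &block[1]};
      block[1] = (struct fl_item){3 * a + 2, b + 1, 1, worklist};
      worklist = block;
      a = 3 * a + 1;                    /* arguments of the first call */
      b = b + 1;
      continue;
    }
    if (!worklist)
      return used;
    struct fl_item *item = worklist;
    a = item->x + 1;                    /* a = h(worklist->x) */
    b = item->b;
    worklist = item->next;
    if (use_freelist && item->last) {   /* end of block: recycle it */
      struct fl_item *first = item - (BLOCK_ITEMS - 1);
      first->next = freelist;
      freelist = first;
    }
  }
}
```

For a full ternary tree with n = 40 nodes, the variant without a freelist takes 80 pool items (one block per visited node), while the freelist variant takes only 8, because at most one block per tree level is live at any time.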

ad c) The pass adds 1 conditional branch to the return path and 2
  additional branches for freelist management (see (b) above). Depending on
  the target, machine branch prediction can alleviate this.

Algorithm outline:
------------------

Analysis Phase:

1. Analyze the function and gather the returning basic blocks (BBs), including BBs branching to return-only BBs, that contain recursive calls (RCs).

2. If more than one such BB, or no RC, is found, abandon the transformation.

3. If the remaining BB contains only one RC, abandon the transformation. Otherwise, let N be the number of RCs in this BB.

4. Analyze the instructions in this BB from the first RC up to the terminator instruction, and classify each instruction as movable or static.

  A movable instruction is an instruction that can be safely moved before the first RC. All other instructions are classified as static.

5. Assign each static instruction to the following RC instruction. If static instructions are left after the last RC, abandon the transformation.

6. Build the function H, with all its arguments, for each RC. By including the call itself in function H and enforcing this function to return void, it is ensured that no values escape, i.e., that there are no uses of values after the recursion.

6.1) Note: Because of the way step 4 is executed, it is guaranteed that
  function H for the first RC consists of a single instruction: the call
  itself. The first call candidate is handled specially (the same way as in
  Tail Recursion Elimination (TRE)).

5,6) Note: The information collected on each RC is stored in the structure
  RecursiveCall.

7. Compare function H of the second RC with those of all later RCs. Their behavior must match; otherwise, abandon the transformation.

7.1) Note: As function H of the first RC is basically a TRE, it can be
  ignored in this step.

8. Simplify the argument list by removing constants and pass-through arguments.

9. Decide whether it is profitable to use the transformation.
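Steps 3 to 5 can be illustrated with a toy model in C that works on a plain array of instruction kinds instead of LLVM IR. Everything here (enum kind, assign_to_calls() and its h_size output) is hypothetical and only mirrors the rules stated above; deciding whether an instruction is actually movable is the hard part and is simply taken as given. The model also shows note 6.1: since the analyzed range starts at the first RC, h_size[0] is always 1.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not LLVM IR): the instructions of the candidate BB
   from the first recursive call up to, but excluding, the terminator. */
enum kind { RC, MOVABLE, STATIC };

/* Steps 3-5: assigns every static instruction to the next RC and records, in
   h_size[i], how many instructions function H of RC i contains (its static
   instructions plus the call itself). Movable instructions are assumed to be
   hoisted before the first RC and are not part of any H.
   Returns N, the number of RCs, or -1 if the transformation is abandoned. */
static int assign_to_calls(const enum kind *ins, size_t len, size_t *h_size,
                           size_t max_rc) {
  size_t n_rc = 0, pending = 0;
  if (len == 0 || ins[0] != RC)
    return -1;                  /* the analyzed range starts at the first RC */
  for (size_t i = 0; i < len; ++i) {
    switch (ins[i]) {
    case MOVABLE:
      break;
    case STATIC:
      ++pending;                /* belongs to the following RC */
      break;
    case RC:
      if (n_rc == max_rc)
        return -1;
      h_size[n_rc++] = pending + 1;
      pending = 0;
      break;
    }
  }
  if (pending != 0)
    return -1;                  /* step 5: static instructions after last RC */
  if (n_rc < 2)
    return -1;                  /* step 3: only one RC */
  return (int)n_rc;
}
```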

Transformation Phase:

1. Adjust the entry block and split off the loop entry.

2. Eliminate the first RC (similar to TRE).

3. Eliminate the remaining RCs by allocating a new block of N-1 struct items (or picking one from the freelist) and filling it. This block is put at the front of the list. Pulling items from the list is added in (4). The execution of function H is ensured in (5).

4. Create a new return block that pulls items from the worklist. If an end-of-block marker is reached, the block is put into the freelist.

5. Add the instructions of function H to the return block and create the loop.

6. Redirect each returning block into the new return block created in (4).

7. Drop the constant STACKGROWTHDIRECION. It is mainly used as a proof of concept for AArch64.

Open issues and known TODOs (it would be great if reviewers could comment on
these as well):

1. Pipeline integration: Currently the pass is placed before TRE, together with some supporting passes for cleanup and preparation. This cannot be left as is. The preferred way could be to adjust "AArch64TargetMachine::adjustPassManager" and only use the pass with AArch64 (https://reviews.llvm.org/owners/package/3/). AArch64 is selected because it is the architecture for which the author has suitable benchmarking and test setups available. This was tried once (in a similar way as in AMDGPUTargetMachine::adjustPassManager); however, the result was that many LLVM projects did not compile anymore because of linker problems (the Passes library was missing). Do you have any advice here?

2. The way to test whether it is profitable to use the pass needs adjustment. I think that functions that spill more registers have an increased chance to profit, while functions that spill fewer registers have a lower chance.

3. I am thinking of a configurable way (maybe a separate marker class) to adjust the way markers are implemented. E.g., putting the marker bit into the pointer on AArch64 showed a significant performance boost (in a test implemented in C).

4. Is it safe to temporarily create a function and return no-changes after deleting it? If not, is there a better way than using the FunctionComparator?

5. GlobalsAA needs to be preserved. I am not sure whether this holds in this context: loads and stores are added here.
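Relating to open issue 3 and the not yet implemented TaggedMarker, the following C sketch shows the pointer-tagging idea: the end-of-block mark is kept in bit 0 of the next pointer and masked off when the pointer is followed. The names tg_item, tg_pack(), tg_next() and tg_is_last() are hypothetical, and the round trip through uintptr_t assumes a platform where pointer/integer conversion preserves the address (as with GCC and Clang on common AArch64 and x86-64 targets). Reading the marker then costs a mask operation on the already loaded pointer rather than a load of a separate marker field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TaggedMarker sketch: the end-of-block mark lives in bit 0 of
   the next pointer, which is always 0 for items aligned to 2 or more bytes. */
struct tg_item {
  size_t x;
  unsigned b;
  uintptr_t next_tagged;   /* struct tg_item * with bit 0 as the marker */
};

_Static_assert(_Alignof(struct tg_item) >= 2, "bit 0 must be free");

/* Combines the next pointer and the end-of-block mark into one word. */
static uintptr_t tg_pack(struct tg_item *next, int last) {
  uintptr_t p = (uintptr_t)next;
  assert((p & 1u) == 0);   /* relies on alignment >= 2 */
  return p | (uintptr_t)(last != 0);
}

/* Follows the link with the marker bit masked off. */
static struct tg_item *tg_next(const struct tg_item *it) {
  return (struct tg_item *)(it->next_tagged & ~(uintptr_t)1);
}

/* Returns 1 if this item is the last one of its block. */
static int tg_is_last(const struct tg_item *it) {
  return (int)(it->next_tagged & 1u);
}
```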


Repository:
  rL LLVM

https://reviews.llvm.org/D53706

Files:
  include/llvm-c/Transforms/Scalar.h
  include/llvm/InitializePasses.h
  include/llvm/Transforms/Scalar.h
  include/llvm/Transforms/Scalar/RecursionStackElimination.h
  lib/Passes/PassBuilder.cpp
  lib/Transforms/IPO/PassManagerBuilder.cpp
  lib/Transforms/Scalar/CMakeLists.txt
  lib/Transforms/Scalar/RecursionStackElimination.cpp
  lib/Transforms/Scalar/Scalar.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D53706.171096.patch
Type: text/x-patch
Size: 86871 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181025/a8496ec6/attachment.bin>

