[PATCH] D53706: [RecursionStackElimination]: Pass to eliminate recursions

Tue Nov 20 08:38:38 PST 2018

marels added a comment.

I think there was some confusion in how the lists are managed.

From code analysis the pass knows how wide the recursion is. Wideness means the number of recursive calls within the function. For example, qsort has a wideness of 2; traversing an octree calls itself 8 times.
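
To illustrate the term, here is a minimal sketch of a function with wideness 2. The `Node` type and the `visit` function are hypothetical and not taken from the patch.

```cpp
// Hypothetical example, not from the patch: a function with wideness 2.
struct Node {
  Node *Left = nullptr;
  Node *Right = nullptr;
};

// The body contains two recursive call sites, so its wideness is 2.
void visit(const Node *N, int &Count) {
  if (!N)
    return;
  ++Count;
  visit(N->Left, Count);  // recursive call 1
  visit(N->Right, Count); // recursive call 2
}
```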

Work-list
---------

When the code reaches the first of those n calls, all information is available and the call can be 'emulated' by adjusting some PHI nodes and looping to the start of the function. This is basically the same as for Tail Call Elimination. However, before doing so, information about the remaining n-1 calls needs to be queued in the work-list.

While the allocation of the work-list items is done in chunks (arrays) of n-1 consecutive items, they are internally linked as a singly linked list. Items are always added in chunks of n-1 at the front, and removed one by one from the front.

To clarify: the head pointer of the work-list always points to the next item to be processed and NOT to the current item. The arguments of the current item are extracted before branching to the loop entry.

Because the allocation is done in chunks, when processing the last item, the address of the first item can be computed by a subtraction (in our case this is done by a GEP instruction).
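
The work-list handling described above can be sketched in plain C++. This is only an illustration under assumptions, not the code the pass generates: the `Node`, `Item`, `sumRecursive` and `sumWithWorkList` names are hypothetical, heap-allocated chunks stand in for alloca, and a `Last` flag stands in for the marker discussed further down.

```cpp
#include <memory>
#include <vector>

// Hypothetical example: a tree with three children per node (wideness 3).
struct Node {
  int Value = 0;
  Node *Child[3] = {nullptr, nullptr, nullptr};
};

constexpr int Width = 3;             // number of recursive calls
constexpr int ChunkSize = Width - 1; // items queued per emulated call

struct Item {
  const Node *Arg; // argument of the deferred call
  Item *Next;      // work-list link
  bool Last;       // true for the last item of a chunk
};

// Recursive reference version.
void sumRecursive(const Node *N, int &Sum) {
  if (!N)
    return;
  Sum += N->Value;
  for (int I = 0; I < Width; ++I)
    sumRecursive(N->Child[I], Sum);
}

// Loop version driven by a work-list.
int sumWithWorkList(const Node *Root) {
  std::vector<std::unique_ptr<Item[]>> Storage; // stand-in for alloca
  Item *WorkList = nullptr; // always points to the NEXT item to process
  int Sum = 0;
  const Node *N = Root;
  for (;;) {
    if (N) {
      Sum += N->Value;
      // Queue calls 2..n as one chunk at the front of the work-list.
      Storage.push_back(std::make_unique<Item[]>(ChunkSize));
      Item *Chunk = Storage.back().get();
      for (int I = 0; I < ChunkSize; ++I) {
        bool Last = (I == ChunkSize - 1);
        Chunk[I] = {N->Child[I + 1], Last ? WorkList : &Chunk[I + 1], Last};
      }
      WorkList = Chunk;
      N = N->Child[0]; // emulate call 1 by looping to the entry
      continue;
    }
    // "Return": finish if the work-list is empty, else take the next item.
    if (!WorkList)
      return Sum;
    Item *Cur = WorkList;
    WorkList = Cur->Next; // advance before the item is processed
    N = Cur->Arg;
    if (Cur->Last) {
      // A subtraction recovers the first item of the chunk.
      Item *ChunkStart = Cur - (ChunkSize - 1);
      (void)ChunkStart; // a free-list would receive the chunk here
    }
  }
}
```

Because chunks are pushed at the front and items are popped from the front, the deferred calls are processed in the same order as in the recursive version.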

This is also the point where free-list management comes into play. To reduce the amount of stack required, unused chunks are stored within a free-list.

Free-List
---------

The free-list is also a linked list, but in contrast to the work-list it logically links chunks. The next pointer of the last item in the array is used as the link pointer.

Whenever a new chunk is needed, it is first taken from the free-list if one is available. If the free-list is empty, a new chunk is allocated by executing an alloca instruction.
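
A minimal sketch of this allocation policy, under assumptions: the `ChunkPool` type and its members are hypothetical names, the `Arg` field is a placeholder for the real arguments, and heap memory stands in for alloca.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t ChunkSize = 3; // n-1 for a recursion of wideness 4

struct Item {
  int Arg;    // placeholder for the real call arguments
  Item *Next; // work-list link; in the last item also the free-list link
};

struct ChunkPool {
  std::vector<std::unique_ptr<Item[]>> Storage; // stand-in for alloca
  Item *FreeList = nullptr; // first item of the first free chunk
  std::size_t FreshAllocations = 0;

  // Take a chunk from the free-list if possible, otherwise allocate one.
  Item *allocate() {
    if (FreeList) {
      Item *Chunk = FreeList;
      FreeList = Chunk[ChunkSize - 1].Next; // last Next links the chunks
      return Chunk;
    }
    Storage.push_back(std::make_unique<Item[]>(ChunkSize));
    ++FreshAllocations;
    Item *Chunk = Storage.back().get();
    for (std::size_t I = 0; I + 1 < ChunkSize; ++I)
      Chunk[I].Next = &Chunk[I + 1]; // constant links inside the chunk
    return Chunk;
  }

  // Called once the last item of a chunk was removed from the work-list.
  void release(Item *LastItem) {
    Item *Chunk = LastItem - (ChunkSize - 1); // first item of the chunk
    LastItem->Next = FreeList;
    FreeList = Chunk;
  }
};
```

Note that a recycled chunk keeps its internal links, so only the next pointer of the last item has to be written again when the chunk is queued.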

List-Items
----------

To execute the algorithm each item stores the following information.

- Arguments: These are the function arguments that are necessary to execute the function.
- Next-Pointer: This is the link pointer to maintain the work-list.
- Marker: The marker is used to mark the last item in a chunk. Whenever a marked item is removed from the work-list, the complete chunk has to be put into the free-list.

Example Work-List:

After a couple of steps the work-list might look as shown in Fig. 1 while executing Step 3.3 for a recursion with wideness 4 (e.g. quad-tree traversal). Note that Step 1 and Step 3.1 are omitted because they are never allocated within the work-list.

The first column denotes the arguments, the second the marker, and the third the next pointer.

  Fig. 1: Work-list and free-list while executing Step 3.3. Note that
  work-list already points to Step 3.4 even when currently executing
  Step 3.3.

  +----------+---+-----+
  | Step 2   |   | *   |
  +----------+---+ | --+       +----------+---+-----+
  | Step 3   |   | v * |       | Step 3.2 |   | *   |
  +----------+---+-- | |       +----------+---+ | --+
  | Step 4   | M | * v | <-\   | Step 3.3 |   | v * |
  +----------+---+-|---+   |   +----------+---+-- | +
                   v       |   | Step 3.4 | M | * v | <- [work-list]
                nullptr    |   +----------+---+ | --+
                           |                    |
                           \--------------------/

                               +----------+---+-----+
                               |          |   | *   | <- [free-list]
                               +----------+---+ | --+
                               |          |   | v * |
                               +----------+---+-- | +
                               |          | M | * v |
                               +----------+---+ | --+
                                                v
                                             nullptr

Fig. 2 shows the state of the maintained structures while executing Step 3.4. The changes are made just before branching to the loop entry.

  Fig. 2: Work-list and free-list while executing Step 3.4.

  +----------+---+-----+
  | Step 2   |   | *   |
  +----------+---+ | --+       +----------+---+-----+
  | Step 3   |   | v * |       |          |   | *   | <- [free-list]
  +----------+---+-- | |       +----------+---+ | --+
  | Step 4   | M | * v | <-\   |          |   | v * |
  +----------+---+-|---+   |   +----------+---+-- | +
                   v       |   |          | M | * v |
                nullptr    |   +----------+---+ | --+
                           |                    |
                           \--- [work-list]     \-\
                                                  |
                               +----------+---+-- | +
                               |          |   | * v |
                               +----------+---+ | --+
                               |          |   | v * |
                               +----------+---+-- | +
                               |          | M | * v |
                               +----------+---+ | --+
                                                v
                                             nullptr

From Fig. 1 and Fig. 2 one can see the following:

1. The next pointers of unmarked items are constant. They are only assigned once when allocating a new chunk. Chunks within the free-list already contain the correct information.

2. Because of (1) the next pointer of an unmarked item always points to a valid item. Thus a return check can be omitted for unmarked items.

3. The free list management is only necessary if a marked item is removed from the work-list.

Markers
-------

In order to check if free-list management is necessary, a marker algorithm is executed to determine the M field.

The following list describes the marker algorithms that have been investigated so far:

**1) Field Marker**: Field markers maintain an explicit bit that stores the M flag in a separate field. An item is marked if the bit is set. The chunks look like this:

  struct chunk {
    struct item {
      struct { ... } Arguments;
      struct item *Next;
      bool Marker;
    } items[N-1];
  };

**2) Chunked Marker (by @john.brawn)**: Chunked Markers omit the per-item next pointer and replace it by a per-chunk index storing a reference to the next item (+1). The work-list always points to the next chunk in the queue. A marked item is reached iff the index becomes 0 when it is decremented. A chunk looks like this:

  struct chunk {
    struct item {
      struct { ... } Arguments;
    } items[N-1];
    unsigned Index;
    struct chunk *Next;
  };
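
One possible reading of the index scheme, as a hedged sketch: the `PopResult` type and the `pop` function are hypothetical names, the `int` items are placeholders for the arguments, and consuming the array from the highest index down to 0 is an assumption on my side, not something the description above fixes.

```cpp
constexpr unsigned ChunkSize = 3; // n-1 for a recursion of wideness 4

struct Chunk {
  int Items[ChunkSize]; // placeholder for the per-call arguments
  unsigned Index;       // index of the next item, plus one
  Chunk *Next;          // next chunk in the queue
};

struct PopResult {
  int Arg;
  bool Marked; // true if this was the last item of its chunk
};

// Remove the next item. Assumes WorkList is non-null and Index > 0.
PopResult pop(Chunk *&WorkList) {
  Chunk *C = WorkList;
  --C->Index;
  PopResult R{C->Items[C->Index], C->Index == 0};
  if (R.Marked)
    WorkList = C->Next; // chunk exhausted; the caller may recycle C
  return R;
}
```

With this reading the arguments of the deferred calls are stored in reverse order, so that the call to be executed first sits at the highest index.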

**3) Compare Marker**: The Compare Marker makes use of the order in which chunks are allocated. Depending on the stack growth direction the marking can be determined by executing (item < item->next) or (item > item->next).

  I will not go into the details, but this works as long as the following condition holds when executing 2 allocas in temporal order.

  Assume:

  a = alloca(X)
  b = alloca(Y)

  If X > 0 and Y > 0 then either (a < b) or (a > b) must hold.

However, I think LLVM's alloca semantics (theoretically) might break this requirement.

A chunk looks like this:

  struct chunk {
    struct item {
      struct { ... } Arguments;
      struct item *Next;
    } items[N-1];
  };

**4) Tagged Marker A**: Tagged Markers use the same chunk layout as Compare Markers. The difference is that the marking is encoded within Bit 0 of the pointers that point to the item. An item is marked if Bit 0 in the pointer is cleared.

  This works as long as the alignment of each list item is a multiple of 2, and as long as Bit 0 is masked out before an item pointer is dereferenced.
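
A minimal sketch of the pointer tagging. The helper names `tagUnmarked`, `isMarked` and `strip` are hypothetical, and the sketch assumes a platform where a pointer round-trips through `std::uintptr_t` with Bit 0 free, which holds on mainstream targets but is implementation-defined in C++.

```cpp
#include <cstdint>

struct Item {
  int Arg;    // placeholder for the real call arguments
  Item *Next; // may carry a tag in Bit 0
};

// Bit 0 can only be used as a tag if items are at least 2-byte aligned.
static_assert(alignof(Item) >= 2, "Bit 0 must be unused in item pointers");

// Bit 0 set = unmarked, Bit 0 cleared = marked.
inline Item *tagUnmarked(Item *P) {
  return reinterpret_cast<Item *>(reinterpret_cast<std::uintptr_t>(P) | 1u);
}

// Only the pointer value is inspected; nothing is dereferenced.
inline bool isMarked(const Item *P) {
  return (reinterpret_cast<std::uintptr_t>(P) & 1u) == 0;
}

// Mask Bit 0 before dereferencing.
inline Item *strip(Item *P) {
  return reinterpret_cast<Item *>(reinterpret_cast<std::uintptr_t>(P) &
                                  ~static_cast<std::uintptr_t>(1));
}
```

With this encoding a plain nullptr also reads as marked, since its Bit 0 is cleared.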

The return code for markers 1,2,3 and 4
---------------------------------------

Each return in the function is replaced by a branch to the following
return code.

  returnpath:
    if (!worklist)
      return;

    tie(marked, item, oldchunk) = marker->execute_and_advance(worklist);
    if (marked)
       marker->add_to_freelist(oldchunk);

    next_arguments = marker->advance_and_return_next_item(worklist);

    // execute H(...) here
    goto loop_entry;

**5) Tagged Marker B**: Except for the tagged marker, every marker must dereference the work-list pointer, so the return check must be executed first and for every item. Because tagged markers only need the pointer value itself, the return code for the tagged marker can be optimized further.

  returnpath_tagged:
    tie(marked, item, oldchunk) = marker->execute_and_advance(worklist);

    if (marked) {
      if (!worklist)
        return;

      marker->add_to_freelist(oldchunk);
    }

    next_arguments = marker->advance_and_return_next_item(worklist);

    // execute H(...) here
    goto loop_entry;

**6) Always True Marker**: This is a trivial marker which is always (and implicitly) true. This marker applies to functions with wideness 2 only, because each chunk then holds n-1 = 1 item, which is always the last item of its chunk. It can be used with the tagged return path.

**7) Always False Marker**: This trivial marker disables free-lists entirely. It must be used with the untagged return path.

Repository:
  rL LLVM

https://reviews.llvm.org/D53706