[llvm-dev] ilist/iplist are broken (maybe I'll fix them?)

Wed Oct 7 17:57:35 PDT 2015

I've been digging into some undefined behaviour stemming from how ilist
is typically configured.  r247937, r247944, and r247978 caused a UBSan
failure to start firing on our Green Dragon bots, and after an IRC
conversation between David and Nick and Mehdi, we added a blacklist:
--
$echo "src:$WORKSPACE/llvm/include/llvm/CodeGen/MachineFunction.h" >> sanitize.blacklist
--

ilist/iplist is a pretty strange list, and the more I dig into it (to
fix the UB) the more broken I think it is.

I want to change a few things about it, but it'll be somewhat
intrusive (pun not really intended), so I want to get some buy-in before
really diving in.  I've CC'ed the people in the IRC conversation and a
couple of others that seem to care about ADT and/or UB.

"Normal" intrusive lists
========================

First, here's a simple ("normal") intrusive doubly-linked list:

    struct ListNodeBase {
      ListNodeBase *next = nullptr;
      ListNodeBase *prev = nullptr;
    };
    struct ListBase {
      struct iterator {
        ListNodeBase *I = nullptr;
        iterator &operator++() { I = I->next; return *this; }
        iterator &operator--() { I = I->prev; return *this; }
        ListNodeBase &operator*() const { return *I; }
      };

      ListNodeBase L;
      ListBase() { clear(); }
      void clear() { L.next = L.prev = &L; }
      iterator begin() { return iterator{L.next}; }
      iterator end() { return iterator{&L}; }
      bool empty() const { return begin() == end(); }
      void insert(iterator P, ListNodeBase &N) {
        assert(!N.next && !N.prev);
        N.next = &*P;
        N.prev = P->prev;
        N.next->prev = &N;
        N.prev->next = &N;
      }
      void erase(iterator N) {
        assert(N != end());
        N->next->prev = N->prev;
        N->prev->next = N->next;
        N->next = N->prev = nullptr;
      }
    };
    template <class T> class List : ListBase {
    public:
      class iterator : ListBase::iterator {
        friend class List;
        typedef ListBase::iterator Super;
        iterator &operator++() { Super::operator++(); return *this; }
        iterator &operator--() { Super::operator--(); return *this; }
        T &operator*() const { static_cast<T &>(Super::operator*()); }
      };

      using ListBase::clear;
      using ListBase::empty;
      iterator begin() { return iterator(ListBase::begin()); }
      iterator end() { return iterator(ListBase::end()); }
      void insert(iterator P, ListNodeBase &N) { ListBase::insert(P, N); }
      void erase(iterator N) { ListBase::erase(N); }
    };

In case it's not clear, to use `List<SomeType>`, `SomeType` has to
inherit from `ListNodeBase`.

There are a few nice properties here.

 1. Traversal logic requires no knowledge of the downcast nodes.
 2. `List<T>` owns none of the nodes (clear ownership semantics).
 3. There are no branches (outside of asserts).
 4. We never touch the heap.

As a result it's fairly simple.

It's also fairly easy to wrap it to provide extra features if desired
(e.g., adding ownership semantics, or the 32 variations of `insert()`
that we seem to like having).

Our ilist/iplist
================

Our ilist/iplist implementation is far more complicated, has none of the
above properties (except sometimes #4, and only with extra configuration
that exposes UB), and provides a few extra features that I don't think
are really worth paying for.

The default configuration effectively does the following:

    template <class T> ListNode { T *prev, T *next; };
    template <class T> struct List {
      T *Head = nullptr;
      void ensureHead() {
        if (!Head) {
          Head = new T();
          Head->next = nullptr;
          Head->prev = Head;
        }
      }
      // complex insertion/removal logic.
    };

Every list operation (even begin()/end()) starts by somehow calling
`ensureHead()`.

The key structural differences here:

  - Instead of the list containing the sentinel, it contains a pointer
    to the head of the list, and the sentinel is created on demand.
  - The sentinel is the full `T` (rather than ListNodeBase).
  - While "prev" pointers are circular, the sentinel's "next" pointer
    points at `nullptr` (it's "snipped off").
  - All pointers are to the downcast type, `T`.

(If you look at the code, you'll see it's a little more complicated.
There are also a number of arbitrary "hooks" for managing external
state.  Yay?)

Benefits
--------

What do we get in return?

  - Nodes "know" if they are the sentinel, since `next == nullptr`.
    This lets you do cute things like `NodeTy *getNextNode()` and
    `NodeTy *getPrevNode()`, which do the obvious, returning nullptr
    instead of the sentinel.  A naive list would need to compare against
    `end()`.
  - Iterating forward will eventually dereference `nullptr` (can't
    infinite loop, even in release builds).
  - A whole lot of branches :/.

UB from sentinel hack
---------------------

The memory management is particularly wasteful, so it's not surprising
that almost every user overrides it.  They effectively do this:

    template <class T> ListNode { T *prev, *next; };
    template <class T> struct List {
      ListNode<T> Sentinel;
      T *Head = static_cast<T *>(&Sentinel);
      void ensureHead() {}
    };

Because of iterators have pointers to `T` instead of `ListNodeBase` --
this exposes UB.  It's *probably* harmless?

More UB, broken code from half-node hack
----------------------------------------

It gets worse though :(.

This still wastes a pointer compared to the naive list.  Many uses of
iplist/ilist make a further optimization to win back that pointer,
splitting `ListNode<T>` into two, and using only half the node for the
sentinel.

    template <class T> struct ListHalfNode { T *prev; };
    template <class T> struct ListNode : ListHalfNode<T> { T *next; };
    template <class T> struct List {
      ListHalfNode<T> Sentinel;
      T *Head = static_cast<T *>(&Sentinel);
      void ensureHead() {}
    };

Besides exposing the same UB as above, this drops the "next" pointer
from the sentinel.  This is cute, but it's horribly wrong.

In particular, the getPrevNode/getNextNode() methods, which use
`N->next == nullptr` to check for sentinels, are instead poking into
"Head" in the list itself (which is never null).  For these lists,
`getNextNode()` will never return `nullptr`, because the "snipped" head
has been "reattached".

Ironically, this means we now have a "naive" doubly-linked circular list
in memory, but we have a whack-ton of logic that has no idea.  And we
have broken API.

What's broken?
--------------

Unless I'm misreading the code horribly,
BasicBlock/Instruction/Argument::getPrevNode/getNextNode() will *never*
return `nullptr`.  Instead they'll just return the sentinel.  Which has
been `static_cast`ed to a completely different type.  And if someone
dereferences it, *bang*.  There are other cases, too, but these are the
obviously scary ones.

Why am I writing this email instead of just fixing it?
======================================================

I don't feel totally comfortable just going ahead and fixing this
without buy-in, since it'll touch some things.  Here's my basic plan:

  - Fix `getNextNode()` and `getPrevNode()` by taking in the parent list
    and checking against `end()` instead of checking for `nullptr`.
  - Change the lists to be normal circular linked lists (possibly I'll
    stick an "is-sentinel" bit in the prev pointer in asserts builds so
    we can have assertions instead of infinite loops).  I'll layer our
    weird partial ownership semantics on top to keep this part NFC.
  - Re-layer the external state hooks so that they're on top of the
    basic list instead of inside of it (so that someone in the future
    can understand the code with far less studying).

Anyone want to back me on this before I dive in?  Even better, did I
miss something fundamental?