Brain dump on type merging

Wed Dec 3 20:22:59 PST 2014

I spent most of last week improving type merging during LTO, as a
side effect of trying to fix what looked like a simple PR (pr21374).

I think I have committed all the work I plan to do in this area for the
foreseeable future, but working on it got me thinking about how we
handle types, so I want to write down some very long-term ideas while
they are fresh.

The first thing that is odd is that types are owned by the context.
This requires hacks when we want to think of types "of a module" as we
do during linking. We have agreed to move metadata ownership from the
context to the modules (something I should get to in the near future).
I wonder if it would make sense to move constants and types too.
Besides avoiding the semantic mismatch, this would have other advantages:

* No leaks in LLVMContext (design leaks, at least). Once a module is
deleted, all data that one normally thinks of as being part of the
module is gone. Right now an LLVMContext that is used for a sequence
of modules will slowly leak memory as metadata, constants and types
accumulate.

* Given a constant or type, we would be able to get to the Module and
from there to the DataLayout, removing the need for the immutable pass
we have for it right now.
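A minimal sketch of the ownership chain these two points describe,
assuming module-owned types. The classes below are hypothetical toys,
not LLVM's real Module, Type or DataLayout: each type keeps a back
pointer to its owning module, so target information is one hop away,
and the types are freed together with the module.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified classes; not LLVM's real API.
struct DataLayout {
  unsigned PointerSizeInBits;
};

class Module;

class Type {
public:
  explicit Type(Module &M) : Parent(M) {}
  Module &getParent() const { return Parent; }

private:
  Module &Parent; // back pointer, possible because the module owns the type
};

class Module {
public:
  explicit Module(DataLayout Layout) : DL(Layout) {
    assert(DL.PointerSizeInBits % 8 == 0 && "pointer size in whole bytes");
  }
  Type *createType() {
    Types.push_back(std::make_unique<Type>(*this));
    return Types.back().get();
  }
  const DataLayout &getDataLayout() const { return DL; }
  std::size_t numTypes() const { return Types.size(); }

private:
  DataLayout DL;
  std::vector<std::unique_ptr<Type>> Types; // destroyed with the module
};

// Any type can reach target information without a separate pass.
unsigned pointerSizeFor(const Type &T) {
  return T.getParent().getDataLayout().PointerSizeInBits;
}
```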

Other than the ownership, some other thoughts came to mind:

First, the fact that the linker merges types by name is incredibly
ugly. We assign no semantic meaning to names, but your frontend had
better name types the way we expect, or lib/Linker will add a lot of
casts to your modules.
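To make the name dependence concrete, here is a simplified sketch of a
name-keyed lookup. The suffix-stripping rule (treating "struct.foo.1"
as a renamed "struct.foo") and the helper names stripNumericSuffix and
findMergeCandidate are assumptions for illustration, not a copy of
lib/Linker. Note that a structurally identical type named "class.foo"
is never found.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Strip a trailing ".<digits>" suffix of the kind added when two types
// with the same name collide ("struct.foo.1" -> "struct.foo").
std::string stripNumericSuffix(const std::string &Name) {
  std::size_t Dot = Name.rfind('.');
  if (Dot == std::string::npos || Dot + 1 == Name.size())
    return Name;
  for (std::size_t I = Dot + 1; I < Name.size(); ++I)
    if (!std::isdigit(static_cast<unsigned char>(Name[I])))
      return Name;
  return Name.substr(0, Dot);
}

// Hypothetical: the destination module's named types, mapped to an id.
// Only the name is consulted; structure plays no role in the lookup.
std::optional<int> findMergeCandidate(const std::map<std::string, int> &Dst,
                                      const std::string &SrcName) {
  auto It = Dst.find(stripNumericSuffix(SrcName));
  if (It == Dst.end())
    return std::nullopt;
  return It->second;
}
```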

Second, the hard part (and I assume the slow part) of strict
structural equality (like what we had in LLVM 2.9) is the case where
one type is found to be equivalent to another. That is what the
forwarding pointers in LLVM 2.9's types were for. Finding one
equivalence can cascade, causing multiple types to be merged. This is
even harder now that we no longer have the forwarding pointer.
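A toy sketch of the forwarding-pointer idea and the cascade, assuming a
union-find style representation. TypeNode, resolve, mergeInto and
sameAfterForwarding are hypothetical names; this does not reproduce
LLVM 2.9's PATypeHolder machinery. Merging two leaf types makes two
containers that were previously distinct compare equal, which is
exactly the cascade that a real implementation has to chase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy type: element types plus a forwarding pointer that
// is set once the node is found equal to another type.
struct TypeNode {
  std::vector<TypeNode *> Elements;
  TypeNode *Forward = nullptr;
};

// Follow forwarding pointers to the canonical type, compressing the path.
TypeNode *resolve(TypeNode *T) {
  TypeNode *Root = T;
  while (Root->Forward)
    Root = Root->Forward;
  while (T != Root) {
    TypeNode *Next = T->Forward;
    T->Forward = Root;
    T = Next;
  }
  return Root;
}

// Shallow comparison: same arity and every element resolves to the
// same canonical node.
bool sameAfterForwarding(TypeNode *A, TypeNode *B) {
  A = resolve(A);
  B = resolve(B);
  if (A == B)
    return true;
  if (A->Elements.size() != B->Elements.size())
    return false;
  for (std::size_t I = 0; I < A->Elements.size(); ++I)
    if (resolve(A->Elements[I]) != resolve(B->Elements[I]))
      return false;
  return true;
}

// Record that From is equivalent to To. Containers of From may now
// compare equal to containers of To, which can trigger further merges.
void mergeInto(TypeNode *From, TypeNode *To) {
  From = resolve(From);
  To = resolve(To);
  if (From != To)
    From->Forward = To;
}
```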

Comparing an SCC of types to the existing SCCs during linking, on the
other hand, is relatively simple. I implemented a brute-force approach
in the hope of replacing the name-based merging, but it was not
sufficient because of opaque types.
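A sketch of what a brute-force SCC comparison can look like, assuming
a toy graph of types (Ty, isomorphic and findExisting are hypothetical
names, not the code that was actually written). Cycles are handled by
assuming that a pair of types already under comparison is equal; the
assumption set is thrown away for each candidate, so a failed attempt
cannot leak false equalities into the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy type graph; Kind is e.g. "struct", "ptr", "i32".
struct Ty {
  std::string Kind;
  std::vector<const Ty *> Subtypes;
};

using PairSet = std::set<std::pair<const Ty *, const Ty *>>;

// Structural comparison that tolerates cycles: a pair that is already
// being compared is assumed equal, so recursion through a cycle stops.
bool isomorphic(const Ty *A, const Ty *B, PairSet &Assumed) {
  if (A == B)
    return true;
  if (!Assumed.insert({A, B}).second)
    return true;
  if (A->Kind != B->Kind || A->Subtypes.size() != B->Subtypes.size())
    return false;
  for (std::size_t I = 0; I < A->Subtypes.size(); ++I)
    if (!isomorphic(A->Subtypes[I], B->Subtypes[I], Assumed))
      return false;
  return true;
}

// Brute force: try every existing root until one matches.
const Ty *findExisting(const Ty *New, const std::vector<const Ty *> &Existing) {
  for (const Ty *Candidate : Existing) {
    PairSet Assumed;
    if (isomorphic(New, Candidate, Assumed))
      return Candidate;
  }
  return nullptr;
}
```

Opaque types are the gap: an opaque node would have to be allowed to
match anything, and that is where the choice discussed next comes in.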

The problem with opaque types is that we can link (resolve) them with
any other type, but the choice of which type we link them with can
have dramatic consequences for how many casts we have to introduce.

One approach would be not to resolve them during linking, but to wait
and see which casts get created and then resolve them in a way that
removes the casts. This would be fairly expensive, as we would have to
walk the entire IR to replace types (there is no RAUW for types).
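A hypothetical sketch of the "wait and see" choice, assuming we already
know, for every use of the opaque type, which concrete type that use
expects. Collecting that information is the expensive IR walk mentioned
above; pickResolution and Resolution are illustrative names only. The
resolution that satisfies the most uses leaves the fewest casts.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical result: the chosen type and the casts still needed.
struct Resolution {
  std::string Type;
  int CastsLeft;
};

// Each entry in ExpectedAtUses is the type one use site would need; a
// use whose expected type differs from the resolution needs a cast.
Resolution pickResolution(const std::vector<std::string> &Candidates,
                          const std::vector<std::string> &ExpectedAtUses) {
  assert(!Candidates.empty() && "need at least one candidate type");
  std::map<std::string, int> Satisfied;
  for (const std::string &T : ExpectedAtUses)
    ++Satisfied[T];
  std::string Best = Candidates.front();
  for (const std::string &C : Candidates)
    if (Satisfied[C] > Satisfied[Best])
      Best = C;
  int Casts = static_cast<int>(ExpectedAtUses.size()) - Satisfied[Best];
  return {Best, Casts};
}
```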

This is why the name-based type merging we have now is so important:
it provides a heuristic for what type an opaque one should be resolved
to. Type merging of non-opaque types (even with cycles) can be done
without the heuristic.

This then reminded me of an idea that I have seen mentioned in
informal discussions but never on the list: maybe we should have a
single pointer type instead of i8*, i32*, %foobar**, etc.

With a single pointer type we would also be able to drop opaque types
(since we always use them through a pointer to an opaque type) and
cycles (since a cycle has to go through a pointer). This would bring
back structural type equality, but without the std::multimaps and
PATypeHolders.
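A minimal sketch of why structural equality becomes cheap, assuming a
single opaque "ptr" type. With no pointee types, every type is a finite
tree, so ordinary hash-consing (here a std::map keyed on kind plus
element ids) gives structural uniquing with plain pointer comparison.
TypeContext and its get method are hypothetical, not LLVM's API, and
named structs are deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: no pointee, so a type can never refer to itself.
struct Type {
  unsigned Id;
  std::string Kind;               // "i8", "i32", "ptr", "struct", ...
  std::vector<const Type *> Elems; // element types for aggregates
};

class TypeContext {
public:
  // Return the unique type with this kind and these (already uniqued)
  // elements, creating it on first request.
  const Type *get(const std::string &Kind,
                  const std::vector<const Type *> &Elems = {}) {
    std::vector<unsigned> ElemIds;
    for (const Type *E : Elems) {
      assert(E && "element types must come from this context");
      ElemIds.push_back(E->Id);
    }
    Key K{Kind, ElemIds};
    auto It = Uniqued.find(K);
    if (It != Uniqued.end())
      return Storage[It->second].get();
    unsigned Id = static_cast<unsigned>(Storage.size());
    Storage.push_back(std::make_unique<Type>(Type{Id, Kind, Elems}));
    Uniqued.emplace(std::move(K), Id);
    return Storage.back().get();
  }
  std::size_t size() const { return Storage.size(); }

private:
  // Keyed on ids rather than raw pointers to get a well-defined order.
  using Key = std::pair<std::string, std::vector<unsigned>>;
  std::map<Key, unsigned> Uniqued;
  std::vector<std::unique_ptr<Type>> Storage;
};
```

A linked-list node such as "struct { i32, ptr }" no longer mentions
itself, so two modules that build it independently get the same type.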

The type information would move to the instructions that actually use
it: load, store and gep. This would also help in cases where the FE
has to introduce casts just to make LLVM happy. One case that comes to
mind is when a class destructor is equivalent to its base class's
destructor. Currently clang has to introduce casts because the this
pointers have different types.
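A toy illustration of that last point, assuming a hypothetical IR
where memory instructions carry the accessed type and pointers are
just "ptr". Inst, Function and canAliasWithoutCast are illustrative
names. With typed pointers the two destructors differ only in the type
of the this pointer, which is what forces the cast; with one pointer
type their signatures are identical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR: the accessed type lives on the instruction,
// and every pointer operand simply has type "ptr".
enum class Op { Load, Store, GEP };

struct Inst {
  Op Opcode;
  std::string AccessedType; // e.g. "i32" for a load
  std::string PtrOperand;   // e.g. "%this"
};

struct Function {
  std::string Name;
  std::vector<std::string> ParamTypes;
};

// One body can stand in for the other without a cast only if the
// parameter types match exactly.
bool canAliasWithoutCast(const Function &A, const Function &B) {
  return A.ParamTypes == B.ParamTypes;
}
```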

Cheers,
Rafael