Brain dump on type merging

Wed Dec 3 22:44:39 PST 2014

+llvmdev, so this gets more eyes.

These points make sense to me too.  In LTO, at least, sharing constants
and metadata in the context seems mostly to work against us.

I'm less aware of the tradeoffs re: the opaque pointer type, but your
brain dump is compelling.  I'd be interested in whether anyone thinks
the loss of type safety there is important.

> On 2014 Dec 3, at 21:00, Chandler Carruth <chandlerc at google.com> wrote:
> 
> Just want to go on record that:
> 
> 1) I completely agree that constants, types, and metadata should all be module owned. It makes so much sense. This becomes tremendously more appealing when (not if!) we make datalayout required and frontend-provided. Because then all of these things have deterministic always-available access to it, etc. Goodness. Pure goodness.
> 
> 2) I'm 100% behind moving to an opaque pointer type. Combined with mandatory datalayout, lots of nice things become possible. If we can trivially move back to structural type identity, I'm all for that. But I would do it in steps -- first get the opaque pointer, then try to improve type merging, etc.
> 
> On Wed, Dec 3, 2014 at 8:22 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> I have spent most of last week working on improving the type merging
>> during LTO as a side effect of trying to fix what looked like a simple
>> PR (pr21374).
>> 
>> I think I have committed all the work I have for the area for the
>> foreseeable future, but working on it did get me thinking about how we
>> handle types, so I want to write down some very long term ideas while
>> they are fresh.
>> 
>> The first thing that is odd is that types are owned by the context.
>> This requires hacks when we want to think of types "of a module" as we
>> do during linking. We have agreed to move metadata ownership from the
>> context to the modules (something I should get to in the near future).
>> I wonder if it would make sense to move constants and types too. Other
>> than avoiding the semantic mismatch, this would have other advantages:
>> 
>> * No leaks in LLVMContext (design ones at least). Once a module is
>> deleted, all data that one normally thinks of as being part of the
>> module is gone. Right now an LLVMContext that is used for a sequence
>> of modules will slowly leak memory as metadata, constants and types
>> accumulate.
>> 
>> * Given a constant or type we would be able to get to the Module and
>> from there to the DataLayout, removing the need for the immutable pass
>> we have for it right now.
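To make the ownership idea above concrete, here is a rough sketch of types that are owned by a module and can reach its data layout. Every name in it (`Module`, `Type`, `DataLayout`, `pointerSizeFor`) is a hypothetical toy class written for illustration, not LLVM's real API, and it assumes a module never moves once types point back at it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy classes for illustration; not LLVM's real API.
struct DataLayout {
  unsigned PointerSizeInBits;
};

class Module;

class Type {
public:
  Type(Module &M, unsigned Bits) : Owner(M), Bits(Bits) {}
  Module &getModule() const { return Owner; }
  unsigned getBits() const { return Bits; }

private:
  Module &Owner; // back-pointer to the owning module
  unsigned Bits;
};

class Module {
public:
  explicit Module(DataLayout DL) : DL(DL) {}
  // Types hold a reference to this object, so forbid copying.
  Module(const Module &) = delete;
  Module &operator=(const Module &) = delete;

  const DataLayout &getDataLayout() const { return DL; }

  // The module owns every type it creates; destroying the module frees them.
  Type *createIntType(unsigned Bits) {
    Types.emplace_back(new Type(*this, Bits));
    return Types.back().get();
  }
  std::size_t numTypes() const { return Types.size(); }

private:
  DataLayout DL;
  std::vector<std::unique_ptr<Type>> Types;
};

// Given only a type, walk to the module and from there to the data layout.
unsigned pointerSizeFor(const Type &T) {
  return T.getModule().getDataLayout().PointerSizeInBits;
}
```

With this shape nothing outlives the module, and no separate pass is needed to hand the data layout to code that already holds a type.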
>> 
>> Other than the ownership, some other thoughts came to mind:
>> 
>> First, the fact that the linker merges types by name is incredibly
>> ugly. We assign no semantic meaning to names, but your frontend had
>> better name types the way we expect or lib/Linker will add a lot of
>> casts to your modules.
>> 
>> Second, the hard part (and I assume the slow part) of strict
>> structural equality (like what we had in 2.9) is the case where one
>> type is found to be equivalent to another. This is what llvm 2.9 had
>> forwarding pointers in the types for. It can cause a cascading effect
>> where multiple types are merged. This is now even harder since we
>> don't have the forwarding pointer anymore.
>> 
>> Comparing a SCC of types to existing SCCs during linking on the other
>> hand is relatively simple. I implemented a brute force approach in the
>> hope of replacing the name based merging, but it was not sufficient
>> because of opaque types.
>> 
>> The problem with opaque types is that we can link (resolve) them with
>> any other type, but which one we link them with can have dramatic
>> consequences for how many casts we have to introduce.
>> 
>> One approach would be to not resolve them during linking, but wait and
>> see which casts get created and then resolve in a way that would
>> remove the casts. This would be fairly expensive as we would have to
>> walk the entire IR to replace types (no RAUW for types).
>> 
>> This is why the name based type merging we have now is so important:
>> it provides a heuristic as to what type an opaque one should be
>> resolved to. Type merging of non-opaque types (even with cycles) can
>> be done without the heuristic.
>> 
>> This then brought to mind an idea that I have seen mentioned in
>> informal discussions but never on the list: Maybe we should have a
>> single pointer type instead of i8*, i32*, %foobar**, etc.
>> 
>> With a single pointer type we would be able to also drop opaque types
>> (since we always use a pointer to an opaque) and cycles (since they
>> have to go through a pointer). This will bring back structural type
>> equality, but without the std::multimaps and PATypeHolders.
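A small sketch of why a single pointer type makes structural identity cheap: once a pointer carries no pointee, every type is a finite tree and can be uniqued bottom-up with one ordinary map. `TypeTable`, `TypeId` and the kind tags are hypothetical names chosen for this illustration, not anything in LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical toy uniquing table for illustration; not LLVM's real API.
using TypeId = unsigned;
// (kind tag, integer width, ids of already-uniqued field types)
using Key = std::tuple<int, unsigned, std::vector<TypeId>>;

class TypeTable {
public:
  TypeId getInt(unsigned Bits) {
    return get(Key(0, Bits, std::vector<TypeId>()));
  }
  // The single pointer type: it has no pointee, so it cannot form a cycle.
  TypeId getPtr() { return get(Key(1, 0u, std::vector<TypeId>())); }
  TypeId getStruct(std::vector<TypeId> Fields) {
    return get(Key(2, 0u, std::move(Fields)));
  }
  std::size_t size() const { return Unique.size(); }

private:
  // Same shape in, same id out; a new shape gets the next free id.
  TypeId get(const Key &K) {
    std::map<Key, TypeId>::iterator It = Unique.find(K);
    if (It != Unique.end())
      return It->second;
    TypeId Id = static_cast<TypeId>(Unique.size());
    Unique.emplace(K, Id);
    return Id;
  }
  std::map<Key, TypeId> Unique;
};
```

A linked-list node becomes just "struct of an i32 and a pointer", so two modules that describe it independently get the same id without any name matching or forwarding pointers.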
>> 
>> The type would be transferred to what actually uses it: load, store,
>> gep. This would also help with cases where the FE has to introduce
>> casts just to make LLVM happy. One case that comes to mind is when a
>> class destructor is equivalent to a base class one. Currently clang
>> has to introduce casts because of the different types of the this
>> pointer.
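As a loose analogy in C++ for moving the type onto the operations, the sketch below uses one untyped pointer and lets each access name the type it wants. `Ptr`, `load`, `store` and `gep` are hypothetical helpers for illustration only; they are not LLVM instructions or APIs, and `gep` here covers only the simple array-indexing case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a single, untyped pointer type.
using Ptr = unsigned char *;

// The accessed type is a property of the operation, not of the pointer.
template <typename T> T load(Ptr P) {
  T V;
  std::memcpy(&V, P, sizeof(T));
  return V;
}

template <typename T> void store(Ptr P, T V) {
  std::memcpy(P, &V, sizeof(T));
}

// Address arithmetic names the element type it is stepping over.
template <typename T> Ptr gep(Ptr Base, std::size_t Index) {
  return Base + Index * sizeof(T);
}
```

No cast is needed to reuse the same pointer at a different type, which is the property that would let a derived-class destructor and its base-class equivalent share one pointer argument.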
>> 
>> Cheers,
>> Rafael