[PATCH] D78793: Names for structs are held on the Context, not the Module. Move getTypeByName from Module to Type taking a Context parameter.

Mon Nov 30 15:06:50 PST 2020

nicholas added a comment.

In D78793#2424089 <https://reviews.llvm.org/D78793#2424089>, @dexonsmith wrote:

> In D78793#2424081 <https://reviews.llvm.org/D78793#2424081>, @nicholas wrote:
>
>>> ! In D78793#2424013 <https://reviews.llvm.org/D78793#2424013>, @dexonsmith wrote:
>>>  In particular, it would be nice if LLVMContext were restricted to types / values / metadata that are purely functional, and anything non-functional or open to interpretation is put on the Module.
>>>
>>> - For values and metadata, this requires some significant work (I have some concrete ideas of how, but it seems quite tangential to this patch).
>>> - For types, this could be done as a very simple follow-up to the opaque pointer work (@dblaikie @arsenm, not sure if you have thoughts). The `Type` class would be purely functional, and the struct names would move to a symbol table in each `Module`. The name is not inherent to the type, it's just a handle that some module wants to use to refer to it (the type itself can be shared between modules even if they disagree about what to call it).
>>
>> I don't have much to add here, I'd simply recommend that you have a plan for dealing with the problems of type merging and refinement, structs whose members are pointers to that same struct (detect and reject creating a struct that holds itself as a member, `%0 = type { \0 }`), etc., before making the change.
>>
>> I'm not familiar with the opaque pointer work and how it affects the types and context.
>
> Yeah, that's exactly what's fixed by the opaque pointer work: it turns the type graph into a DAG. In your example, the type wasn't holding itself as a member (that's not possible), it was holding a pointer to itself as a member. Once pointers are opaque, then it's just holding a pointer, no cycles anymore.

My initial reaction, that sounds like a great proposal!

Let me just think out loud, so to type. It's been a while since I last thought hard about the consequences of changes to the LLVM IR. Back In the 2.x series, LLVM had an `OpaqueType` that could take the place of any type. It was something that might be resolved as part of linking, and this made module linking pretty complicated as opaques got resolved (not to mention the rest of LLVM had to deal with opaques though really except it didn't really inside function bodies). `OpaqueType` was restricted to `StructType` by 2.9 in preparation for the 3.0 change. I want to make sure that having a replacement for `OpaqueType` that's restricted to pointers doesn't reintroduce most of the same problems. After all, pointers aren't the only type that have elements, why not array types? Array types can have struct elements and struct elements can have array types, so we'll still need to do a recursive check that a type is acyclic when constructed. That doesn't sound too bad, though I think it will have to be O(n^2) unless you can get it in the right data representation to use Tarjan's algorithm which will probably be tricky. It seems that the big difference here (vs. LLVM 2.9) is that we can have opaque pointers propagated through the whole IR including all the instructions that operate on pointers. Then when linking we match up types by structure and we never have to do any type refinement. The remaining challenge here is forward declared structs, we want those to match up by name during module linking, but is there any reason that can't be handled the same as current LLVM does it? Same name in two modules, if both have bodies one gets renamed, otherwise they're the same type. The name can then live on the module and not the context. `StructType::setBody` can still be used to take a struct from body-less to one with a body, though it still has to check for non-pointer recursive types on construction. Ah, but you never need to because the only way to get a legal cycle is through a pointer anyhow! So you still need empty-body structs for the forward declaration case (as in, the original C code used a forward-declared struct -- note that `struct Foo; struct Foo f();` returning a non-pointer forward declared struct is valid in C) to handle module linking, but that's the only case, in all other cases you can pass in all fields at construction time. So you don't really need `setBody` at all, the module linker just has to change the types from the opaque one to the one with a body when moving the values.

Ok, I like this plan. Best of luck!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78793/new/

https://reviews.llvm.org/D78793