[PATCH] IR: Add COMDATs to the IR

Fri Jun 13 13:07:42 PDT 2014

>> An important difference between COFF and ELF is that in COFF only one section can be in a COMDAT. In ELF, multiple sections can be in the same comdat.
>>
>> I noticed that you left associative out of the above list. Is the intention to write code that look like ELF (multiple sections in on COMDAT) but at codegen we pick one to actually be in that comdat and make others associative with it?
>>
>> It is probably a good idea to mention this, since the documentation gives the impression that adding COMDAT to a global makes sure that it will be in that COMDAT.
>>
>> What are we doing about linkage? For COMDATs to work, we have to make sure their contents are not modified. For example, if a COMDAT has two linkonce_odr symbols, dropping only one can cause a llink failure.
>>
>> One way would be to say that isDiscardableIfUnused symbols are not allowed, but we do want to allow private. Going this way would require spiting the "can be dropped" notion out of linkage. For example, linkonce would become "weak dropable" and a plain private would have to be kept. We do want to allow entire COMDATs to be dropped, so the comdat itself would probably need the dropable bit too.
>>
>> A simpler but potentially brittle option is to just say that
>> * A symbol in a comdat cannot be dropped individually
>> * A comdat can be dropped if every symbol in it can be dropped.
>>
>> And audit pretty much every call to isDiscardableIfUnused.
> Consider:
>   $v = comdat any
>   @v = global i32 0, comdat $v
>   @m = global i32 0, comdat $v
>
> In the ELF case, `@v` will be selected as the COMDAT key. The .data.v and .data.m sections will both be in the COMDAT group.
>
> In the COFF case, `@v`'s .data section will be the primary COMDAT section. `@m`'s .data section will be associated with it.
>
> Because we have this behavior, we do not need an explicit associative COMDAT selection kind.

This in nice. It is probably a good idea to document it.

> ----------------
> Rafael Ávila de Espíndola wrote:
>> Why do you need it to be a Value? This says that a Comdat has a type, which seems wrong.
> It is a `Value` so that it can participate in things like use lists.
>
> As an example why this is useful:
> If we want to drop a symbol in a COMDAT group, we need to prove we don't need any of the symbols in that group.  Iterating over users of the COMDAT is a straightforward way of doing this.
>
> I could recreate this functionality just for COMDATs but it seemed like the right way to go.

We had problems in the past with a Class being too generic
(GlobalAlias being the most recent one), so I would suggest making it
specific.All you need to is add a "Use*" to the comdat class, no?

> ================
> Comment at: include/llvm/IR/Comdat.h:51
> @@ +50,3 @@
> +  /// Methods for support type inquiry through isa, cast, and dyn_cast:
> +  static inline bool classof(const Value *V) {
> +    return V->getValueID() == Value::ComdatVal;
> ----------------
> Rafael Ávila de Espíndola wrote:
>> What is this needed for?
> There are places, like `ConvertValIDToValue`, where we get a `Comdat` from a `Value`.

Well, that seems like a problem to cover another. It shouldn't be
parsed as a Value :-(

> ================
> Comment at: include/llvm/IR/Module.h:599
> @@ +598,3 @@
> +  comdat_iterator       comdat_begin()          { return ComdatList.begin(); }
> +  const_comdat_iterator comdat_begin() const    { return ComdatList.begin(); }
> +  comdat_iterator       comdat_end  ()          { return ComdatList.end();   }
> ----------------
> Rafael Ávila de Espíndola wrote:
>> Is this need? I looks a bit strange say that comdat is "part of a module". They are really a composite property of the globals.
> A `Comdat` value has to live in *some* symbol table.  Considering the tight relationship between `Comdat` value names and `GlobalValue` names, it makes sense for the symbol table to be at `Module` scope.

Sure. It just feels a bit more like constants or types. They have to
live somewhere, but it is more of an implementation detail where they
live. Conceptually a module has GlobalValues, triple, datalayout and
information associated with the GlobalValues (metadata, comdat,
types).

> ================
> Comment at: lib/AsmParser/LLParser.cpp:539
> @@ +538,3 @@
> +  case lltok::kw_exactmatch:   SK = Comdat::ExactMatch;   break;
> +  case lltok::kw_largest:      SK = Comdat::Largest;      break;
> +  case lltok::kw_newest:       SK = Comdat::Newest;       break;
> ----------------
> Rafael Ávila de Espíndola wrote:
>> Does clang-format agrees with having the break in the same line?
> I matched the style in `ParseFnAttributeValuePairs`, I can reformat this to be in `clang-format` style if you wish.

clang-format please :-)

> ================
> Comment at: lib/AsmParser/LLParser.cpp:1161
> @@ +1160,3 @@
> +
> +/// GetComdatVal - Get a Comdat with the specified name, creating a forward
> +/// reference record if needed.
> ----------------
> Rafael Ávila de Espíndola wrote:
>> Don't repeat the name in the comment. Functions start with a lowecase letter.
> I'll switch to `\brief`.
>
> With regard to the naming style, I followed the [[ http://llvm.org/docs/CodingStandards.html#golden-rule | Golden Rule ]]

The annoying thing is that parts of the code base end up never being
fixed. What about fixing the old code in a preparation commit?

> ================
> Comment at: lib/IR/Globals.cpp:115
> @@ +114,3 @@
> +  if (isa<GlobalVariable>(this) || isa<GlobalAlias>(this)) {
> +    // Comdats on declarations are semantically meaningless, ignore them.
> +    if (isDeclaration())
> ----------------
> Rafael Ávila de Espíndola wrote:
>> Why not reject instead? This probably indicates a bug in the calling code, no?
> It's annoying to avoid this from the `AsmParser`. I'll add a verifier check.

You probably need both. We want the verifier to be sure someone using
the c++ api gets notified of the error. You need a parser check to
make sure llvm-as can provide a better error message.

Cheers,
Rafael