[Patch][pr19848] Produce explicit comdats in clang

Rafael EspĂ­ndola rafael.espindola at gmail.com
Mon Dec 22 04:44:08 PST 2014


Sorry for being late on my own thread. Trying to reply to most comments at once:

* I am pretty sure that we want to split the "in a comdat" notion from
the linkage. I am not married to how this patch does it.
* On MachO. This should actually improve things. A plain linkonce_odr
will then mean the same thing on ELF and MachO and a comdat in MachO
gets rejected early.
* One Idea I had was having a selfcomdat syntactic sugar in the .ll,
but I think I like rnk's suggestion better: just omit the $value (and
flipping the order in the case of globals).
* For the idea of using external instead of linkonce linkage. I agree
with David. We should be making these things independent. Keeping the
linkonce_odr/weak_odr is useful for the 'odr' bit, for saying the
symbol should be STB_WEAK and for the can/cannot be dropped
distinction.
* I like the idea of optimizing the case of a select any that has the
same name as an existing symbol, but it is not that simple in cases
like

$v = comdat any
@v = global i32 42 comdat $v
@bar = global i32 42 comdat $v

When we are building bar, how do we find if there is a "simple comdat
$v"? We would have to first lookup a GV named $v. When building @bar
first and @v second we would still need to look at the "normal comdat
table" when building $v to see if a comdat was already there.

Since the main saving is on the symbol names, another idea would be to replace

typedef StringMap<Comdat> ComdatSymTabType;

With

typedef DenseMap<const char *, Comdat> ComdatSymTabType;

And have an extra StringMap with the non-symbol names like C5/D5 (or
even introduce name -> null mappings in the gv symbol table?).

But it would have the main constraint as the full optimization: when
dropping the comdat, We have to be sure to drop $v last or we might
end up we with a pointer to garbage.

So, how about splitting work in:

* llvm: Implement the syntatic change for comdats with the same name
as the GV, but keep the current representation.
* clang: this patch, but with less test change noise.
* llvm: stop producing implicit comdats.
* llvm: implement and benchmark one of the possible optimizations.


On 20 December 2014 at 19:13, David Majnemer <david.majnemer at gmail.com> wrote:
>
>
> On Fri, Dec 19, 2014 at 5:43 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:
>>
>> On Fri, Dec 19, 2014 at 12:51 PM, Rafael EspĂ­ndola
>> <rafael.espindola at gmail.com> wrote:
>>>
>>> There is quiet a bit of history behind this.
>>>
>>> The llvm IR until very recently had no support for comdats. This was a
>>> problem when targeting C++ on ELF/COFF as just using weak linkage
>>> would cause quiet a bit of dead bits to remain on the executable
>>> (unless -ffunction-sections, -fdata-sections and --gc-sections were
>>> used).
>>>
>>> To fix the problem, llvm's codegen will just assume that any weak or
>>> linkonce that is not in an explicit comdat should be output in one
>>> with the same name as the global.
>>>
>>> This unfortunately breaks cases like pr19848 where a weak symbol is
>>> not expected to be part of any comdat.
>>>
>>> Now that we have explicit comdats in the IR, we can finally get both
>>> cases right.
>>>
>>> This first patch just makes clang give explicit comdats to
>>> GlobalValues where it is allowed to.
>>>
>>> A followup patch to llvm will then stop implicitly producing comdats.
>>
>>
>> I'm not sure this is the right direction, but if it is, this change looks
>> reasonable to me.
>
>
> I think this patch is fine, what are your reservations?
>
>>
>>
>> Assuming we want to go in this direction, I think we should also consider
>> switching from a weak linkage to a strong linkage for entities governed by
>> the ODR; using weak linkage is a hack, since these symbols are really not
>> weak in the usual sense.
>
>
> Maybe but I think that can be considered separately from this change.
>
>>
>> That exposes a hole in our comdat support: we should have the ability to
>> define a discardable comdat (which need not be emitted if none of its
>> symbols are used in the current TU); a discardable comdat with an external
>> function definition seems like the right representation for an inline
>> function.
>
>
> I don't think that we want to abuse linkage like that.  We really need to
> sit down untangle this mess.  Before comdats, there was never a way to say
> "give me a global, stick it in a COMDAT but *don't* make it weak",
> COMDAT-ness and weak-ness came hand in hand.
>
> Now we have COMDATs and the world of linkage is unchanged, I don't think
> that this is a good place to be.
>
> We really need to kill linkage and replace it with several things, each of
> which individually control the behavior of the global.  Things like
> @llvm.used are a hack to work around the lack of flexibility linkage has to
> offer.
>
> I'm more than happy to help out with this if we can get consensus on a
> design.




More information about the cfe-commits mailing list