[rfc][gold plugin] Fix pr19901

Rafael EspĂ­ndola rafael.espindola at gmail.com
Tue Aug 12 14:55:43 PDT 2014


> Okay. So at a high level, you've decided not to use libLTO any more. Funny
> aside, the first version of the gold plugin I put in tools/silver and it was
> written to use the llvm C++ API directly. I think it was Chris who told me
> that this was silly and I should be reusing the logic from libLTO to do it,
> so I started a second version in tools/gold.
>
> I'm very sympathetic to the idea of not fixing libLTO and letting it rot
> (oops, I meant to say letting Apple maintain it for ld64), though not doing
> that would make ultimately make libLTO stronger. I'm calling this out as a
> deliberate choice we have to make, we can either push libLTO to being well
> enough designed to handle linkers with very different plugin ideologies (and
> thus useful for others), or we can consign it to being an ld64 feature. If
> gold can't use libLTO because its interface is too limiting, it's likely lld
> won't either, that any third party linkers won't, etc. No I don't have any
> suggestions for how to generalize libLTO's API.
>

So, it is not so much that I have decided to use lib/LTO, it is that
gold has a fundamentally different view of how LTO works. Implementing
that I didn't find a lot of things to use in lib/LTO.

I am pretty convinced that gold's model has substantial advantages
over what is currently baked into the lib/LTO api, so I expect some
ideas to be use in the ld64 implementation. At that point it would be
nice to share code, but it is not clear what the best alternative
would be:

* Add a lib/LTO2
* Have two APIs in lib/LTO
* Move current lib/LTO back to tools/lto and have lib/LTO be the new thing.

At this point I don't have a clear view of what the final organization
will look like.

> +//===- Utils.h - LLVM LTO Utilities
> ---------------------------------------===//
>
> Missing emacs major mode marker.

Fixed.

> +void runLTOPasses(Module &M, TargetMachine &TM, bool DisableOpt,
> +                  bool DisableInline, bool DisableGVNLoadPRE);
>
> This should go away and become an interface to PassManagerBuilder instead.
> (PMB doesn't actually run the passes, but that's simple for the caller.)

I will see if I can move these (except the call to run) into
populateLTOPassManager?

> +  ErrorOr<object::IRObjectFile *> ObjOrErr =
> +      object::IRObjectFile::createIRObjectFile(std::move(buffer), Context);
> +  std::error_code EC = ObjOrErr.getError();
> +  if (EC == BitcodeError::InvalidBitcodeSignature)
>
> What about BitcodeError::InvalidBitcodeHeaderWrapper? Come to think of it,
> do we even have any tools in mainline that wrap bitcode?

Good catch. I will try to remove merge InvalidBitcodeWrapperHeader so
as to have a single error to check.

It looks like the regular bitcode writer does it:

$ cat test.ll
target triple = "x86_64-apple-darwin"
$ llvm-as test.ll -o test.bc
$ hexdump  -C test.bc | head -1
00000000  de c0 17 0b 00 00 00 00  14 00 00 00 30 01 00 00  |............0...|


> +  for (auto &Sym : Obj->symbols()) {
> +    uint32_t Symflags = Sym.getFlags();
> +    if (!(Symflags & object::BasicSymbolRef::SF_Global))
> +      continue;
>
> Why are we using the libobject interface for something that must be IR? This
> line:
>
> +    const GlobalValue *GV = Obj->getSymbolGV(Sym.getRawDataRefImpl());
>
> is horrible. Is this really what it looks like? Type unsafe laundering
> through void pointers or the equivalent? "Oh sure buddy, there's a
> GlobalValue* there, I promise."

We use libObject to be able to handle inline assembly. I agree that
libObject's iterator API is somewhat ugly.

> +        assert(!GV->hasExternalWeakLinkage() &&
> +               !GV->hasAvailableExternallyLinkage());
>
> Assert message? Also, hold on, couldn't this happen if we have extern_weak
> or available_externally in the .bc file? Or is this safe due to behind the
> scenes IRObjectFile magic?

These linkages are for declaration, since the symbol is defined, we
know it is not one of them. I added a "Not a declaration" message to
the assert.

> +    if (GV && (GV->hasWeakLinkage() || GV->hasLinkOnceLinkage()))
> +      sym.comdat_key = (char*)1;
>
> Let me get this straight. The old code called strdup(M->getSymbolName) to
> fill sym.name, then upon finding comdat did sym.comday_key = sym.name. The
> new code adds a "std::string NameBuffer" per claimed file, then we
> raw_ostream the names into that NameBuffer as we visit each symbol and you
> mark the ones that need a comdat_key here, then after this you have a loop
> that strlen's over NameBuffer to fill in sym.name (and sym.comdat_key if was
> set to 1 here).
>
> Why? Firstly this seems like a separable change. Secondly, did this use to
> leak, or did you introduce a double-free when gold tries to free it and then
> we delete NameBuffer? Third, would a BumpPtrAllocator<char> be better here?
> You get slabs that don't individually move, so you can set the pointer once
> instead of doing it in a loop after the fact.

It is somewhat unrelated change, yes. Since the final symbol name is
not present on the module (missing leading _ on darwin for example),
the IRObjectFile API lets it print the name to a stream and I guess I
pushed it a bit far. I will change the code to print to local buffer
and copy that. We can change the memory management independently.

> +static void keepGlobalValue(GlobalValue &GV) {
>
> I've been pontificating over what it means to have llvm.used name a function
> with a discardable linkage. The idea is that it would be one way to
> communicate to internalize that it should form weak linkage out of it. I'm
> not sure whether that's safe, it means that attribute used would run the
> risk of changing linkages.

Weak? No, that would be a miscompile. Consider two C files with static
function named foo that is __attribute__((used)). Making those weak
would mean that one is discarded.

I agree that llvm.used would be better expressed as a new dimension
for linkage or a new thing in GlobalValue, but that is what we have
for now.

> +  std::vector<GlobalValue*> Drop;
>
> Space before *.

Done.

> +        // Since we use the regular lib/Linker, we cannot just internalized
> +        // it now or it will not be copied to the merged module. Instead
> +        // we force it to be copied and then internalize it.
>
> Typo or grammar error starting at "internalized".

Fixed

> +      // We can only check for address uses once we merge the modules. The
>
> s/once/after/ . As written it causes a "garden path" parse.

Fixed

> +    // This is horrible. Given how lazy loading is implemneted, dropping

Fixed


> Typo, "implemneted" -> "implemented". Also, that really is a serious design
> flaw. Oof.

Fixed, and yes :-(
I am pretty sure we will have to fix this. Any call to
materializeAllPermanently has to go away from the common LTO code
path.

> +      message(LDPL_FATAL, "Failed to link moudle: %s", ErrMsg.c_str());
>
> Typo, "moudle" -> "module".

Fixed.

> +    if (canBeOmittedFromSymbolTable(GV))
> +       internalize(*GV);
>
> Three space indent? Please clang-format.

Done.

A patch with the easy bits is attached for reference. I will try to
fix the two bigger issues independently and rebase.

Thanks,
Rafael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t.patch
Type: text/x-patch
Size: 30048 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140812/123dbf97/attachment.bin>


More information about the llvm-commits mailing list