[llvm] r188188 - Misc enhancements to LTO:

Tue Aug 13 21:37:30 PDT 2013

On 13 August 2013 18:26, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

>
> On 8/13/13 5:32 PM, Nick Lewycky wrote:
>
> On 12 August 2013 16:41, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>>  Thank you very much for sharing your concerns.  I read this mail
>> carefully; it seems we had a little miscommunication.
>> I hope I can clarify in this mail :-). See the interleaved text.
>>
>>
>> On 8/12/13 3:41 PM, Nick Lewycky wrote:
>>
>>
>>
>>>
>>>
>>>   >which in turn drives libLTO through the API.
>>>>
>>>>  It depends on what kind of info "something else" needs to drive
>>>> libLTO.
>>>> In general it is a very bad idea if "something else" needs
>>>> micro-management.
>>>>
>>>
>>>  libLTO is part of the linker that uses it.
>>>
>>>
>>>  No! Absolutely not!
>>>
>>
>>  Fair enough. I meant "libLTO is part of the linker that uses it" in the
>> same sense that a networking library is part of the web browser that uses
>> it. The library shouldn't be off deciding to do things of its own accord,
>> it should provide an API that allows something else to accomplish its task.
>>
>>  I don't think such a comparison is precise.  For an app with a networking lib,
>> the "brain" resides on the app side because the app defines the behavior.
>> For linker+libLTO, I believe the "brain" should reside on the libLTO side, as
>> it is far more complicated (although our current implementation is a bit simple).
>>
>
>  I'm afraid I don't agree. We can agree to disagree on this point, it
> isn't necessary for us to agree here to make forwards progress.
>
> Such a discussion will be open-ended. I don't like to waste bandwidth over
> here. It is not my focus
> at all.  Maybe you can understand that point after you try to make some
> infrastructure-level
> change to the LTO. If you use OSX, you will see that. If you don't, grab a
> Linux machine, try to implement
> that feature on Linux+gold but *WITHOUT* touching any bit of tools/gold/*.
>

Does
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090202/073164.html
count? tools/gold/* didn't exist when I started.

> From my point of view, the program that the user calls is ld. That ld is
> responsible for fulfilling its promises to the user, who does not know or
> care what libraries ld is using under the hood. We've taken the direction
> that we'll add new features to libLTO lazily, when there's a demand from a
> linker that wants them, but that shouldn't be confused for deliberately
> hiding functionality from the linker.
>
>  And it is confusing because there *are* things that libLTO is
> deliberately hiding: all the various changes in LLVM's C++ API.
>
>  I believe GNU gold is a good example of interface design.  In the case of
>> gold+libLTO, the "brain" is embodied by "tools/gold/*.cpp."
>>
>
>  There's very little logic in tools/gold. It's either in llvm proper, or
> in gold proper. Both libLTO and LLVMgold are thin wrappers.
>
>   Anybody can change it to whatever he/she likes. Although it does call APIs,
>> it does not have to. I can directly call the C++ code
>> dancing behind the API.  This is the "stable API" I'm talking about.
>>
>
>  Okay. As a matter of terminology, I've been using "stable" to mean
> "unchanging", synonymous with ABI locked or ABI fixed.
>
>>    I don't see this as a very bad idea or as micro-management.
>>
>>
>>  I didn't see that either until I started implementing stuff.
>>
>>   In this regard, I don't see a difference between libLTO and other
>> libraries like libPNG or netlib or freetype. (There is a difference in that
>> we want libLTO to be a very high-level interface instead of exposing the
>> details of .bc files, entirely unlike what libPNG does for PNG or freetype
>> does for font files.)
>>
>>
>>  Similar in concept. Concept only.
>>
>>       Having a default setting with the ability to override it is a
>>> sensible convenience for users of libLTO.
>>>
>>>
>>>
>>>
>>>  Take Apple ld as an example: if I want to change LTO such that
>>>> I don't load every module,
>>>> but just load summary info, the current APIs are not sufficient. I
>>>> have to modify the API, or add new APIs
>>>> for that matter; in the meantime, I need to release the new ld to the user
>>>> in order to accommodate the change.
>>>> That is a nightmare.
>>>>
>>>
>>>  The point of libLTO is to provide an ABI-fixed library, isolating the
>>> linker from llvm's internals.
>>>
>>>  It is not "fixed", it is changing constantly.
>>>
>>
>>  The only reason libLTO exists at all is to give the linker something to
>> link against which will have a fixed ABI. Same with "libclang" on the clang
>> side.
>>
>>  It is slightly different in the case of libLTO+linker.
>>
>> clang + libclang are tightly bound in terms of release. If you are not
>> happy with particular API, you can change to whatever you feel more
>> comfortable.
>>
>> The linker is usually released independently. The compiler has less
>> control if the API between compiler and linker is constantly changing.
>> Keeping backward compatibility is a big problem.
>>
>
>  Right. libLTO is supposed to be that solution, by giving the linker a
> stable ABI (and API) to link against, and handling whatever changes in the
> LLVM C++ API behind the scenes. Put another way, a linker built against
> libLTO version X should work with version Y for all Y>X forever, without
> even a recompile (assuming the linker was built to load libLTO dynamically).
>
> The problem is how do you magically define such APIs, once and for all?
>

Look, I didn't make up this rule. I'm trying to tell you, as a new
contributor, that the rule already existed and you should be aware of it.

Here's an old example of Chris mentioning the rule that changes to the C
ABI are not allowed (and yes, libLTO's API is part of the C API):
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090720/081858.html
.
I don't think this is the first time it was mentioned either, but I wasn't
able to find a better example with a minute of searching.
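
To make the rule concrete, here is a rough sketch in C of how a library can grow without breaking older linkers. Everything in it is hypothetical: the demo_* names are invented for illustration and are not libLTO entry points, and the export table only stands in for what dlsym() would do against a real shared library. The point is just that a newer library adds symbols and never removes or changes old ones, so a linker can probe for a new entry point and fall back to the old one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for two generations of a C API.
   Neither name is a real libLTO function. */
static int demo_compile_one(void)  { return 1; }  /* old: always one object  */
static int demo_compile_many(void) { return 3; }  /* new: several objects    */

typedef int (*demo_fn)(void);

struct demo_export {
    const char *name;
    demo_fn fn;
};

/* What an old and a new build of the library export. The new build keeps
   the old symbol unchanged and only adds to the table. */
static const struct demo_export demo_old_lib[] = {
    { "demo_compile_one", demo_compile_one },
};
static const struct demo_export demo_new_lib[] = {
    { "demo_compile_one",  demo_compile_one },
    { "demo_compile_many", demo_compile_many },
};

/* Stand-in for dlsym(): returns NULL when the symbol is absent. */
static demo_fn demo_lookup(const struct demo_export *lib, size_t n,
                           const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < n; ++i)
        if (strcmp(lib[i].name, name) == 0)
            return lib[i].fn;
    return NULL;
}

/* A linker written against the newer API: prefer the new entry point,
   fall back to the old one when running against an older library. */
static int demo_linker_run(const struct demo_export *lib, size_t n) {
    demo_fn many = demo_lookup(lib, n, "demo_compile_many");
    if (many != NULL)
        return many();
    demo_fn one = demo_lookup(lib, n, "demo_compile_one");
    return one != NULL ? one() : -1;
}
```

Run against demo_old_lib, demo_linker_run falls back to the one-object entry point; against demo_new_lib it picks up the multi-object one. An old linker that only ever asks for demo_compile_one keeps working with both tables, which is the whole promise.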

> Currently, we ignore the I/O issue in LTO. Can you imagine what kind of
> APIs we will need in the future
> to improve the I/O? And how can we magically maintain such backward
> compatibility? libLTO is
> potentially a very complex state machine.
>
> Why not let the linker expose its interface, and let the plugin (LTO) work
> with it? Isn't that easier?
> I don't understand why you think exposing an interface from LTO to the linker is
> better. Such a linker is tightly
> bound to a compiler, dedicated to that compiler. In order to maintain that
> linker, an engineer needs in-depth
> know-how about both compiler and linker. I don't see why it is better.
>

Let me try to restate what you said in my own words, so we can make sure I
understood it. You want llvm-lto to be its own program, and the system
linker should be a plugin that llvm-lto loads?

I have not experienced the problem you mention with needing to understand
the compiler and the linker. To take a recent example,
http://sourceware.org/ml/binutils/2013-06/msg00139.html was less than an
hour of work for me to determine that the bug was in gold. I certainly am
not really familiar with gold (or linkers in general), and I don't think
Cary knows much about llvm (though he probably knows a lot about compilers).

I have experienced a different kind of problem due to LTO: writing
testcases is a pain! With LTO, you can't just split a testcase into two
files to demonstrate the linking problem because LTO will see right through
that.

Is your concern theoretical, or has it actually come up?

>  Thus far it's LLVM policy that the *whole and entire* C API is ABI-fixed
>> forever, and I've argued a few times on the mailing list that this can't be
>> right, and that only libLTO and libClang ought to be ABI locked.
>>
>>   E.g. the APIs used to take for granted that libLTO returns only one
>>> object;
>>> now I need to return multiple.
>>>
>>
>>  Yes, and that's a problem. Not your problem really, except to the
>> degree that you inherited it. The existing APIs in libLTO weren't nearly
>> forwards-compatible enough, and now we're in trouble.
>>
>>  Unless we figure out something clever, we may have to add a whole new
>> set of functions to libLTO, and not deprecate the existing ones (at least,
>> not unless we get consensus on llvm-dev that it's okay to break our
>> previous ABI promise).
>>
>>>    That in turn leads to a few design decisions. The API is designed to
>>> refer to high-level concepts instead of the details of llvm's actual
>>> behaviour. Things like module lazy loading or setting the datalayout are
>>> excluded from the API. Flags are even more private, surely we should be
>>> able to change flags in LLVM's libraries without worrying about breaking
>>> linkers.
>>>
>>>  If the linker needs to do something where it matters how llvm is
>>> implemented -- you mention loading summary info, I'll assume you mean
>>> lazy-loading the module such that function bodies aren't loaded -- then the
>>> linker doesn't use libLTO at all, but uses llvm directly. Conversely,
>>> libLTO knows all about llvm and will lazy-load .bc files without being
>>> asked to.
>>>
>>>  Sure, "something else" can control libLTO, if it wants. In my case,
>>>> if "something else" wants to specify
>>>>  a workdir, then go ahead. Otherwise, libLTO uses a default one. Is
>>>> there anything wrong here?
>>>>
>>>
>>>  At a high level that sounds fine to me. The wrong part is using flags
>>> to do it.
>>>
>>>  Then how do we change the behavior for, say, debugging purposes?
>>>
>>
>>  Debugging is special. In theory, you don't even need to commit to
>> upstream for debugging, but it's fine to add features that are helpful. We
>> have that sort of thing all over llvm. libLTO has addDebugOptions to permit
>> this sort of debugging usage, but it shouldn't be used in the non-debugging
>> case.
>>
>>
>>  Passing flags to LTO is annoying and it is a sort of high-tech.
>> Bill's attribute-stuff is a way to pass some flags down the road.
>>
>> How about passing -O3 -floop-vectorize to make LTO and post-LTO code
>> work? (The opt-level is -O2, and the vectorize flag is off by default.)
>>
>
>  Right, this is a great example. I would argue that we absolutely should
> not offer such control to libLTO, not by flags or environment variables or
> C API or anything. Why? Because it locks the entirety of LLVM to offering a
> "loop vectorizer" forever. What happens in LLVM 4.0 when we have a "global
> vectorizer" or "common vectorizer" or some other name? We've removed
> optimizations before (global common subexpression elimination, gvnpre, ...)
> and we should have the freedom to continue to do so.
>
>  Now, suppose along comes a linker vendor who says "as a feature, my
> users can specify -floop-vectorize or other flags to control which
> optimizations get run at link time". Do we refuse? Do we tell them it's
> okay, but we reserve the right to break these flags at any time (and then
> what, we can't catch typos because they could just be names of old optz'ns
> we don't support)? But I actually don't think giving control of this is a
> *feature* -- it can only really be used as a workaround for bugs (or exotic
> stuff like kernel code where the vector register unit hasn't been
> initialized yet, but I am okay with having a flag to control whether we're
> allowed to use a certain class of instructions -- yet not okay with
> disabling individual optimizations).
>
>
> Those flags are passed as plugin flags. Ld has no idea about them.  On the
> other hand, such flags are passed by the compiler (derived from
> -flto and other flags). They are transparent to users.
>

Sure. It's a way to let the compiler and the compiler-provided LTO plugin
collaborate, via "flags" (really any string with limited punctuation).

But that assumes the linker will be run by the compiler, which isn't always
true. We want llvm lto to work as a drop-in replacement for the existing
tools, and sometimes existing build systems will run ld directly. (Yes,
gold will search known paths on the system to find the plugin, it doesn't
require a flag.)

>  We may (likely) have a better way in the future to pass these flags to
>> LTO,
>> but we have to pass these flags to it to make the existing code
>> work, at least for the time being.
>>
>>
>>          Adding flags to the linker instead, I think, is the wrong
>>>>> direction. The linker does not have the data structures which libLTO does.
>>>>
>>>>
>>>>  This is the discussion to have. What things do you need here which
>>>> you don't think should be exposed through the API, and yet you want to be
>>>> exposed for you?
>>>>
>>>>  I actually discussed this with Nick @ Apple before.  The conclusion is that the linker
>>>> must be LTO-oblivious;
>>>> it should think in a symbol way, and talk in a symbol way (as with GNU
>>>> gold). It would otherwise be
>>>>  very, very troublesome both for the linker and libLTO.
>>>>
>>>
>>>  And now you're discussing it with me. I also agree that the linker
>>> should communicate primarily in symbols and about symbols with libLTO.
>>>
>>>  On the other hand, we now have two linkers that support LTO. There are
>>>> different ways to control
>>>> libLTO (even for a simple task, like saving intermediate files). How
>>>> messy?
>>>>
>>>> I'd like to move all this stuff into libLTO to have a unified control.
>>>>
>>>
>>>  I have no problem with a unified control.
>>>
>>>>     libLTO is intended to be used as a library, it may not get a
>>>>> chance to parse flags.
>>>>>  It has to. Prior to my change, linkers (GNU linker and Apple ld) passed
>>>>> arch to linker via a function
>>>>> confusingly called something like "add.*debug.*options".
>>>>
>>>>
>>>>  Can't. If we allow this, every flag in every part of LLVM that libLTO
>>>> links against is baked into the C ABI forever.
>>>>
>>>>  Of course addDebugOptions does allow this, but it's named (and I
>>>> thought documented in the comments) such that anybody using it knows
>>>> they're using a non-stable non-production debugging API. Anybody using
>>>> addDebugOptions for something other than debugging libLTO is living outside
>>>> the ABI guarantees.
>>>>
>>>>  addDebugOptions is a misnomer. It also passes essential flags like
>>>> -arch=x86.  Without such flags,
>>>> the LTO does not even compile.
>>>>
>>>
>>>  That sounds like a nice bug you've got there! Wouldn't want anything
>>> to happen to it. It'd be a shame if it breaks before you manage to add a
>>> liblto_set_arch() function for it.
>>>
>>>
>>>   * Honestly, I looked and couldn't find a -arch flag that libLTO would
>>> interpret. How sure are you about this?
>>>
>>>  Perhaps not -arch flags.
>>> But at least some flags are passed this way.  I remember we used this way
>>> to pass -fast-math before Bill's attribute-stuff was working.
>>>
>>>
>>>   In case it isn't completely clear, flags are absolutely right out.
>>> Either you will revert this patch, or I will revert it for you.
>>>
>>>  I have no alternative.  If I introduce a workdir, I need to have a way
>>> to inform the linker plugin to get rid of it.
>>> This is another example of why those APIs suck.
>>>
>>
>>  You don't have the source code to the linker?
>>
>>  I can modify the linker source code. The problem is how to make sure all
>> users get the modified linker to work with the new compilers.
>> It is going to be very messy, right?
>>
>
>  True. You have a deployment problem where instead of shipping just a new
> libLTO, you ship a new libLTO (and all older linkers must continue to work
> with it), and then ship a new linker taking advantage of the new libLTO
> APIs. Sorry, but I think this is a natural consequence of the fact that
> libLTO needs to be ABI-locked.
>
>  (Also, in reality, if you can solve the deployment problem for libLTO,
> then you can solve the deployment problem for libLTO+ld. Yes it'll be more
> work.)
>
>  That is unlike clang and libclang: they are so "close" in terms of release that
>> we can change them at will.
>>
>>
>>  Let's focus on this, it sounds like this is the key problem. What's
>> wrong with modifying the linker if you want to change the behaviour of your
>> linker?
>>
>>  How often does a user check if the linker is up-to-the-minute?
>>
>>
>>
>>       I'm sorry you decided to land three things together in one patch,
>>> please remember not to do that in the future.
>>>
>>>    Ok, tell me how to create a temp working directory right. How to save
>>> temp files right both for gold and Apple ld.
>>>
>>
>>  *Why*? Are you implementing this as a linker feature you intend to ship
>> in the real linker? Or is this to debug the innards of libLTO?
>>
>>
>>  It is not a linker feature; it is absolutely libLTO's own business. Creating
>> a workdir is a neat way to organize intermediate files;
>> we can certainly use a messy way to organize the intermediate files
>> without creating a workdir.
>>
>>
>>  The only case I *am* okay with flags is when we all agree they're flags
>> for debugging the internals of libLTO,
>>
>>  To a large extent, it is for troubleshooting purposes.
>>
>>   and that we don't ship products that rely on them.
>>
>>  The product will not rely on it.
>>
>
>
>  Okay. Got it. So I have a few thoughts on this.
>
>  First of all, why don't we expose the fact that we produce native .o
> files? Because we don't want to necessarily require files.
> lto_codegen_compile returns a pointer to the memory containing the file in
> memory. Hilariously this means the libLTO writes out to disk, loads it into
> memory, hands it off to LLVMgold which writes it back down to disk again
> and hands it off to gold, which reads it back into memory. And yet it's
> still the right interface. libLTO should stop writing to disk and actually
> produce .o in RAM directly, and gold should learn to read from RAM
> directly.*
>
>    I remember gold calls lto_codegen_compile_to_file while Apple calls
> lto_codegen_compile.
> I don't want to discuss the details here as it deviates from what we are
> focusing on.
>
>   Secondly, if we aren't exposing the fact that we produce native .o
> files, should we be exposing a knob that lets us control the working
> directory? Probably not, but it's not unreasonable. If we're going to write
> to disk it's polite to let the caller choose where. If we don't write to
> disk, the API can be trivially implemented by doing nothing.
>
> I don't want to expose the workdir; it could have lots of junk in there. I
> don't like the linker penetrating that privacy.
>

That's true, we don't want anyone to rely on the contents of the workdir. I
find it strange that you think it's perfectly fine to expose every flag in
llvm, but balk at exposing the workdir. I'd expect people to accidentally
bake in flags without realizing it could break on them in the future, but
not to accidentally plunder the workdir and rely on its contents without
realizing it could break some day.

>  You suggest $PWD/unique-tmp-workdir instead of /tmp. Consider $TMP, or
> on Windows $TEMP? I don't mind if we're smart enough to pick good defaults,
> but I can absolutely imagine a linker that wants to keep its temp files in a
> specific directory. Suppose a mobile OS that runs the linker on the phone,
> where they have strict disk quotas. It's important to put the files in the
> right places, so they get counted against the right quotas. Also, for
> cleanup in the event of a crash (assume it wipes the whole directory tree).
> I really think lto_set_tempdir would be a good API to have in libLTO, and
> poses no risk of being unimplementable in the future.
>
> This is changed now. First we try $PWD/unique-dir. If that is not successful (e.g.
> no write permission), we call sys::fs::createUniqueDirectory()
> to create a dir under $TMP or $TEMP.
>
> We try $PWD first because oftentimes we debug over there, so it is a bit
> convenient.
> There is no strong reason here.
>
>
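
For reference, the fallback order being described (a caller-preferred base first, then the system temp location) could be sketched roughly as below. This is an illustration under assumptions, not the code from the patch: demo_make_workdir is a hypothetical helper, it uses POSIX mkdtemp() rather than sys::fs::createUniqueDirectory(), and it consults TMPDIR (an assumption; the thread talks about $TMP and $TEMP) before falling back to /tmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libLTO: create a uniquely named work
   directory under the first usable base directory. Returns a malloc'd
   path the caller must free, or NULL if every candidate failed. */
static char *demo_make_workdir(const char *preferred) {
    /* Assumption: TMPDIR is the variable consulted on POSIX systems. */
    const char *candidates[] = { preferred, getenv("TMPDIR"), "/tmp" };
    static const char suffix[] = "/demo-lto-XXXXXX";

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        const char *base = candidates[i];
        if (base == NULL || *base == '\0')
            continue;                       /* candidate not provided */

        size_t len = strlen(base) + sizeof suffix;  /* sizeof counts the NUL */
        char *path = malloc(len);
        if (path == NULL)
            return NULL;
        snprintf(path, len, "%s%s", base, suffix);

        /* mkdtemp() fails if the base is missing or not writable; in that
           case move on to the next candidate. */
        if (mkdtemp(path) != NULL)
            return path;
        free(path);
    }
    return NULL;
}
```

A caller that cares where the files land (quotas, crash cleanup) passes its own directory as preferred; a caller that does not passes NULL and gets the default. Nothing here lets the caller learn anything about what ends up inside the directory.
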
>  Thirdly, I'm not convinced that lto_codegen_get_files_need_remove needs
> to exist. Why not do the file deletion in lto_codegen_dispose?
>
> You never know how long the linker holds the intermediate file.
> On the other hand, there was a bug in Apple ld, which never called
> lto_codegen_dispose.
>

What's the problem? The linker releases the intermediate files at the call
to lto_codegen_dispose. Do you want finer-grained control?
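
A minimal sketch of what I mean, with loud caveats: struct demo_codegen and the demo_codegen_* functions are hypothetical and are not libLTO's real lto_code_gen_t or lto_codegen_* entry points. The handle simply remembers every intermediate file it created and removes them all when it is disposed, so no separate "files that need removing" query is required.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEMO_MAX_TEMPS 8

/* Hypothetical code generator handle that owns its intermediate files. */
struct demo_codegen {
    char *temps[DEMO_MAX_TEMPS];
    size_t ntemps;
};

static struct demo_codegen *demo_codegen_create(void) {
    return calloc(1, sizeof(struct demo_codegen));
}

/* Creates an empty temp file under dir and records it in the handle.
   Returns the path (owned by the handle) or NULL on failure. */
static const char *demo_codegen_make_temp(struct demo_codegen *cg,
                                          const char *dir) {
    static const char suffix[] = "/demo-lto-XXXXXX";
    assert(cg != NULL && dir != NULL);
    if (cg->ntemps == DEMO_MAX_TEMPS)
        return NULL;

    char *path = malloc(strlen(dir) + sizeof suffix);
    if (path == NULL)
        return NULL;
    strcpy(path, dir);
    strcat(path, suffix);

    int fd = mkstemp(path);          /* fills in the XXXXXX part */
    if (fd < 0) {
        free(path);
        return NULL;
    }
    close(fd);
    cg->temps[cg->ntemps++] = path;
    return path;
}

/* Disposing the handle is the point at which the caller gives up the
   intermediate files: unlink each one, then free the handle. */
static void demo_codegen_dispose(struct demo_codegen *cg) {
    if (cg == NULL)
        return;
    for (size_t i = 0; i < cg->ntemps; ++i) {
        unlink(cg->temps[i]);
        free(cg->temps[i]);
    }
    free(cg);
}
```

The files live exactly as long as the handle does. A linker that never calls dispose leaks them, but that is a bug in that linker rather than a reason to add another entry point.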

Nick

>  * Actually, Rafael added lto_codegen_compile_to_file in r128108 and
> we've had numerous LLVM releases since then. Now we have to support writing
> to files forever; even if we support writing to memory, we can't remove the
> writing-to-file path.
>
> GNU gold does not see lto_xxx. GNU gold and lto_xxx are bridged by
> tools/gold/*.cpp.
>
> Whatever change I made to lto_xxx does not impact GNU gold.
>
>
>   Fortunately we can continue to implement this API in the future by
> codegen'ing to memory plus a small amount of code to write to disk. Still,
> I'm a little bit sad inside.
>
>>    I explicitly called that out. If the only purpose of these was to
>> implement debugging features, then I'm sorry for the miscommunication!
>>
>>  If the intention is to let libLTO run on machines that don't have /tmp
>> (this is what I thought), we should give libLTO an API that lets the linker
>> decide where the files go.
>>
>>  I chose $PWD/unique-tmp-workdir instead of /tmp/xxx.
>> I should try /tmp/xxx if $PWD/unique-tmp-workdir does not work.
>>
>>   Maybe it wants to do smart things like putting it in a directory with
>> the right permissions, or which is scheduled for cleanup in the event of a
>> crash.
>>
>>  That is on the TODO list. I tried to install the signal handler, but the
>> supporting routines ignore directories (they only delete regular files on
>> signal).
>>
>
>  Hold on, what if the linker installs its own signal handler? If
> lto_codegen_dispose/lto_module_dispose aren't safe to call during a signal
> handler, what do you think about providing a signal-safe-emergency-shutdown
> API to libLTO? It should only be as hard to implement as factoring out the
> code you were going to write anyways.
>
>
>   I have not yet closely checked how the support/file-system stuff is
> implemented.
> Anyway, I added them to the list of files to be removed by the cleanup hook,
> hoping it will be called on signal.
>