[llvm] r189297 - Add new API lto_codegen_compile_parallel().

Tue Sep 3 09:07:50 PDT 2013

On Tue, Aug 27, 2013 at 10:42 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Reverted in 189386.  Once again, I apologize for not following the canonical
> procedure.
> I personally thought Nick's proposal was clean enough for our system, and took
> for granted
> that the community would like it.
>

It's not necessarily bad, but the lto library is a bit funky and
perhaps a new lto library is what we need :)

> I will not initiate a discussion for now. I'd like to cool things down for a
> while. (maybe postpone indefinitely).
>
> As with most infrastructure-related projects, partitioning is unglamorous and
> painstaking work.
> I stepped forward to take it on just because we have almost no way to debug or
> investigate LTO.
>

Absolutely.

> For those who are curious about how much we can speed up by partitioning:
> unfortunately, I can't tell,
> as the project is not yet completely done. My rudimentary (quite stupid,
> actually)
> implementation using the make utility speeds up the command "clang++ Xalancbmk/*.o
> -flto"
> by 39% (35s vs. 21s; Xalancbmk has 700+ inputs).  It is a bit of a shame for
> partitioning. But at the very least, each partition
> is under human control.  On the other hand, post-IPO scalar optimization is
> not yet parallelized
> in my rudimentary implementation (i.e. so far only the codegen part is
> parallelized). Surprisingly,
> the result is very consistent with what Xiaofei achieved via multi-threaded
> code-gen.  As far
> as I can recall, he got a speedup of some 2.9x. In my case, it takes about 13s before
> code-gen starts,
> meaning the speedup of the code-gen is about (35-13)/(21-13) = 2.75x.
> (Code-gen plus the linker's post-processing takes 35-13 = 22s.)
>
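
The speedup arithmetic quoted above can be reproduced with a short Python sketch. The timings (35s, 21s, 13s) are the ones reported in the message; the function name `codegen_speedup` is purely illustrative and is not part of any LLVM tool.

```python
def codegen_speedup(total_before, total_after, serial_prefix):
    """Speedup of the part of the run that follows `serial_prefix` seconds.

    Assumes the first `serial_prefix` seconds (everything before code-gen
    starts) take the same time in both runs.
    """
    return (total_before - serial_prefix) / (total_after - serial_prefix)


# Timings reported in the message: 35s before, 21s after, 13s before code-gen.
overall = 35 / 21
codegen = codegen_speedup(35, 21, 13)
print(f"overall: {overall:.2f}x, codegen only: {codegen:.2f}x")
# prints "overall: 1.67x, codegen only: 2.75x"
```

The gap between the two numbers comes from the 13s serial prefix, which is not parallelized in the implementation described.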

Interesting. What's the difference (or your opinion) here between,
say, parallelizing codegen/post-ipo passes and splitting the module?
Why go for the second rather than the first?

-eric

>
> On 8/27/13 12:27 AM, Shuxin Yang wrote:
>
> On 8/26/13 11:19 PM, Chandler Carruth wrote:
>
> On Mon, Aug 26, 2013 at 5:53 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>
>> We certainly need a way to feed multiple resulting objects back to the linker.
>> There are a couple of ways
>> to this end:
>>
>>    1) "ld -r all-resulting-obj-on-disk -o result.o", and feed the only
>> object file (i.e. the result.o)
>>        back to the linker
>>
>>     2) keep the resulting objects in memory buffers, and feed the
>> buffers back to the linker
>>         (as proposed by Nick)
>>
>>     3) As with GNU gold, save the resulting objects on disk, and feed
>> these disk files back to the linker
>> one by one.
>>
>>     I'm big linker nut. I don't know which way works better.  I tried to use
>> 1) as a workaround for the time being
>> before 2) is available. People at Apple disagree with my engineering approach.
>>
>>     From compiler's perspective,
>>     o. 1) is not just workaround, 3) is certainly better than 1).
>>     o. 2) will win if the program being compiled is small- or
>> medium-sized.
>>         With huge programs,  it will be difficult for the compiler to decide
>> when and how to "spill" some stuff
>>         from memory to disk.  Folks at Apple iterate and reiterate that we only
>> consider the case where the entire
>>        program can be loaded in memory. So, the added difficulty for
>> the compiler does not seem to be a
>>        problem for the workload we care about.
>
>
> Shuxin, I'm not sure what you're trying to accomplish here, but I don't
> think this is the right approach.
>
> First, you seem to be pursuing a partitioning scheme for parallelizing LTO
> work despite *no* consensus that this is the correct approach
>
> I sent a proposal a long time ago; as far as I can understand from the mailing
> list, there was no objection at all.
> Actually, my approach is not new at all. It is almost a "std" way to
> perform partitioning. It looks similar to all the LTOs I have worked/played with before.
> It just needs some LLVM flavor.  But this change has nothing to do with the
> partition implementation; it just adds an interface.
>
> in any of the community discussions I can find. Please don't commit code
> toward a design that the community has expressed serious reservations about
> without review.
>
> Second, you are committing a new API to the set of the stable C APIs that
> libLTO exposes without a thorough discussion on the mailing list.
>
> Sorry, I thought this was pretty much an Apple thing, as no other system uses this
> API.
> I will revert tomorrow, and initiate a discussion.
>
> The APIs are almost divided into two classes: one for Unix + gold, the other
> for OSX + Apple LD.
> I don't like the way it is, and I don't like such APIs at all (I mean
> all of them).
>  I used to argue we are better off having a symbol-related interface instead
> of an LTO-related API.
>  But the community does not buy my point.  As I have little knowledge about
> LLVM, I have to keep
> an open mind and adapt to LLVM thinking, but it certainly takes some time.
>
> It is possible I have missed this discussion, but I did look and failed to
> find anything that seems to resemble a review, much less an LGTM. If I have
> missed it, I apologize and please direct me at the thread. I bring this up
> because the specific interface seems surprising to me.
>
> Third, you are justifying the particular approach with a deflection to some
> discussion within Apple or with those developers you work with at Apple.
> While this may in fact be the motivation for this patch, the open source
> community is often not party to these discussions. ;]
>
> That is true:-)
>
> It would help us if you would just give the specific basis rather than
> referencing a discussion that we weren't involved with. As it happens, I
> suspect I agree with these "Folks in Apple" that it is useful to
> specifically optimize for the case that an entire program fits into memory,
> bypassing the filesystem.
>
> You bet!
>
> I debated with them. No chance to win. Why don't you suspect in the first
> place:-).
> But "folks in Apple" argue that is the plan for the future.  It does not seem to
> be a pretty lame argument,
> as the current implementation of LTO brings everything into memory.
>
> | However, there are many paths to that end result. From the little
> information in the commit log there isn't really enough to tell why *this*
> is the necessary path forward (in fact, I'm somewhat confident it isn't).
>
> In concept, there is only one alternative: compile the merged module
> into multiple objects, and feed the objects back to the linker.
>
>
>
>
> So, to get back to Eric's original question: what is the motivation for this
> API, its expected actual usage, and the reason why it is important to stub
> out in this way now?
>
> The motivation is: the existing LTO compiles the merged module into a *single*
> object;
>   this new API enables compiling the merged module into
> *multiple* objects.
>   I'm wondering if this is clear now.
>
>    For instance, suppose the command line is "clang -flto a.o b.bc c.o d.bc"
> (*.o are real objects, and *.bc are bitcode).
>   Existing LTO will merge b.bc and d.bc into t.bc (the merged module), compile
> the merged t.bc into t.o,
> and feed t.o back to the linker, which combines a.o, c.o and t.o into a.out.
>
>    The new API will trigger the compiler to convert t.bc into p1.o, p2.o,
> ..., and feed these p*.o back to the linker, which
>   combines a.o, c.o and p*.o into a.out.
>
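
The flow described above can be modeled with a small Python sketch. It only tracks file names and does not call LLVM or libLTO; the function `link_inputs` and its `num_partitions` parameter are hypothetical names introduced for illustration, while the file names (a.o, b.bc, c.o, d.bc, t.o, p1.o, p2.o) follow the example in the message.

```python
def link_inputs(inputs, num_partitions=1):
    """Return the object files the linker finally combines.

    Hypothetical model only: native objects pass through unchanged, and all
    bitcode inputs are merged into one module that is compiled either into a
    single object (t.o) or into several partition objects (p1.o, p2.o, ...).
    """
    native = [f for f in inputs if f.endswith(".o")]
    bitcode = [f for f in inputs if f.endswith(".bc")]
    if not bitcode:
        return native  # nothing for LTO to compile
    if num_partitions == 1:
        lto_objects = ["t.o"]  # existing behavior: one merged object
    else:
        lto_objects = [f"p{i}.o" for i in range(1, num_partitions + 1)]
    return native + lto_objects


cmdline = ["a.o", "b.bc", "c.o", "d.bc"]
print(link_inputs(cmdline))                    # ['a.o', 'c.o', 't.o']
print(link_inputs(cmdline, num_partitions=2))  # ['a.o', 'c.o', 'p1.o', 'p2.o']
```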
>
>
>
>
>
>
> Better yet, could we have that discussion before growing the set of stable
> APIs that we claim to never regress?
>
>
> Sure. Sorry about that. I actually don't want to touch the lto_xxx() API for
> now.  I just wanted to work around
> the limitation of the linker, and wait for the new ld. But Bob didn't buy my
> argument:-).
>
>
>