[llvm] r189297 - Add new API lto_codegen_compile_parallel().
Wan, Xiaofei
xiaofei.wan at intel.com
Fri Sep 20 19:30:34 PDT 2013
Steve:
One more thing I need to clarify: this patch is not only for Android but for LLVM itself; it just happens to improve the Android use case. The patch has passed many test suites, and Android/toolchain is only a small part of them. This is not the proper place to discuss the Android project, but technical input is of course welcome anytime and anywhere.
After rounds of discussion, the community will eventually arrive at the best solution for improving code generation speed (e.g. Shuxin's proposal also sounds very good in theory).
Thanks
Wan Xiaofei
-----Original Message-----
From: Eric Christopher [mailto:echristo at gmail.com]
Sent: Saturday, September 21, 2013 12:52 AM
To: Wan, Xiaofei
Cc: Stephen Hines; Shuxin Yang; Chandler Carruth; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r189297 - Add new API lto_codegen_compile_parallel().
FWIW I am (and have mentioned a few times) in favor of the parallel passes approach and it's great to have some data from people that have tried that approach.
Thanks Xiaofei!
-eric
On Thu, Sep 19, 2013 at 6:38 PM, Wan, Xiaofei <xiaofei.wan at intel.com> wrote:
> Steve:
>
>
>
> Sorry for causing a misunderstanding here; I am not trying to convince
> the community to accept this patch. As you said, this is just an
> experimental project, and verifying it with only small test coverage
> is not enough.
>
>
>
> This is just for discussion, to show that parallelizing passes is
> possible, since Shuxin proposed another solution. I think the people
> in the community will work out the most suitable solution to improve
> code generation, so I don't mind which patch goes upstream.
>
>
>
> Thanks
> Wan Xiaofei
>
>
>
> From: Stephen Hines [mailto:srhines at google.com]
> Sent: Friday, September 20, 2013 8:47 AM
> To: Wan, Xiaofei
> Cc: Eric Christopher; Shuxin Yang; Chandler Carruth;
> llvm-commits at cs.uiuc.edu
>
>
> Subject: Re: [llvm] r189297 - Add new API lto_codegen_compile_parallel().
>
>
>
> Although this was merged into an AOSP project, I want to make it clear
> that this is *NOT* the official LLVM toolchain for Android (and thus
> does not constitute endorsement of this patch). That repository is an
> experimental branch for a 20% project at Google that wanted to try out
> the patch. Please do not use unofficial sources to try to convince the
> LLVM community that your patch has been accepted/verified by Android.
>
>
>
> We will continue to only accept upstream patches for rebasing our
> Android LLVM sources. When this patch or something different gets
> accepted as the proper way to improve code generation performance, we
> will be using the same patch as upstream.
>
>
>
> Thanks,
>
> Steve
>
>
>
> On Sun, Sep 15, 2013 at 12:30 AM, Wan, Xiaofei <xiaofei.wan at intel.com>
> wrote:
>
> Interesting. What's the difference (or your opinion) here between,
> say, parallelizing codegen/post-ipo passes and splitting the module?
> Why go for the second rather than the first?
>
> [Xiaofei] The first one is just what I proposed, at almost the same
> time as Shuxin proposed his idea; we have merged it into the
> AOSP/llvm-toolchain project, and it improves back-end code-gen by
> 3.5X with 4 threads on our device.
> We also tried what Shuxin proposed and found that module partitioning
> is not a good solution, since "module partition, binary merge" takes
> considerable time; we abandoned module partitioning and turned to
> function-based parallelism (parallelizing passes).
>
> LLVM back-end compilation time is important to our business (we only
> care about compilation time without LTO). I look forward to the
> community agreeing on a final solution for parallelizing the back-end
> codegen passes; any solution is OK. Meanwhile I will keep my proposal
> open here; we do hope the community can arrive at a good solution.
>
> Here are the earlier discussion and the code we have merged into
> AOSP/toolchain.
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063796.html
> http://llvm-reviews.chandlerc.com/D1152
> https://android-review.googlesource.com/#/c/62308/
>
> Thanks
> Wan Xiaofei
>
>
>
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Eric
> Christopher
> Sent: Wednesday, September 04, 2013 12:08 AM
> To: Shuxin Yang
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm] r189297 - Add new API lto_codegen_compile_parallel().
>
> On Tue, Aug 27, 2013 at 10:42 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>> Reverted in 189386. Once again, I apologize that I didn't follow the
>> canonical procedure.
>> I personally think Nick's proposal is clean enough for our system,
>> and took for granted that the community would like it.
>>
>
> It's not necessarily bad, but the lto library is a bit funky and
> perhaps a new lto library is what we need :)
>
>> I will not initiate a discussion for now. I'd like to let things cool
>> down for a while (maybe postpone indefinitely).
>>
>> As with most infrastructure-related projects, partitioning is
>> unglamorous and painstaking work.
>> I stepped forward to take it on just because we have almost no way
>> to debug or investigate LTO.
>>
>
> Absolutely.
>
>> For those who are curious about how much we can speed up by
>> partitioning: unfortunately, I can't tell, as the project is not yet
>> completely done. My rudimentary (quite stupid, actually)
>> implementation using the make utility speeds up the command "clang++
>> Xalancbmk/*.o -flto"
>> by 39% (35s vs 21s; Xalancbmk has 700+ inputs). It is a bit of a shame
>> for partitioning, but at the very least each partition is under human
>> control.
>> On the other hand, post-IPO scalar optimization is not yet
>> parallelized in my rudimentary implementation (i.e. so far only the
>> codegen part is parallelized). Surprisingly, the result is very
>> consistent with what Xiaofei achieved via multi-threaded code-gen.
>> As far as I can recall, he got a speedup of some 2.9x. In my case, it
>> takes about 13s before code-gen starts,
>> meaning the speedup of code-gen is about (35-13)/(21-13) = 2.75x
>> (code-gen plus the linker's post-processing takes 35-13 = 22s).
>>
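The code-gen speedup estimate in the quoted paragraph can be checked with a short calculation. This is a minimal sketch in Python; only the timings (35s, 21s, 13s) come from the message, and the variable names are illustrative assumptions.

```python
# Timings quoted in the message, in seconds. Variable names are illustrative.
serial_total = 35.0     # "clang++ Xalancbmk/*.o -flto" without partitioning
parallel_total = 21.0   # same command with the make-based partitioned codegen
before_codegen = 13.0   # time spent before code-gen starts (not parallelized)

# Only the part after code-gen starts benefits from the partitioning.
serial_codegen = serial_total - before_codegen      # 22s
parallel_codegen = parallel_total - before_codegen  # 8s

codegen_speedup = serial_codegen / parallel_codegen  # speedup of code-gen alone
overall_speedup = serial_total / parallel_total      # speedup of the whole link
time_saved = 1 - parallel_total / serial_total       # fraction of wall time saved

print(f"code-gen speedup: {codegen_speedup:.2f}x")
print(f"overall speedup:  {overall_speedup:.2f}x")
print(f"wall-clock time saved: {time_saved:.0%}")
```

With these rounded timings the code-gen speedup comes out as 2.75x, while the whole command is only about 1.67x faster because the 13s before code-gen remains serial.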
>
> Interesting. What's the difference (or your opinion) here between,
> say, parallelizing codegen/post-ipo passes and splitting the module?
> Why go for the second rather than the first?
>
> -eric
>
>>
>> On 8/27/13 12:27 AM, Shuxin Yang wrote:
>>
>> On 8/26/13 11:19 PM, Chandler Carruth wrote:
>>
>> On Mon, Aug 26, 2013 at 5:53 PM, Shuxin Yang <shuxin.llvm at gmail.com>
>> wrote:
>>>
>>> We certainly need a way to feed multiple resulting objects back to
>>> the linker.
>>> There are a couple of ways
>>> to this end:
>>>
>>> 1) 'ld -r all-resulting-obj-on-disk -o result.o' and feed the
>>> only object file (i.e. the result.o)
>>> back to the linker
>>>
>>> 2) keep the resulting objects in memory buffers, and feed the
>>> buffers back to the linker
>>> (as proposed by Nick)
>>>
>>> 3) As with GNU gold, save the resulting objects on disk, and
>>> feed these disk files back to the linker one by one.
>>>
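Option 1) in the list above amounts to a relocatable link of the per-partition objects. A minimal sketch in Python that only builds the command line; the helper name and the file names p1.o, p2.o, p3.o are hypothetical, and only `ld -r ... -o result.o` comes from the message.

```python
from typing import List


def relocatable_link_command(objects: List[str], output: str = "result.o") -> List[str]:
    """Build the argv for `ld -r <objects...> -o <output>` (option 1 above).

    The helper and its name are illustrative; it does not run the linker.
    """
    if not objects:
        raise ValueError("need at least one object file to combine")
    return ["ld", "-r", *objects, "-o", output]


# Hypothetical per-partition objects produced by the code generator.
cmd = relocatable_link_command(["p1.o", "p2.o", "p3.o"])
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` to actually invoke the linker; the single `result.o` would then be handed back to the final link.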
>>> I'm big linker nut. I don't know which way works better. I am trying
>>> to use 1) as a workaround for the time being, before 2) is available.
>>> People at Apple disagree with my engineering approach.
>>>
>>> From compiler's perspective,
>>> o. 1) is not just workaround, 3) is certainly better than 1).
>>> o. 2) will win if the program being compiled is small- or
>>> medium-sized.
>>> With huge programs, it will be difficult for the compiler to
>>> decide when and how to "spill" some stuff
>>> from memory to disk. Folks at Apple keep reiterating that
>>> we only consider the case where the entire
>>> program can be loaded in memory. So the added difficulty for
>>> the compiler does not seem to be a
>>> problem for the workload we care about.
>>
>>
>> Shuxin, I'm not sure what you're trying to accomplish here, but I
>> don't think this is the right approach.
>>
>> First, you seem to be pursuing a partitioning scheme for
>> parallelizing LTO work despite *no* consensus that this is the
>> correct approach
>>
>> I sent a proposal a long time ago; as far as I can tell from the
>> mailing list, there was no objection at all.
>> Actually, my approach is not new at all. It is almost a "std" way
>> to perform partitioning. It looks similar to all the LTOs I have
>> worked or played with before.
>> It just needs some LLVM flavor. But this change has nothing to do with
>> the partition implementation; it just adds an interface.
>>
>> in any of the community discussions I can find. Please don't commit
>> code toward a design that the community has expressed serious
>> reservations about without review.
>>
>> Second, you are committing a new API to the set of the stable C APIs
>> that libLTO exposes without a thorough discussion on the mailing list.
>>
>> Sorry, I thought this was pretty much an Apple thing, as no other
>> system uses this API.
>> I will revert tomorrow, and initiate a discussion.
>>
>> The APIs are almost divided into two classes: one for Unix + gold, the
>> other for OSX + Apple LD.
>> I don't like the way it is, and I don't like such APIs at all (I
>> mean all of them).
>> I used to argue that we are better off having a symbol-related
>> interface instead of an LTO-related API.
>> But the community does not buy my point. As I have little knowledge
>> of LLVM, I have to keep an open mind and adapt to LLVM thinking,
>> but it certainly takes some time.
>>
>> It is possible I have missed this discussion, but I did look and
>> failed to find anything that seems to resemble a review, much less an
>> LGTM. If I have missed it, I apologize and please direct me at the
>> thread. I bring this up because the specific interface seems
>> surprising to me.
>>
>> Third, you are justifying the particular approach with a deflection
>> to some discussion within Apple or with those developers you work
>> with at Apple.
>> While this may in fact be the motivation for this patch, the open
>> source community is often not party to these discussions. ;]
>>
>> That is true:-)
>>
>> It would help us if you would just give the specific basis rather
>> than referencing a discussion that we weren't involved with. As it
>> happens, I suspect I agree with these "Folks in Apple" that it is
>> useful to specifically optimize for the case that an entire program
>> fits into memory, bypassing the filesystem.
>>
>> You bet!.
>>
>> I debated with them. No chance to win. Why don't you suspect in the
>> first place:-).
>> But "folks in Apple" argue that this is the plan for the future. It
>> does not seem to be a pretty lame argument, as the current
>> implementation of LTO brings everything into memory.
>>
>> However, there are many paths to that end result. From the little
>> information in the commit log there isn't really enough to tell why
>> *this* is the necessary path forward (in fact, I'm somewhat confident
>> it isn't).
>>
>> In concept, there is only one alternative: compile the merged
>> module into multiple objects, and feed the objects back to the linker.
>>
>>
>>
>>
>> So, to get back to Eric's original question: what is the motivation
>> for this API, its expected actual usage, and the reason why it is
>> important to stub out in this way now?
>>
>> The motivation is: the existing LTO compiles the merged module into a
>> *single* object;
>> this new API enables compiling the merged module into
>> *multiple* objects.
>> I'm wondering if this is clear now.
>>
>> For instance, suppose the command line is "clang -flto a.o b.bc
>> c.o d.bc"
>> (*.o are real objects, and *.bc are bitcode).
>> The existing LTO will merge b.bc and d.bc into t.bc (the merged module),
>> compile the merged t.bc into t.o, and feed t.o back to the
>> linker, which combines a.o, c.o, and t.o into a.out.
>>
>> The new API will make the compiler compile t.bc into p1.o,
>> p2.o, ..., and feed these p*.o back to the linker, which
>> combines them with a.o and c.o into a.out.
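The difference between the existing flow and the proposed one can be modeled as a small function that lists what the linker finally combines. This is a sketch in Python under simplifying assumptions: the function names are hypothetical, inputs are classified purely by file extension as in the example above, and no real libLTO call is involved.

```python
from typing import List, Tuple


def split_inputs(inputs: List[str]) -> Tuple[List[str], List[str]]:
    """Separate native objects (*.o) from bitcode files (*.bc) by extension."""
    native = [f for f in inputs if f.endswith(".o")]
    bitcode = [f for f in inputs if f.endswith(".bc")]
    return native, bitcode


def final_link_inputs(inputs: List[str], partitions: int = 1) -> List[str]:
    """Return the object files the linker combines into a.out.

    partitions == 1 models the existing LTO: merged t.bc -> single t.o.
    partitions > 1 models the proposed API: merged t.bc -> p1.o ... pN.o.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    native, bitcode = split_inputs(inputs)
    if not bitcode:
        return native  # nothing to merge or compile
    if partitions == 1:
        generated = ["t.o"]
    else:
        generated = [f"p{i}.o" for i in range(1, partitions + 1)]
    return native + generated


command_line_inputs = ["a.o", "b.bc", "c.o", "d.bc"]
print(final_link_inputs(command_line_inputs))                # existing flow
print(final_link_inputs(command_line_inputs, partitions=2))  # proposed flow
```

In both cases a.o and c.o pass through untouched; only the number of objects generated from the merged module changes.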
>>
>>
>>
>>
>>
>>
>>
>> Better yet, could we have that discussion before growing the set of
>> stable APIs that we claim to never regress?
>>
>>
>> Sure. Sorry about that. I actually don't want to touch the lto_xxx()
>> API for now. I just want to work around the limitation of
>> the linker, and wait for the new ld. But Bob didn't buy my argument:-).
>>
>>
>>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits