[LLVMdev] Analysis of polly-detect overhead in oggenc
Sebastian Pop
sebpop at gmail.com
Sat Jul 13 22:21:44 PDT 2013
Hi,
I think this should also go on the llvm-dev mailing list...
On Sat, Jul 13, 2013 at 11:18 PM, Star Tan <tanmx_star at yeah.net> wrote:
>
> At 2013-07-14 02:30:07,"Tobias Grosser" <tobias at grosser.es> wrote:
>>On 07/13/2013 10:13 AM, Star Tan wrote:
>>> Hi Tobias,
>>
>>Hi Star,
>>
>>thanks for the update. I copied the polly mailing list, as I believe
>>this discussion is very valuable.
>>
>>> I tried to dig into oggenc.ll, one of the two cases we had for
>>> scop-detect compile-time overhead, but I may need your help.
>>>
>>> First, we cannot simply reduce code size by changing part of function
>>> defines to declares. Experimental results shows that the relative
>>> scop-detect time always goes down as we reduce the code size. That is
>>> similar to what you have pointed out in your previous mail.
>>>
>>> For example, if we divide the oggen.ll into oggen-top-half.ll (replace
>>> all function defines into declares in the second half) and
>>> oggen-bottom-half.ll (replace all function defines into declares in the
>>> first half) , then their compile-time would be:
>>>
>>> //Data explain: total time is 4.428s, the absolute time of Polly-detect
>>> is 0.8780s and its relative compile time percentage is 20.1%.
>>> oggenc.ll: Total(4.428s), Polly-Detect(0.8780s, 20.1%)
>>> oggenc-top-half.ll: Total(2.680s), Polly-Detect(0.3828s, 14.2%)
>>> oggenc-bottom-half.ll: Total(1.552s), Polly-Detect(0.2608s, 16.9%)
>>>
>>> To investigate the relationship between the relative scop-detect time and
>>> source code size, I also evaluated some big files based on oggenc.ll.
>>> For example, we can generate a twice bigger file oggenc2.ll with the
>>> following command:
>>> cat oggenc.ll oggenc.ll > oggenc2.ll //write two copies
>>> of oggen.ll into oggen2.ll
>>> '<,'> s/define \(.*\)@/define \1 at t2_ //replace all defines
>>> in the second half to avoid redefinition
>>> '<,'> s/declare \(.*\)@/declare \1 at t2_ //replace all declares
>>> in the second half to avoid redefinition
>>> Similarly, we can generate oggenc4.ll (four copies of oggenc.ll) and
>>> oggen8.ll (eight copies of oggenc.ll). Their compile-time would be:
>>>
>>> oggenc: Total( 4.428s), Polly-Detect(0.8780s, 20.1%)
>>> oggenc*2: Total(10.204s), Polly-Detect(2.8121s, 27.8%)
>>> oggenc*4: Total(24.748s), Polly-Detect(9.9507s, 40.4%)
>>> oggenc*8: Total(69.808s), Polly-Detect(39.4136, 56.6%)
>>>
>>> Results show that the percentage of scop-detect compile time is
>>> significantly increased as the code size is getting bigger. This can also
>>> explain why our two cases oggenc.ll (5.9M) and tramp3d-v4.ll(19M) are both
>>> big file.
>>
>>Wow, this is an extremely interesting finding.
>>
>>> Second, I still have not found the reason why the relative scop-detect
>>> compile time percentage is significantly increased as the compiled code size
>>> is getting bigger.
>>>
>>> My initial idea is that the global variable "FunctionSet
>>> InvalidFunctions" (declared in include/polly/ScopDetection.h) may increase
>>> compile-time because it always contains all functions. Since the variable
>>> InvalidFunctions never releases its resource, the InvalidFunctions.count()
>>> operation in ScopDetection.cpp would increases when more function pointers
>>> are put into this Set container. Based on this idea, I attached a patch file
>>> to fix this problem. Unfortunately, results show that the compile-time of
>>> Polly-detect pass has no change with this patch file.
>>>
>>> I doubt that the key should be some scop-detect operations, for which the
>>> compile-time would increase as the code size or the number of functions
>>> increase, but I have not found such operations. Do you have any suggestions?
>>
>>Before we write a patch, we should do some profiling to understand where
>>the overhead comes from. I propose you generate oggenc*16 or even
>>oggen*32 to ensure we get to about 90% Polly-Detect overhead.
>>
>>I would then run Polly under linux 'perf'. Using 'perf record polly-opt
>>...' and then 'perf report'. If we are lucky, this points us exactly to
>>the function we spend all the time in.
>>
>>Cheers,
>>Tobias
>
> Thanks for your very useful suggestion.
> I have profiled the oggenc*16 and oggenc*32 and the results are listed as
> follows:
>
> oggenc*16: polly-detect compile-time percentage is 71.3%. The top five
> functions reported by perf are:
> 48.97% opt opt [.]
> llvm::TypeFinder::run(llvm::Module const&, bool)
> 7.43% opt opt [.]
> llvm::TypeFinder::incorporateType(llvm::Type*)
> 7.36% opt opt [.]
> llvm::TypeFinder::incorporateValue(llvm::Value const*)
> 4.04% opt libc-2.17.so [.] 0x0000000000138bea
> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a
>
> oggenc*32: polly-detect compile-time percentage is 82.9%. The top five
> functions reported by perf are:
> 57.44% opt opt [.]
> llvm::TypeFinder::run(llvm::Module const&, bool)
> 11.51% opt opt [.]
> llvm::TypeFinder::incorporateType(llvm::Type*)
> 7.54% opt opt [.]
> llvm::TypeFinder::incorporateValue(llvm::Value const*)
> 2.66% opt libc-2.17.so [.] 0x0000000000138c02
> 2.26% opt opt [.]
> llvm::SlotTracker::processModule()
>
> It is surprise that all compile-time for TypeFinder is added into the
> compile-time for Polly-detect, but I cannot find the any call instructions
> to TypeFinder in Polly-detect.
>
> Do you have further suggestion?
>
You can try to set a breakpoint with gdb on llvm::TypeFinder::run
and then see the backtrace to figure out where this is called from Polly.
Thanks,
Sebastian
> Best wishes!
> Star Tan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Polly Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to polly-dev+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
More information about the llvm-dev
mailing list