[LLVMdev] Analysis of polly-detect overhead in oggenc

Sun Jul 21 09:49:47 PDT 2013

Hi all,

I have attached a patch file to reduce the polly-detect overhead.

My idea is to avoid calling TypeFinder in Non-DEBUG mode, so TypeFinder is only called in DEBUG mode with the DEBUG macro. 
This patch file did this work with following modifications:

First, it keeps most of string information by replacing  "<<" with "+" operation. For example, code like this:
    INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName());
would be converted into:
   LastFailure = "Non branch instruction terminates BB: " + BB.getName().str();

Second, it simplifies some complex operations like:
       INVALID(AffFunc,
              "Non affine branch in BB '" << BB.getName() << "' with LHS: "
                                          << *LHS << " and RHS: " << *RHS);
into:
      LastFailure = "Non affine branch in BB: " + BB.getName().str();

In such cases, some information for "LHS" and "RHS" are missed. However, it only has little affects on the variable "LastFailure", while keeping all information for DEBUG information. Since the variable "LastFailure" is only used in ScopGraphPrinter, which should only show critical information in graph, I hope such modification is acceptable. 

Results show that it has almost the same performance improvements as my previous hack patch file, i.e., reducing the compile-time of Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc 16X.

Best wishes,
Star Tan

Postscript to  Tobias:
 I have also implemented your initial proposal, which uses some class hierarchy to show different Failure information. Unfortunately, I found it would significantly complicate the source code. It not only introduces a lot of dynamic object "new" and "copy" operations, but also makes source code hard to understand. I argue that very detailed information for the variable "LastFailure" is not essential because it should only show critical information in the graph. If users want to know detailed failure information, they should refer to DEBUG information. This new patch file keeps most of critical information for "LastFailure" except some detailed information about Instruction and SCEV.
Do you have any suggestion?

At 2013-07-15 05:59:26,"Tobias Grosser" <tobias at grosser.es> wrote:
>On 07/14/2013 08:05 AM, Star Tan wrote:
>> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass.
>> Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll,  the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.
>
>Very nice analysis. I just tried it myself and can verify that for
>oggenc 16x, your patch reduces the scop-detection time from 90 seconds
>(80 %) to 0.5 seconds (2.5 %).
>
>I think there are two problems:
>
>   1) The cost of printing a single LLVM type/value increases with
>      the size of the overall Module. This seems to be because
>      TypeFinder::run() is called each time, without caching in place.
>      The cost of TypeFinder::run() increases with the size of the
>      module, as it basically just performs a scan on the entire Module.
>
>   2) We are formatting the failure messages during normal compilation,
>      even though they are only used in debugging tools like -view-scops
>
>In terms of solutions:
>
>It would be interesting to understand why is 1) so slow, especially as
>it seems to be either a fundamental problem in LLVM IR printing or the
>way we use the IR printing infrastructure. On the other side, for Polly
>we need to solve 2) anyway. Even if formatting would be faster, we
>should still not do it, if not needed. As we need to solve 2) anyway, 1)
>will only hit us when we do debugging/formatting. I assume in case of
>debugging the files we are looking into are normally smaller, such that
>the formatting overhead will not be that big.
>
>Hence, I would focus on 2). We could probably just put the code under a
>NDEBUG ifndef, but I would actually like to keep them available even in
>NDEBUG mode, as we may want to use the error messages to hint users to
>why their code can not be optimized. For this and also to get rid of
>another annoyance, the INVALID macro, I think we need to restructure the
>reporting of the last error, such that formatting of the error messages
>can be done on-demand. Another problem that could be solved at this
>point is to remove the macro use, which hides the fact that the
>functions return as soon as INVALID is called, which is plainly ugly.
>
>I am not sure how to structure this, but I could imagine some small
>class hierarchy that has a class for each error type. Each class just
>stores pointers to the data structures it needs to format its error
>message, but only formats the error on-demand. We could then return this
>class in case of failure and return a NoError class or a NULL pointer in
>case of success.
>
>This change may also help us to later add support to keep track of all
>errors we encounter (not just the first one). This is something Andreas
>and Johannes found helpful earlier.
>
>Cheers,
>Tobias
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130722/bb63f382/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ScopDetect-Optimize-compile-time-cost-for-string-ope.patch
Type: application/octet-stream
Size: 16775 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130722/bb63f382/attachment.obj>