[LLVMdev] Analysis of polly-detect overhead in oggenc

Sun Jul 14 14:59:26 PDT 2013

On 07/14/2013 08:05 AM, Star Tan wrote:
> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass.
> Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll,  the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.

Very nice analysis. I just tried it myself and can verify that for
oggenc 16x, your patch reduces the scop-detection time from 90 seconds
(80 %) to 0.5 seconds (2.5 %).

I think there are two problems:

   1) The cost of printing a single LLVM type/value increases with
      the size of the overall Module. This seems to be because
      TypeFinder::run() is called each time, without caching in place.
      The cost of TypeFinder::run() increases with the size of the
      module, as it basically just performs a scan on the entire Module.

   2) We are formatting the failure messages during normal compilation,
      even though they are only used in debugging tools like -view-scops

In terms of solutions:

It would be interesting to understand why is 1) so slow, especially as
it seems to be either a fundamental problem in LLVM IR printing or the
way we use the IR printing infrastructure. On the other side, for Polly
we need to solve 2) anyway. Even if formatting would be faster, we
should still not do it, if not needed. As we need to solve 2) anyway, 1)
will only hit us when we do debugging/formatting. I assume in case of
debugging the files we are looking into are normally smaller, such that
the formatting overhead will not be that big.

Hence, I would focus on 2). We could probably just put the code under a
NDEBUG ifndef, but I would actually like to keep them available even in
NDEBUG mode, as we may want to use the error messages to hint users to
why their code can not be optimized. For this and also to get rid of
another annoyance, the INVALID macro, I think we need to restructure the
reporting of the last error, such that formatting of the error messages
can be done on-demand. Another problem that could be solved at this
point is to remove the macro use, which hides the fact that the
functions return as soon as INVALID is called, which is plainly ugly.

I am not sure how to structure this, but I could imagine some small
class hierarchy that has a class for each error type. Each class just
stores pointers to the data structures it needs to format its error
message, but only formats the error on-demand. We could then return this
class in case of failure and return a NoError class or a NULL pointer in
case of success.

This change may also help us to later add support to keep track of all
errors we encounter (not just the first one). This is something Andreas
and Johannes found helpful earlier.

Cheers,
Tobias

g