[LLVMdev] Analysis of polly-detect overhead in oggenc

Wed Jul 17 07:15:01 PDT 2013

On 07/16/2013 11:42 AM, Sebastian Pop wrote:
> Star Tan wrote:
>> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for the "INVALID" macro in the polly-detect pass.
>> Attached is a hack patch file that simply removes the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261s (51.2%) to 5.8813s (15.9%) with this patch file.
>
> On top of your patch, I have removed from ScopDetection.cpp all printing of LLVM
> values, like this:
>
> -    INVALID(AffFunc, "Non affine access function: " << *AccessFunction);
> +    INVALID(AffFunc, "Non affine access function: ");
>
> There are a good dozen or so of these pretty-printing calls.  With these changes the
> compile time spent in ScopDetection drops dramatically to almost 0: here is the
> longest running one in the compilation of an Android stack:
>
>     2.1900 ( 13.7%)   0.0100 (  7.7%)   2.2000 ( 13.6%)   2.2009 ( 13.4%)  Polly - Detect static control parts (SCoPs)
>
> Before these changes, the most expensive ScopDetection run used to take a few
> hundred seconds.

Hi Sebastian,

I am slightly confused. Star Tan's patch did the following:

  #define INVALID(NAME, MESSAGE)                       \
    do {                                               \
-    std::string Buf;                                 \
-    raw_string_ostream fmt(Buf);                     \
-    fmt << MESSAGE;                                  \
-    fmt.flush();                                     \
-    LastFailure = Buf;                               \
      DEBUG(dbgs() << MESSAGE);                        \
      DEBUG(dbgs() << "\n");                           \
      assert(!Context.Verifying &&#NAME);

In my understanding, this patch alone removes all formatting overhead 
from the default execution. The only use of MESSAGE left is within the 
debug macro, which will only be evaluated if -debug is given.
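
To illustrate the point, here is a minimal standalone sketch. It does not use 
LLVM; DEBUG_SKETCH, DebugFlag, expensiveFormat, INVALID_EAGER and INVALID_LAZY 
are hypothetical stand-ins for DEBUG, -debug, the printing of an LLVM value, 
and the INVALID macro before and after the patch. It only assumes that the 
debug macro guards its argument with a runtime flag:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins, not LLVM's real definitions.
static bool DebugFlag = false;   // plays the role of the -debug option
static int FormatCalls = 0;      // how often the expensive printer ran
static std::string LastFailure;  // mirrors the member the patch stops filling

// The argument is only executed when the flag is set.
#define DEBUG_SKETCH(X)                                \
  do {                                                 \
    if (DebugFlag) {                                   \
      X;                                               \
    }                                                  \
  } while (0)

// Stand-in for pretty-printing an LLVM value such as *AccessFunction.
static std::string expensiveFormat() {
  ++FormatCalls;
  return "<pretty-printed value>";
}

// Shape of the macro before the patch: MESSAGE is always formatted.
#define INVALID_EAGER(MESSAGE)                         \
  do {                                                 \
    std::ostringstream Fmt;                            \
    Fmt << MESSAGE;                                    \
    LastFailure = Fmt.str();                           \
    DEBUG_SKETCH(std::cerr << MESSAGE << "\n");        \
  } while (0)

// Shape of the macro after the patch: MESSAGE appears only in the guard.
#define INVALID_LAZY(MESSAGE)                          \
  do {                                                 \
    DEBUG_SKETCH(std::cerr << MESSAGE << "\n");        \
  } while (0)

// Returns how many times the printer ran for one eager rejection.
int eagerCalls(bool Debug) {
  DebugFlag = Debug;
  FormatCalls = 0;
  INVALID_EAGER("Non affine access function: " << expensiveFormat());
  return FormatCalls;
}

// Returns how many times the printer ran for one lazy rejection.
int lazyCalls(bool Debug) {
  DebugFlag = Debug;
  FormatCalls = 0;
  INVALID_LAZY("Non affine access function: " << expensiveFormat());
  return FormatCalls;
}
```

In this sketch, lazyCalls(false) never runs the printer, whatever MESSAGE 
contains, while eagerCalls(false) runs it once per rejection. If the real 
DEBUG macro behaves like the guard above, editing MESSAGE should make no 
difference once the string buffer is gone.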

I am surprised that you see further performance changes from 
removing or changing the content of MESSAGE. As it is not evaluated, I do 
not see why this would change performance. Do you have any idea what is 
going on?

I also tested for further performance differences on the oggenc 
benchmark and could not reproduce your results.

Cheers,
Tobias