[LLVMdev] Analysis of polly-detect overhead in oggenc

Sun Jul 14 20:15:22 PDT 2013

On Sun, Jul 14, 2013 at 2:59 PM, Tobias Grosser <tobias at grosser.es> wrote:
> On 07/14/2013 08:05 AM, Star Tan wrote:
>>
>> I have found that the extremely expensive compile-time overhead comes from
>> the string buffer operation for "INVALID" MACRO in the polly-detect pass.
>> Attached is a hack patch file that simply remove the string buffer
>> operation. This patch file can significantly reduce compile-time overhead
>> when compiling big source code. For example, for oggen*8.ll,  the compile
>> time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch
>> file.
>
>
> Very nice analysis. I just tried it myself and can verify that for
> oggenc 16x, your patch reduces the scop-detection time from 90 seconds
> (80 %) to 0.5 seconds (2.5 %).
>
> I think there are two problems:
>
>   1) The cost of printing a single LLVM type/value increases with
>      the size of the overall Module. This seems to be because
>      TypeFinder::run() is called each time, without caching in place.
>      The cost of TypeFinder::run() increases with the size of the
>      module, as it basically just performs a scan on the entire Module.
>
>   2) We are formatting the failure messages during normal compilation,
>      even though they are only used in debugging tools like -view-scops
>
> In terms of solutions:
>
> It would be interesting to understand why is 1) so slow, especially as
> it seems to be either a fundamental problem in LLVM IR printing or the
> way we use the IR printing infrastructure.

I once analyzed it for GVN when I hit a similar issue (type finder
being rerun again and again while printing statements), and I came up
with the answer that the printing infrastructure does not expect to be
printing things regularly, and the typefinder only gets called so much
during debug printing, because of the call paths it goes through.

The basic cause is that Value::print creates AssemblyWriter objects
every time it is called. This in turn calls the type finder, for every
single call to Value::print.  The type finder is being called to build
up the list of named types in the module, so that when it prints
things, it can use the right name for the type instead of an anonymous
name.

if you do the following:

diff --git a/lib/VMCore/AsmWriter.cpp b/lib/VMCore/AsmWriter.cpp
index 7ef1131..8a2206b 100644
--- a/lib/VMCore/AsmWriter.cpp
+++ b/lib/VMCore/AsmWriter.cpp
@@ -2118,7 +2118,7 @@ void Value::print(raw_ostream &ROS, AssemblyAnnotationWrit
   if (const Instruction *I = dyn_cast<Instruction>(this)) {
     const Function *F = I->getParent() ? I->getParent()->getParent() : 0;
     SlotTracker SlotTable(F);
-    AssemblyWriter W(OS, SlotTable, getModuleFromVal(I), AAW);
+    AssemblyWriter W(OS, SlotTable, NULL, AAW);
     W.printInstruction(*I);
   } else if (const BasicBlock *BB = dyn_cast<BasicBlock>(this)) {
     SlotTracker SlotTable(BB->getParent());


(This is an old patch, AsmWriter.cpp is not in lib/VMCore anymore, but
you can see what is happening), you will not be shown named types in
the operand printing, but it will not be slow anymore.

This is of course, a complete hack just to work around the issue if
you need to get other work done.
The real fix is either to stop recreating these AssemblyWriter
objects, or improve caching in the bowels of it so that it doesn't
need to rerun typefinder again and again if nothing has changed.






> On the other side, for Polly
> we need to solve 2) anyway. Even if formatting would be faster, we
> should still not do it, if not needed. As we need to solve 2) anyway, 1)
> will only hit us when we do debugging/formatting. I assume in case of
> debugging the files we are looking into are normally smaller, such that
> the formatting overhead will not be that big.
>
> Hence, I would focus on 2). We could probably just put the code under a
> NDEBUG ifndef, but I would actually like to keep them available even in
> NDEBUG mode, as we may want to use the error messages to hint users to
> why their code can not be optimized. For this and also to get rid of
> another annoyance, the INVALID macro, I think we need to restructure the
> reporting of the last error, such that formatting of the error messages
> can be done on-demand. Another problem that could be solved at this
> point is to remove the macro use, which hides the fact that the
> functions return as soon as INVALID is called, which is plainly ugly.
>
> I am not sure how to structure this, but I could imagine some small
> class hierarchy that has a class for each error type. Each class just
> stores pointers to the data structures it needs to format its error
> message, but only formats the error on-demand. We could then return this
> class in case of failure and return a NoError class or a NULL pointer in
> case of success.
>
> This change may also help us to later add support to keep track of all
> errors we encounter (not just the first one). This is something Andreas
> and Johannes found helpful earlier.
>
> Cheers,
> Tobias
>
>
>
>
> g
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev