I wanted to update the llvm list on an offline discussion I have been having with Rafael about a problem we see when compiling an "interpreter" type of program with clang. The generated code contained a huge number of register spills, something we first saw with llvm 2.8 and which seems to have recurred in 3.0. Following Rafael's advice we passed -disable-early-taildup to llc and the spills disappeared. It would be nice to be able to pass such switches directly on the clang command line. Is there a way to do that?<br>

<br>Thanks for your help, Rafael!<br><br>/maurice<br><br><div class="gmail_quote">2012/2/21 Rafael Ávila de Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>></span><br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 21/02/12 11:35 AM, Maurice Marks wrote:<br>

> Hi Rafael. I have an llvm question for you. I'm a developer on a project<br>

> that has been using llvm as a JIT for a dynamic binary translator. A<br>

> while back (llvm 2.8 days) we tried using llvm-gcc on an interpreter.<br>

> The structure of the interpreter is lots of shared state that is used in<br>

> code fragments joined by indirect jumps. The code produced by llvm (and<br>

> the compilation time) was just terrible - many register spills and<br>

> reloads. Compared to gcc it was more than an order of magnitude worse in<br>

> both compile and execution time. You fixed a bug in that area and things<br>

> have been much better since then.<br>

> Up until now. We just recompiled that code (with clang/llvm 3.0) and it<br>

> has many of the same problems as before - huge compile times, very poor<br>

> execution time. I noticed that someone else reported a similar problem<br>

> with interpreter-like code<br>

> (<a href="http://lists.cs.uiuc.edu/pipermail/llvmbugs/2011-August/019336.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmbugs/2011-August/019336.html</a>).<br>

</div>


<div class="im">><br>

> I regard you as an expert in this area of optimization so I'd like to<br>

> understand how, in a program structure that has a complex or<br>

> indeterminate CFG, shared state should be bound to registers. Based on<br>

> the frequency of reference there is pressure to bind a lot of state to<br>

> registers, but that would result in overflowing the real registers very<br>

> quickly, resulting in spills and reloads. But keeping all the state in<br>

> memory is also suboptimal. gcc seems to be able to find a sweet spot,<br>

> referencing shared state in memory, but in the basic blocks between<br>

> indirect jumps keeping frequently used state in registers.<br>

><br>

> What exactly was the algorithm fix you applied about a year ago? Could<br>

> it have become undone in some way?<br>

><br>

> As an ex-compiler developer I would expect you to just ask for a test<br>

> case, but in our particular situation, and the case of the other person<br>

> who reported a similar thing, the program has to be large and complex<br>

> before a bug appears. However, the structural features are the same:<br>

> lots of frequently accessed shared state and an indeterminate (due to indirect<br>

> jumps) control flow graph.<br>

><br>

> I'm interested in any comments or suggestions you have on this topic,<br>

> and I thank you sincerely for your many contributions to llvm.<br>

<br>

</div>Hi Maurice,<br>

<br>

At the time I was benchmarking Firefox builds with clang. In most files<br>

clang did better, but on the interpreter the result was *really* bad.<br>

Investigating it, I found some issues:<br>

<br>

* The register coalescer was failing to join many copies. This was<br>

fixed by 134199, but it is possible there are more cases the coalescer<br>

is not handling.<br>

<br>

* Tail duplication was done way too late, preventing other optimizations<br>

from taking advantage of it. I moved it a bit earlier in 134372, but it<br>

should really be done at the IL level.<br>

<br>

Two things I would suggest trying:<br>

<br>

* Disable tail duplication completely. If the big problem is the<br>

register allocation doing a bad job, this should make its life easier<br>

and help you find what improvements the register allocator needs.<br>

<br>

* Try the patch I posted for clang that makes it duplicate the indirectbr<br>

from the very start. This will make compile time *really* slow, but<br>

should show if doing early tail duplication would help in your case.<br>

<br>

<a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110606/121937.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110606/121937.html</a><br>

<br>

It would be really nice if you could post the result to the list :-)<br>

<br>

> regards<br>

> /maurice marks<br>

<br>

Cheers,<br>

<font color="#888888">Rafael<br>

</font></blockquote></div><br><br clear="all"><br>-- <br>Not sent from my Blackberry, Raspberry or Gooseberry!<br>