[cfe-dev] [LLVMdev] costing optimisations

Fri Nov 23 02:36:34 PST 2012

On 23/11/2012, at 8:28 PM, David Tweed wrote:

| So TeX actually considers several possible break points,
| and calculates a badness for each one.  but there's more.
| The choice of a break point in a line affects the next line.
| And the one after that, etc. TeX tries to solve for optimal layout
| of the whole paragraph. That's why it produces such awesome
| results.

Yep; the project part of my MSc was on dynamic programming problems similar to the TeX one. However, it doesn't actually first attempt to estimate how much work a paragraph is likely to take to break, it just uses a clever dynamic programming algorithm to build up a globally optimal set of breaks regardless of work cost. Only if the badness is still too high does it try to do extra breaks with hyphenations (and it was a long time ago I looked at this, but I think once it's decided to hyphenate a given word it stops caring about the whitespace badness of the break in favour of a "non-misleading" hyphenation -- such as avoiding weeknights becoming wee-knights). Anyway, this is becoming a digression.

> However, it's also potentially introducing highly
> non-predictable/principle-of-least-surprise behaviour into the compiler,
> particularly since llvm can be used as a JIT. So I suspect it wouldn't be
> something most people would want for mainstream clang/llvm, but if might be
> interesting if someone were to implement some passes like this (probably
> out-of-tree) and see if it proves beneficial.

My problem is that I get quantum non-determinism :)
Either the compiler finishes and does a "reasonable job"
or it takes so long the whole thing gets killed.

That really sounds like a bug (either software or hardware): whether it's fast or slow, the compilation time (for the same options) should be essentially the same for the same piece of code.

Regards,
Dave