[cfe-dev] [LLVMdev] costing optimisations

David Chisnall dc552 at cam.ac.uk
Fri Nov 23 03:39:39 PST 2012


{Veering a bit off-topic}

On 23 Nov 2012, at 10:36, David Tweed wrote:

> Yep; the project part of my MSc was on dynamic programming problems similar to the TeX one. However, it doesn't actually first attempt to estimate how much work a paragraph is likely to take to break; it just uses a clever dynamic programming algorithm to build up a globally optimal set of breaks regardless of work cost. Only if the badness is still too high does it try to do extra breaks with hyphenation (and it was a long time ago that I looked at this, but I think once it's decided to hyphenate a given word it stops caring about the whitespace badness of the break in favour of a "non-misleading" hyphenation -- such as avoiding weeknights becoming wee-knights). Anyway, this is becoming a digression.

Franklin Mark Liang's PhD thesis is quite readable and very informative for anyone actually interested in this.  The most interesting feature is that TeX uses a very good dynamic programming approach for laying out text in paragraphs but uses an ugly greedy algorithm for laying out paragraphs on a page.  This decision was made because applying the same dynamic programming approach to every paragraph in the document would have required several megabytes of memory for large documents, which was completely infeasible on the computers of the time.  It provides a good lesson that optimisations that made sense for one system are worth revisiting periodically.
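
For anyone curious what "a globally optimal set of breaks" looks like in practice, here is a minimal sketch of the dynamic-programming idea in C++. This is my own simplification, not TeX's actual code: badness is just the squared trailing whitespace on each line, there is no hyphenation or penalty handling, and the function names and interface are invented for illustration.

#include <iostream>
#include <limits>
#include <string>
#include <vector>

// Minimal dynamic-programming line breaker: cost[j] is the minimal total
// badness of laying out words[0..j), and brk[j] records where the line
// ending at word j-1 starts.  The inner loop stops as soon as the
// candidate line no longer fits.
std::vector<size_t> optimalBreaks(const std::vector<std::string> &words,
                                  long width) {
  const long INF = std::numeric_limits<long>::max() / 4;
  const size_t n = words.size();
  std::vector<long> cost(n + 1, INF);
  std::vector<size_t> brk(n + 1, 0);
  cost[0] = 0;
  for (size_t j = 1; j <= n; ++j) {
    long lineLen = -1;  // length of words[i..j) joined by single spaces
    for (size_t i = j; i-- > 0;) {
      lineLen += static_cast<long>(words[i].size()) + 1;
      if (lineLen > width)
        break;
      long slack = width - lineLen;
      long badness = (j == n) ? 0 : slack * slack;  // last line isn't stretched
      if (cost[i] < INF && cost[i] + badness < cost[j]) {
        cost[j] = cost[i] + badness;
        brk[j] = i;
      }
    }
  }
  // Walk the break table backwards to recover where each line starts.
  std::vector<size_t> starts;
  for (size_t j = n; j > 0; j = brk[j])
    starts.push_back(brk[j]);
  return {starts.rbegin(), starts.rend()};
}

int main() {
  std::vector<std::string> words = {"the", "quick", "brown", "fox", "jumps",
                                    "over", "the", "lazy", "dog"};
  for (size_t start : optimalBreaks(words, 12))
    std::cout << "line starts at word " << start << '\n';
}

The greedy page-level strategy, by contrast, only needs to remember how full the current page is and decides each break locally, which is why it needed so little memory, and also why it can produce badly filled pages that the paragraph-level algorithm never would.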

David


