[llvm-dev] [Q] What can drive compiler performance improvements in the future?

Denis Bakhvalov via llvm-dev llvm-dev at lists.llvm.org
Wed Feb 24 14:31:37 PST 2021


Hello and thank you Mircea, Stefanos, and Michael for your great thoughts
and useful pointers.
-Denis

On Wed, 24 Feb 2021 at 05:15, Stefanos Baziotis <stefanos.baziotis at gmail.com>
wrote:

> Hi everyone,
>
> 1) is already doing autotuning (it's a hybrid between a static cost model
> and actually running on the target; you can see more here [1]).
> But what I tried to convey, at least from my perspective and that of one of
> the authors (Alex Aiken), is that it is an example of a bigger idea.
> Specifically, that we don't even code the transformation in the classic
> sense; we build "dumb" optimizers which, incidentally, are also unconstrained.
>
> Parallelization in 2) is based on different ideas than a loop
> transformation framework where we try multiple transformations.
> The core idea is to remove dependence edges based on speculation, to enable
> parallelization. To put it differently, it is dependence-centric, where we
> don't have many transformations to apply (actually, pretty much just one:
> parallelization), as opposed to transformation-centric, where we care about
> finding the best sequence of (e.g., loop) transformations to apply (and
> dependences are the means to the transformations, not the goal).
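>
> To make the idea concrete, here is a toy, hypothetical sketch (not
> Perspective's actual machinery) of what speculating away a dependence can
> look like: the loop below has a cross-iteration dependence only when a rare
> value appears in the input, so we assume it does not, run chunks in parallel,
> and fall back to the sequential loop if the assumption turns out to be wrong.
>
> #include <algorithm>
> #include <atomic>
> #include <cstddef>
> #include <thread>
> #include <vector>
>
> // Sequential semantics: only a value of 0 couples one iteration to the next.
> long run_sequential(const std::vector<int>& in) {
>   long acc = 0;
>   for (int v : in) {
>     if (v == 0) acc = 0;   // the rare cross-iteration dependence
>     acc += v;
>   }
>   return acc;
> }
>
> // Speculative version: assume no element is 0, so chunks are independent.
> long run_speculative(const std::vector<int>& in) {
>   std::atomic<bool> misspeculated{false};
>   const std::size_t nthreads = 4;
>   const std::size_t chunk = (in.size() + nthreads - 1) / nthreads;
>   std::vector<long> partial(nthreads, 0);
>   std::vector<std::thread> workers;
>   for (std::size_t t = 0; t < nthreads; ++t)
>     workers.emplace_back([&, t] {
>       std::size_t begin = t * chunk, end = std::min(in.size(), begin + chunk);
>       for (std::size_t i = begin; i < end && !misspeculated; ++i) {
>         if (in[i] == 0) { misspeculated = true; return; }  // speculation failed
>         partial[t] += in[i];
>       }
>     });
>   for (auto& w : workers) w.join();
>   if (misspeculated) return run_sequential(in);  // recovery: redo sequentially
>   long acc = 0;
>   for (long p : partial) acc += p;
>   return acc;
> }
>
> int main() {
>   std::vector<int> in(1 << 20, 1);
>   return run_speculative(in) == run_sequential(in) ? 0 : 1;
> }
>
> The real systems are of course far more sophisticated about what they check
> and how they recover, but the decision being speculated on is the same: that
> a dependence edge does not actually manifest at runtime.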
>
> That is not to say that the framework you mentioned, Michael, is not great.
> It's just that, AFAIU, the core ideas are different (which is great for
> pluralism :))
>
> IMHO, this framework certainly is great and in fact, it ties nicely with
> 3). I would argue that it is an important step for
> loop optimizations in LLVM whether it is later used for auto-tuning or not.
>
> (FWIW, I'm working on 3) from a different angle and hopefully soon we'll be
> able to make the work public :))
>
> Best,
> Stefanos
>
> [1] www.youtube.com/watch?v=rZFeTTFp7x4
>
> On Wed, 24 Feb 2021 at 4:42 a.m., Michael Kruse <
> llvmdev at meinersbur.de> wrote:
>
>> To add to Stefanos' list, I think autotuning would be another point, since
>> at compile time it is unknown with which parameters a program will be
>> invoked, and cost heuristics as in 1) cannot model the entire architecture.
>> Ideally, reoptimization using information collected at runtime would be done
>> transparently by a JIT, as in Chris Lattner's original master's thesis [9].
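>>
>> As a toy illustration of the driver loop behind autotuning (a sketch, not a
>> real autotuner; it assumes a kernel.c to tune and clang on PATH), the idea
>> is simply "compile with a candidate configuration, measure, keep the best":
>>
>> #include <chrono>
>> #include <cstdlib>
>> #include <iostream>
>> #include <string>
>> #include <vector>
>>
>> int main() {
>>   // Candidate flag sets; a real autotuner would search pass/parameter space.
>>   std::vector<std::string> configs = {
>>       "-O2", "-O3", "-O3 -march=native", "-O3 -march=native -funroll-loops"};
>>   std::string best;
>>   double best_ms = 1e30;
>>   for (const auto& flags : configs) {
>>     std::string build = "clang " + flags + " kernel.c -o kernel_tuned";
>>     if (std::system(build.c_str()) != 0) continue;   // skip configs that fail
>>     auto t0 = std::chrono::steady_clock::now();
>>     if (std::system("./kernel_tuned") != 0) continue; // run representative input
>>     auto t1 = std::chrono::steady_clock::now();
>>     double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
>>     if (ms < best_ms) { best_ms = ms; best = flags; }
>>   }
>>   std::cout << "best flags: " << best << " (" << best_ms << " ms)\n";
>> }
>>
>> The interesting part is replacing the fixed flag list with a search over the
>> compiler's actual knobs, and, as above, doing the reoptimization online in a
>> JIT rather than offline.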
>>
>> Stefanos' items 1)-3) would be possible, at least for loop nests,
>> using a framework that I outlined in [7].
>>
>> [9] https://llvm.org/pubs/2004-01-30-CGO-LLVM.html
>> [7] https://youtu.be/zHHUh0c5wig
>>
>>
>> On Mon, 22 Feb 2021 at 19:43, Stefanos Baziotis via
>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >
>> > Hi Denis,
>> >
>> > Looking forward to your talk at LLVM-CGO!
>> >
>> > Here are some directions that I have seen lately:
>> >
>> > 1) "Unconstrained" Optimization
>> >
>> > Currently, optimization passes use a pre-determined series of steps. So,
>> > optimizations are inherently constrained in how big a leap the
>> > transformations can make. On the other hand, research such as STOKE [1]
>> > has shown that a "dumber" but unconstrained optimizer can radically change
>> > even the very algorithm used. To explain the "dumber" but unconstrained
>> > part, the algorithm used to optimize the program is literally:
>> >
>> > - Start with a program (or no program, in which case a program is
>> >   synthesized)
>> > - Make a random change to the program
>> > - Compute a cost (whose specifics deserve a big discussion, but that's not
>> >   the central point here; the first pointer at the end is related, though)
>> > - If the cost is better, keep the change
>> > - Otherwise, keep the change anyway with some probability
>> > - Repeat
>> >
>> > This resulted in great improvements to the programs, with compilation
>> > times that are not horrible.
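>> >
>> > To make the loop above concrete, here is a toy, assumed version of it
>> > (random rewrites of a tiny straight-line "program" over one accumulator, a
>> > cost of test-case mismatches plus length, and a Metropolis-style acceptance
>> > rule - nothing close to STOKE itself, just the shape of the algorithm):
>> >
>> > #include <cmath>
>> > #include <cstdio>
>> > #include <random>
>> > #include <vector>
>> >
>> > enum Op { ADD, MUL, SHL };
>> > struct Insn { Op op; int imm; };
>> > using Prog = std::vector<Insn>;
>> >
>> > long run(const Prog& p, long x) {
>> >   for (const Insn& i : p)
>> >     x = (i.op == ADD) ? x + i.imm : (i.op == MUL) ? x * i.imm : x << i.imm;
>> >   return x;
>> > }
>> >
>> > long reference(long x) { return x * 8 + 4; }   // behavior to preserve
>> >
>> > double cost(const Prog& p) {
>> >   double c = 0;
>> >   for (long x : {0L, 1L, 2L, 7L, 100L})        // test-case "verification"
>> >     if (run(p, x) != reference(x)) c += 100;   // heavy penalty if wrong
>> >   return c + p.size();                         // prefer shorter programs
>> > }
>> >
>> > int main() {
>> >   std::mt19937 rng(42);
>> >   std::uniform_int_distribution<int> op_d(0, 2), imm_d(1, 8), kind_d(0, 2);
>> >   std::uniform_real_distribution<double> coin(0.0, 1.0);
>> >   Prog cur = {{MUL, 2}, {MUL, 2}, {MUL, 2}, {ADD, 4}};  // correct but long
>> >   double cur_cost = cost(cur);
>> >   for (int iter = 0; iter < 200000; ++iter) {
>> >     Prog cand = cur;
>> >     int kind = kind_d(rng);
>> >     Insn random_insn{static_cast<Op>(op_d(rng)), imm_d(rng)};
>> >     if (kind == 0 && !cand.empty())            // mutate a random instruction
>> >       cand[rng() % cand.size()] = random_insn;
>> >     else if (kind == 1)                        // insert a random instruction
>> >       cand.insert(cand.begin() + rng() % (cand.size() + 1), random_insn);
>> >     else if (!cand.empty())                    // delete a random instruction
>> >       cand.erase(cand.begin() + rng() % cand.size());
>> >     double c = cost(cand);
>> >     // Keep better candidates; keep worse ones with some probability.
>> >     if (c <= cur_cost || coin(rng) < std::exp(cur_cost - c)) {
>> >       cur = cand;
>> >       cur_cost = c;
>> >     }
>> >   }
>> >   std::printf("final cost %.0f, length %zu\n", cur_cost, cur.size());
>> >   // With luck the search discovers e.g. {SHL 3, ADD 4}, shorter than the
>> >   // starting program, without any hand-written peephole rule.
>> > }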
>> >
>> > 2) Automatic Parallelization Revival
>> >
>> > Automatic Parallelization is thought to have died, but in the last couple
>> > of years a group at Princeton has shown some promising improvements,
>> > specifically with Perspective [2]. I think this is a great step forward,
>> > as it obtained a _23.0x_ speedup for 12 general-purpose C/C++ programs
>> > (SPEC, IIRC) running on a 28-core shared-memory commodity machine.
>> > I would urge you to take a closer look at it, since the infrastructure is
>> > built on top of LLVM.
>> >
>> > Here's some related work [3] trying to revive automatic parallelization
>> from a different perspective (pun not intended).
>> >
>> > 3) Decoupling Transformations and Cost-Modeling
>> >
>> > An important problem, I think, in today's compilers is that cost modeling
>> > is baked into the transformations (and it's not even clear how the cost is
>> > computed).
>> >
>> > The result of this is that even if you had a perfect oracle, which always
>> > knew the perfect sequence of transformations to perform, there would
>> > simply be no way to instruct the compiler to perform that sequence. So, my
>> > personal opinion is that in the years to come, there will be an effort to
>> > separate transformations into their own dedicated, fine-grained modules
>> > (as opposed to the monolithic entities they are now, i.e., passes). This
>> > in turn can enable machine-learning models, which will decide _what_ has
>> > to happen and then use the fine-grained transformation APIs to make it
>> > happen.
>> >
>> > (I think this is closely related to what Mircea said above)
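>> >
>> > To sketch what I mean by fine-grained modules (purely hypothetical C++
>> > interfaces, not an existing LLVM API), the shape could be something like:
>> >
>> > #include <memory>
>> > #include <string>
>> > #include <vector>
>> >
>> > struct LoopNest {};  // stand-in for whatever IR handle the transforms use
>> >
>> > // A transformation only answers "is this legal?" and "apply it" - no
>> > // profitability logic baked in.
>> > struct Transform {
>> >   virtual ~Transform() = default;
>> >   virtual std::string name() const = 0;
>> >   virtual bool isLegal(const LoopNest&) const = 0;
>> >   virtual void apply(LoopNest&) const = 0;
>> > };
>> >
>> > struct Unroll : Transform {
>> >   int factor;
>> >   explicit Unroll(int f) : factor(f) {}
>> >   std::string name() const override {
>> >     return "unroll x" + std::to_string(factor);
>> >   }
>> >   bool isLegal(const LoopNest&) const override { return true; }  // toy
>> >   void apply(LoopNest&) const override { /* mechanical rewrite */ }
>> > };
>> >
>> > struct Interchange : Transform {
>> >   std::string name() const override { return "interchange"; }
>> >   bool isLegal(const LoopNest&) const override { return true; }  // toy
>> >   void apply(LoopNest&) const override { /* mechanical rewrite */ }
>> > };
>> >
>> > // The "oracle": decides *what* should happen. A hand-written heuristic, a
>> > // search, or an ML model would all implement the same interface; it
>> > // returns nullptr when it wants to stop.
>> > struct Policy {
>> >   virtual ~Policy() = default;
>> >   virtual const Transform*
>> >   pick(const LoopNest&,
>> >        const std::vector<std::unique_ptr<Transform>>&) const = 0;
>> > };
>> >
>> > // The driver just executes the policy's decisions via the fine-grained API.
>> > void optimize(LoopNest& nest, const Policy& policy,
>> >               const std::vector<std::unique_ptr<Transform>>& candidates,
>> >               int budget = 8) {
>> >   while (budget-- > 0) {
>> >     const Transform* t = policy.pick(nest, candidates);
>> >     if (!t || !t->isLegal(nest)) break;
>> >     t->apply(nest);
>> >   }
>> > }
>> >
>> > The same driver then works whether pick() is a hand-written heuristic or a
>> > learned model, because profitability is no longer hidden inside the pass.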
>> >
>> > --- Random pointers ---
>> >
>> > * The DeepCompiler [4] project at MIT has made significant progress in
>> >   predicting the performance of X86 code.
>> > * Alex Aiken's opinion on the future of compilers [5]
>> >
>> > Disclaimer: This is definitely not an exhaustive list!
>> >
>> > [1] https://github.com/StanfordPL/stoke
>> > [2] https://liberty.princeton.edu/Projects/AutoPar/Perspective/
>> > [3] https://www.youtube.com/watch?v=8B25HQeJ0Ms
>> > [4] https://www.deep-compiler.org/
>> > [5] https://youtu.be/ob0nfNr2FLc?t=156
>> >
>> > On Tue, 23 Feb 2021 at 2:57 a.m., Mircea Trofin via llvm-dev <
>> > llvm-dev at lists.llvm.org> wrote:
>> >>
>> >>
>> >>
>> >> On Mon, Feb 22, 2021 at 4:50 PM Denis Bakhvalov via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I'll be giving a short presentation at the LLVM performance workshop
>> >>> soon, and I want to touch on the topic of future performance
>> >>> improvements. I decided to ask the community: what can drive performance
>> >>> improvements in a classic C++ LLVM compiler CPU backend in the future?
>> >>> If I summarize all the thoughts and opinions, I think it would make for
>> >>> an interesting discussion.
>> >>>
>> >>> There is already a body of research on the topic, including [1], which
>> >>> talks about superoptimizers, but maybe somebody has some interesting new
>> >>> ideas. In particular, I'm interested to hear thoughts on the following
>> >>> things:
>> >>> 1. How big is the performance headroom in existing LLVM optimization
>> passes?
>> >>> 2. I think PGO can play a bigger role in the future. I see the benefits
>> >>> of more optimizations being guided by profiling data. For example, there
>> >>> is potential for intelligent injection of memory prefetching hints based
>> >>> on HW telemetry data on modern Intel CPUs. This HW telemetry data allows
>> >>> finding memory accesses that miss in caches and estimating the prefetch
>> >>> window (in cycles). Using this data, the compiler can determine the place
>> >>> for a prefetch hint (a rough sketch of such a hint is shown right after
>> >>> this list). Obviously, there are lots of limitations, but it's just a
>> >>> thought. BTW, the same can be done for PGO-driven branch-to-cmov
>> >>> conversion (fighting branch mispredictions).
>> >>> 3. ML opportunities in compiler tooling. For example, code similarity
>> >>> analysis [2][3] opens a wide range of opportunities, e.g. building a
>> >>> recommendation system that will suggest a better-performing code sequence.
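>> >>>
>> >>> To illustrate point 2, here is a rough, hand-written sketch of what such
>> >>> an injected hint amounts to at the source level (the compiler would emit
>> >>> the equivalent IR; the prefetch distance here is assumed to be derived
>> >>> from the telemetry data):
>> >>>
>> >>> #include <cstddef>
>> >>>
>> >>> constexpr std::size_t DIST = 16;  // "prefetch window" from telemetry
>> >>>
>> >>> long sum_indirect(const int* idx, const long* data, std::size_t n) {
>> >>>   long acc = 0;
>> >>>   for (std::size_t i = 0; i < n; ++i) {
>> >>>     if (i + DIST < n)  // prefetch the access DIST iterations ahead
>> >>>       __builtin_prefetch(&data[idx[i + DIST]], /*rw=*/0, /*locality=*/1);
>> >>>     acc += data[idx[i]];  // the load the profile flagged as cache-missing
>> >>>   }
>> >>>   return acc;
>> >>> }
>> >>>
>> >>> The branch-to-cmov case is analogous: the profile tells us a branch is
>> >>> unpredictable, so a conditional move is preferable even though both sides
>> >>> are computed.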
>> >>
>> >> On 3., also: replacing hand-crafted heuristics with machine-learned
>> >> policies for those passes that are heuristics-driven - like inlining,
>> >> regalloc, instruction selection, etc. The same goes for cost models.
>> >>
>> >>>
>> >>> Please also share any thoughts you have that are not on this list.
>> >>>
>> >>> If that topic was discussed in the past, sorry, and please send links
>> to those discussions.
>> >>>
>> >>> -Denis
>> >>> https://easyperf.net
>> >>>
>> >>> [1]: https://arxiv.org/abs/1809.02161
>> >>> [2]: https://doi.org/10.1145/3360578
>> >>> [3]: https://arxiv.org/abs/2006.05265