[llvm-dev] top-down vs. bottom-up list scheduling

Alexey Zhikhartsev via llvm-dev llvm-dev at lists.llvm.org
Tue Nov 13 07:37:02 PST 2018


Hi Matthias,

> In my experience bottom-up works better when optimizing for register
> pressure while top-down tends to work better when optimizing for
> latencies/stall reduction.

That's a very interesting insight; do you mind sharing the intuition behind
it?

Thanks,
Alex

> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Matthias Braun via llvm-dev
> Sent: November-06-18 3:55 PM
> To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] top-down vs. bottom-up list scheduling
>
> Some comments:
> - It's all heuristics, so you may be lucky/unlucky in any configuration. I
> assume you know how to read -debug-only=machine-scheduler output and the
> GenericScheduler::tryCandidate() code to figure out how most decisions are
> made.
> - In my experience bottom-up works better when optimizing for register
> pressure while top-down tends to work better when optimizing for
> latencies/stall reduction.
> - The scheduler has a couple of rules dealing with pressure. Hitting any of
> these limits will make it pick pressure-reducing nodes more aggressively
> (look for tryPressure() in the tryCandidate() code; see the rough sketch
> below the list):
>     - Hitting the target limits (number of available registers)
>     - Hitting the maximum of the region before scheduling
>     - Generally attempt to minimize pressure (this has lower priority
> though)
>
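> As a rough, illustrative model of that priority order (made-up names, not
> the actual GenericScheduler::tryCandidate()/tryPressure() code; latency,
> stall and source-order heuristics also sit in the real cascade):
>
>   struct Cand {
>     int ExcessPressure;    // pressure above the target's register limits
>     int CriticalPressure;  // pressure above the region's max before scheduling
>     int CurrentPressure;   // current pressure (lowest-priority criterion)
>   };
>
>   // Return nonzero if TryCand should replace Best; the first rule that
>   // distinguishes the two candidates decides.
>   int betterCandidate(const struct Cand *TryCand, const struct Cand *Best) {
>     if (TryCand->ExcessPressure != Best->ExcessPressure)
>       return TryCand->ExcessPressure < Best->ExcessPressure;
>     if (TryCand->CriticalPressure != Best->CriticalPressure)
>       return TryCand->CriticalPressure < Best->CriticalPressure;
>     // general pressure minimization, lower priority than the limits above
>     return TryCand->CurrentPressure < Best->CurrentPressure;
>   }
>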
> In the past I looked at some crypto code that was also carefully tuned by
> a human, and it turned out to be very tricky for the heuristics to get
> right because most values had multiple users.
> At the time I had a patch that would revert all scheduling decisions if it
> turned out that the scheduler couldn't find a solution with better or equal
> register pressure. It wasn't received too enthusiastically and I ended up
> tweaking some heuristics instead. But we may start that discussion again if
> you cannot find a knob to tune the heuristic for your case.
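>
> Roughly, the idea in that patch was something like this (a sketch with
> made-up names, not the actual code):
>
>   unsigned PressureBefore = regionPressureBeforeScheduling();
>   scheduleRegion();
>   if (regionPressureAfterScheduling() > PressureBefore)
>     revertAllSchedulingDecisions();   // keep the original instruction order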
>
> - Matthias
>
> >
> > Hello List!
> >
> > I am looking at top-down vs. bottom-up list scheduling for simple(r)
> > in-order cores. First, for some context, below is a fairly
> > representative pseudo-code example of the sort of DSP-like codes I am
> > looking at:
> >
> >  uint64_t foo(int *pA, int *pB, unsigned N, unsigned C) {
> >    uint64_t sum = 0;
> >    while (N-- > 0) {
> >      int A1 = *pA++;
> >      int A2 = *pA++;
> >      int B1 = *pB++;
> >      int B2 = *pB++;
> >      ...
> >      sum += ((uint64_t) A1 * B1) >> C;
> >      sum += ((uint64_t) A2 * B2) >> C;
> >      ...
> >    }
> >    return sum;
> >  }
> >
> > These kernels are very tight loops. In this sort of legacy code it's
> > not uncommon that loops are manually tuned and unrolled with the
> > available registers in mind, so that all values just about fit in
> > registers. In the example above, we read from input streams A and B,
> > and do 2 multiply-accumulate operations.  The loop is unrolled 2 times
> > in this example, but 4, 8 or 16 would be more realistic, resulting in
> > very high register pressure.
> >
> > But legacy apps written in this way are not the only culprit; we see
> > that with (aggressive) loop unrolling we can end up in exactly the same
> > situation: the loop unroller does exactly what it needs to do, but it
> > results in very high register pressure.
> >
> > Here's the (obvious) problem: all live values in these loops should
> > just about fit in registers, but suboptimal/wrong decisions are made
> > resulting in a lot of spills/reloads; the machine scheduler is making
> > the life of the register allocator very difficult. I am looking at
> > regressions of about 30 - 40% for more than a handful of kernels, and
> > am thus very interested in what I could do about this.
> >
> > The first observation is that it looks like we default to bottom-up
> > scheduling. Starting bottom-up, all these "sum" variables are scheduled
> > first, and after that the loads (this is simplifying things a bit). And
> > thus it looks like we create a lot of very long live-ranges, causing the
> > problems mentioned above.
> > When instead we start top-down, the loads are picked up first, then
> > the MAC, and this is repeated for all MACs. The result is a sequence LD,
> > MAC, LD, MAC, etc., which is, let's say, a sequence more in
> > program-order, also with shorter live-ranges. This does exactly what I
> > want for this sort of kernel; it generates exactly the code we want.
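> >
> > To make that concrete, for the unroll-by-2 example above the two orders
> > come out roughly like this (simplified, pseudo-code only):
> >
> >    // bottom-up (what we currently get): all loads grouped first, then
> >    // all MACs, so A1, A2, B1 and B2 are all live at the same time
> >    A1 = *pA++; A2 = *pA++; B1 = *pB++; B2 = *pB++;
> >    sum += ((uint64_t) A1 * B1) >> C;
> >    sum += ((uint64_t) A2 * B2) >> C;
> >
> >    // top-down (what we want): LD, LD, MAC repeated, so only one
> >    // A/B pair is live at a time
> >    A1 = *pA++; B1 = *pB++;
> >    sum += ((uint64_t) A1 * B1) >> C;
> >    A2 = *pA++; B2 = *pB++;
> >    sum += ((uint64_t) A2 * B2) >> C;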
> >
> > While playing with this bottom-up/top-down order, it didn't take long
> > to see that the top-down approach is excellent for this kind of
> > code, but not for some other benchmarks, and so solving this issue is
> > not just a matter of defaulting to a new scheduling policy.
> >
> > I am thinking about prototyping an approach where we start with the
> > bottom-up approach (under a new misched option), but when a
> > register-pressure/live-range threshold value is reached, we bail and
> > fall back to the top-down approach.
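> >
> > Very roughly, the policy I have in mind would look something like this
> > (pseudo-code with made-up helper names, not a real MachineScheduler
> > interface):
> >
> >    while (nodesRemain()) {
> >      if (currentPressure() <= PressureThreshold) {
> >        pickNodeBottomUp();        // keep scheduling bottom-up while pressure is OK
> >      } else {
> >        revertDecisionsSoFar();    // bail out of bottom-up...
> >        scheduleRegionTopDown();   // ...and redo the region top-down
> >        break;
> >      }
> >    }
> >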
> > This is my first rough idea, but I haven't looked at this problem for
> > very long. My first few data points suggest this might be beneficial,
> > but I could be missing a lot here. I am also sure that I am looking at
> > an old/classic problem and that others have looked at it before. Thus
> > I am wondering if people have experiences/opinions on this.
> >
> > Cheers,
> > Sjoerd.

