[AArch64] FP load balancing pass for Cortex-A57
Renato Golin
renato.golin at linaro.org
Tue Aug 5 08:09:35 PDT 2014
On 5 August 2014 13:51, James Molloy <james.molloy at arm.com> wrote:
> For best-case performance on Cortex-A57, we should try to use a balanced mix
> of odd and even D-registers when performing a critical sequence of
> independent, non-quadword FP/ASIMD floating-point multiply or
> multiply-accumulate operations.
Hi James,
Great patch! Do you have some tests that could show us the
improvements and make sure we're not regressing in the future?
If I got it right, this pass will only check chains contained in the
same basic block. If that's correct, what would be the added
complexity of checking across multiple basic blocks?
I had a good read and the algorithm looks right, though I have to say
I didn't get everything with ultimate precision. It'd really be good
to have some tests showing current coverage, future planned coverage
and cases where we don't want to change due to conflicts.
I have some comments inline...
+ void add(MachineInstr *MI, unsigned Idx, Color C) {
+ LastInst = MI;
+ LastInstIdx = Idx;
+ Insts.insert(MI);
+
+ LastColor = C;
+ }
Weird formatting, insert(MI) as a last line, separated, would be more
meaningful.
+ /// Return the preferred color of this chain. Ties break to even.
+ Color getPreferredColor() {
+ if (OverrideBalance != 0)
+ return OverrideBalance == 1 ? Color::Even : Color::Odd;
+ return LastColor;
+ }
Is that comment about ties right? This just seem to return the forced
colour or the last colour... Unless the last colour breaks ties to
even...
+ // (a) That's rather heavyweight for only two colors.
+ // (b) We expect multiple disjoint interference regions - in
practice the live
+ // (c) range of chains is quite small and they are clustered between loads
+ // and stores.
the formatting is weird, making it hard to get the last two topics
+ std::sort(V.begin(), V.end(), [](const std::vector<Chain*> &A,
+ const std::vector<Chain*> &B) {
+ return A.front()->startsBefore(B.front());
+ });
weird formatting...
+ // Tie-break equivalent sizes by sorting chains requiring fixups before
+ // those without fixups.
I'd assume the opposite... focusing on simpler ones first, since...
+ // If we'll need a fixup FMOV, don't bother. Testing has shown that this
+ // happens infrequently and when it does it has at least a 50% chance of
+ // slowing code down instead of speeding it up.
and that's it. :)
cheers,
--renato
More information about the llvm-commits
mailing list