[LLVMdev] Question on Machine Combiner Pass

Tue Feb 3 16:34:19 PST 2015

Hi,

In the file lib/CodeGen/MachineCombiner.cpp I see that in the function
MachineCombiner::preservesCriticalPathLen 
we try to determine whether the new combined instruction lengthens the
critical path or not.

In order to do this we compute the depth and latency for the current
instruction (MUL+ADD) and the alternate instruction (MADD).

But we call two different sets of APIs for the current and new instructions.

For the new instruction we use:

unsigned NewRootDepth = getDepth(InsInstrs, InstrIdxForVirtReg, BlockTrace);

unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);

While for the current instruction we use:

unsigned RootDepth = BlockTrace.getInstrCycles(Root).Depth;

unsigned RootLatency = TSchedModel.computeInstrLatency(Root);

This has been introduced in the following commit:

commit e4fa341dde3c9521b7f11bd53ecdcbeb3f8fcbda
Author: Gerolf Hoflehner <ghoflehner at apple.com>
Date:   Thu Aug 7 21:40:58 2014 +0000

    MachineCombiner Pass for selecting faster instruction sequence on AArch64
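
As a rough illustration of the check discussed above, here is a toy sketch. It is not the code from MachineCombiner.cpp. It assumes the decision reduces to comparing depth plus latency for the current root and the new root. The type InstrTiming and its fields are hypothetical names made up for this example, and the real pass may take more into account.

```cpp
// Toy timing record. The type and field names are hypothetical and are
// not LLVM types.
struct InstrTiming {
  unsigned Depth;   // cycle at which the instruction's operands are ready
  unsigned Latency; // cycles until the instruction's result is available
};

// Returns true if replacing Old with New does not make the result
// available any later, i.e. the critical path is not lengthened.
// Assumption: the comparison is simply depth + latency on both sides.
bool preservesCriticalPathLen(const InstrTiming &Old, const InstrTiming &New) {
  unsigned OldCycleCount = Old.Depth + Old.Latency;
  unsigned NewCycleCount = New.Depth + New.Latency;
  return NewCycleCount <= OldCycleCount;
}
```

With made-up numbers: for a current root at depth 4 with latency 1, a replacement at depth 4 with latency 1 passes this check, while a replacement at depth 4 with latency 3 does not.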

For this example code sequence:

  %mul = mul nuw nsw i32 %conv2, %conv
  %mul7 = mul nuw nsw i32 %conv6, %conv4
  %add = add nuw nsw i32 %mul7, %mul
  ret i32 %add

We generate the following assembly:

  mul        w8, w0, w1
  mul        w9, w2, w3
  add        w0, w9, w8
  ret

I expected the MUL+ADD to be combined into a MADD. Because it is not, I see
degraded performance in several of my tests.

Could someone please explain why we use two different APIs to compute depth
and latency for the two instructions?

Thanks,

Mandeep
