[LLVMdev] Question on Machine Combiner Pass
Hal Finkel
hfinkel at anl.gov
Wed Feb 4 17:33:07 PST 2015
----- Original Message -----
> From: "Mandeep Singh Grang" <mgrang at codeaurora.org>
> To: llvmdev at cs.uiuc.edu
> Sent: Wednesday, February 4, 2015 12:58:27 PM
> Subject: Re: [LLVMdev] Question on Machine Combiner Pass
>
>
>
>
>
> Ping
>
> From: Mandeep Singh Grang [mailto:mgrang at codeaurora.org]
> Sent: Tuesday, February 03, 2015 4:34 PM
> To: 'llvmdev at cs.uiuc.edu'
> Cc: 'ghoflehner at apple.com'; 'apazos at codeaurora.org';
> mgrang at codeaurora.org
> Subject: Question on Machine Combiner Pass
>
>
>
> Hi,
>
>
>
> In the file lib/CodeGen/MachineCombiner.cpp I see that in the
> function MachineCombiner::preservesCriticalPathLen
> we try to determine whether the new combined instruction lengthens
> the critical path or not.
>
>
>
> In order to do this we compute the depth and latency for the current
> instruction (MUL+ADD) and the alternate instruction (MADD).
>
> But we call two different set of APIs for the current and new
> instructions:
>
>
>
> For new instruction we use:
>
> unsigned NewRootDepth = getDepth(InsInstrs, InstrIdxForVirtReg,
> BlockTrace);
>
> unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
>
>
>
> While for the current instruction we use:
>
> unsigned RootDepth = BlockTrace.getInstrCycles(Root).Depth;
>
> unsigned RootLatency = TSchedModel.computeInstrLatency(Root);
>
The BlockTrace comes from MachineTraceMetrics, which is an analysis pass, and thus might have its data pre-computed. This only strictly applies to the current instruction sequence. We need to use a different method to compute information for potential new instructions. Are you finding that these methods compute inconsistent results?
-Hal
>
>
> This is related to the following commit:
>
> commit e4fa341dde3c9521b7f11bd53ecdcbeb3f8fcbda
>
> Author: Gerolf Hoflehner < ghoflehner at apple.com >
>
> Date: Thu Aug 7 21:40:58 2014 +0000
>
> MachineCombiner Pass for selecting faster instruction sequence on
> AArch64
>
>
> For this example code sequence:
>
> %mul = mul nuw nsw i32 %conv2, %conv
>
> %mul7 = mul nuw nsw i32 %conv6, %conv4
>
> %add = add nuw nsw i32 %mul7, %mul
>
> ret i32 %add
>
>
>
> We generate the following assembly:
> mul w8, w0, w1
>
> mul w9, w2, w3
>
> add w0, w9, w8
>
> ret
>
>
>
> Whereas I expected the MUL+ADD to be combined to MADD otherwise I see
> degraded performance in several of my tests.
>
> Could someone please explain why we use two different APIs to compute
> depth and latency for the two instructions?
>
> Also if I try to use the same APIs for both then the depth for the
> NewRoot is 0.
>
>
>
> Thanks,
>
> Mandeep
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-dev
mailing list