[LLVMdev] Question on Machine Combiner Pass

Wed Feb 4 17:33:07 PST 2015

----- Original Message -----
> From: "Mandeep Singh Grang" <mgrang at codeaurora.org>
> To: llvmdev at cs.uiuc.edu
> Sent: Wednesday, February 4, 2015 12:58:27 PM
> Subject: Re: [LLVMdev] Question on Machine Combiner Pass
> 
> Ping
> 
> From: Mandeep Singh Grang [mailto:mgrang at codeaurora.org]
> Sent: Tuesday, February 03, 2015 4:34 PM
> To: 'llvmdev at cs.uiuc.edu'
> Cc: 'ghoflehner at apple.com'; 'apazos at codeaurora.org';
> mgrang at codeaurora.org
> Subject: Question on Machine Combiner Pass
> 
> 
> 
> Hi,
> 
> 
> 
> In the file lib/CodeGen/MachineCombiner.cpp I see that in the
> function MachineCombiner::preservesCriticalPathLen
> we try to determine whether the new combined instruction lengthens
> the critical path or not.
> 
> 
> 
> In order to do this we compute the depth and latency for the current
> instruction (MUL+ADD) and the alternate instruction (MADD).
> 
> But we call two different sets of APIs for the current and new
> instructions:
> 
> 
> 
> For new instruction we use:
> 
> unsigned NewRootDepth = getDepth(InsInstrs, InstrIdxForVirtReg,
> BlockTrace);
> 
> unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
> 
> 
> 
> While for the current instruction we use:
> 
> unsigned RootDepth = BlockTrace.getInstrCycles(Root).Depth;
> 
> unsigned RootLatency = TSchedModel.computeInstrLatency(Root);
> 

BlockTrace comes from MachineTraceMetrics, which is an analysis pass, so its data may already have been computed. That data strictly applies only to the instruction sequence that currently exists in the block. Instructions that have not been inserted yet are not part of the trace, so we need a different method to compute the depth and latency of a potential new sequence. Are you finding that the two methods compute inconsistent results?
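
To make the distinction concrete, here is a minimal toy sketch. It is not LLVM code: the Instr struct, the DepthTable alias, the helper functions, and the latency numbers are all hypothetical and only illustrate the idea. Depths of existing instructions are looked up in a precomputed table, while the depth of a candidate instruction that has no table entry is derived from the cached depths and latencies of the instructions that produce its inputs.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Toy model only; every name and number here is hypothetical.
struct Instr {
    std::string name;                 // e.g. "mul0"
    std::vector<std::string> inputs;  // names of the producing instructions
    unsigned latency;                 // cycles until the result is ready
};

// Stands in for a precomputed analysis: it holds depths only for the
// instructions that existed when the analysis ran.
using DepthTable = std::map<std::string, unsigned>;

// Depth of an instruction that is not in the table yet: the earliest cycle
// at which all of its inputs are available.
unsigned depthOfNewInstr(const Instr &newInstr,
                         const std::map<std::string, Instr> &existing,
                         const DepthTable &cached) {
    unsigned depth = 0;
    for (const std::string &in : newInstr.inputs) {
        auto def = existing.find(in);
        auto d = cached.find(in);
        // Inputs with no known producer are treated as ready at cycle 0.
        if (def == existing.end() || d == cached.end())
            continue;
        depth = std::max(depth, d->second + def->second.latency);
    }
    return depth;
}

// The existing sequence from the example: two multiplies feeding an add.
std::map<std::string, Instr> exampleBlock() {
    return {
        {"mul0", Instr{"mul0", {}, 4}},
        {"mul1", Instr{"mul1", {}, 4}},
        {"add", Instr{"add", {"mul0", "mul1"}, 1}},
    };
}

// Cached depths for the existing sequence only.
DepthTable exampleDepths() {
    return {{"mul0", 0}, {"mul1", 0}, {"add", 4}};
}
```

In this toy model, a candidate Instr{"madd", {"mul0"}, 4} has no entry in the table returned by exampleDepths(), so a plain lookup cannot answer the question, whereas depthOfNewInstr derives its depth from mul0's cached depth plus mul0's latency.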

 -Hal

> 
> 
> This is related to the following commit:
> 
> commit e4fa341dde3c9521b7f11bd53ecdcbeb3f8fcbda
> 
> Author: Gerolf Hoflehner < ghoflehner at apple.com >
> 
> Date: Thu Aug 7 21:40:58 2014 +0000
> 
> MachineCombiner Pass for selecting faster instruction sequence on
> AArch64
> 
> 
> For this example code sequence:
> 
> %mul = mul nuw nsw i32 %conv2, %conv
> 
> %mul7 = mul nuw nsw i32 %conv6, %conv4
> 
> %add = add nuw nsw i32 %mul7, %mul
> 
> ret i32 %add
> 
> 
> 
> We generate the following assembly:
> mul w8, w0, w1
> 
> mul w9, w2, w3
> 
> add w0, w9, w8
> 
> ret
> 
> 
> 
> I expected the MUL+ADD to be combined into a MADD. Because it is not, I see
> degraded performance in several of my tests.
> 
> Could someone please explain why we use two different APIs to compute
> depth and latency for the two instructions?
> 
> Also if I try to use the same APIs for both then the depth for the
> NewRoot is 0.
> 
> 
> 
> Thanks,
> 
> Mandeep

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory