[llvm-commits] LLVM patch to optimize integer ABS idiom for ARM target

Fri Sep 30 21:43:30 PDT 2011

The patch looks fine to me. But please update the failing tests to match the new codegen. Feel free to get it committed after the update. 

Evan

On Sep 29, 2011, at 10:01 AM, Ana Pazos <apazos at codeaurora.org> wrote:

> Hi folks,
> 
>  
> 
> Find attached the updated patch that optimizes integer ABS idiom for the ARM target.
> 
>  
> 
> Let me know what you think.
> 
>  
> 
> Thank you!
> 
> Ana.
> 
>  
> 
> From: Evan Cheng [mailto:evan.cheng at apple.com] 
> Sent: Wednesday, September 28, 2011 5:20 PM
> To: Ana Pazos
> Cc: 'Eli Friedman'; rajav at codeaurora.org
> Subject: Re: [llvm-commits] LLVM patch to optimize integer ABS idiom for ARM target
> 
>  
> 
> Hi Ana,
> 
>  
> 
> The patch looks much cleaner. There are some minor stylistic issues though:
> 
>  
> 
> +     BuildMI(*SinkBB, SinkBB->begin(), dl,
> 
> +          TII->get(ARM::PHI), ABSDstReg)
> 
> +    .addReg(NewRsbDstReg).addMBB(RSBBB)
> 
> +    .addReg(NewMovDstReg).addMBB(BB);
> 
>  
> 
> Indentation.
> 
>  
> 
> +   if (DisableARMIntABS){
> 
> +      return NULL;
> 
> +   }
> 
>  
> 
> Please make sure there is a space before '{'. Also note in general LLVM code doesn't use { } for single statement if - else clauses.
> 
>  
> 
> Please send the patch to llvm-commits for larger audience. Thanks.
> 
>  
> 
> Evan
> 
>  
> 
> On Sep 27, 2011, at 6:01 PM, Ana Pazos wrote:
> 
> 
> 
> 
> Hello Evan,
> 
>  
> 
> I have updated the patch following your comments 1-4 and suggestions 1-2.
> 
>  
> 
> I have defined separate ARM ABS and t2ABS nodes and have moved the lowering of these nodes (to movs + rsbmi machine instructions) to pre-RA scheduling time.
> 
> The updated patch only modifies ARM target-dependent files.
> 
>  
> 
> The "bonuses" features imply changes to LLVM target-independent code which I am not able to address right away.
> 
>  
> 
> I have also rerun the llvm/test and llvm/projects/test-suite tests and the results are attached.
> 
>  
> 
> Please let me know what you think and if further changes are needed.
> 
>  
> 
> Note:
> 
> - The DAGCombiner lowers SELECT_CC as ASR/ADD/XOR instruction BEFORE each target has a chance to choose its preferred idiom.
> 
> The code change in ARMISelDAGToDAG.cpp is to convert ASR/ADD/XOR representing integer ABS into an ARM ABS/t2ABS node.
> 
> This was done to avoid changing LLVM target-independent code.
> 
>  
> 
> - ABS nodes take a source operand and a destination operand (V1 = ABS V0).
> 
> The comparison against 0 and the predicated computation of the absolute value are implicit in the node.
> 
> That is why the ABS nodes are declared as defining CPSR.
> 
>  
> 
> Thanks a lot!
> 
> Ana.
> 
>  
> 
> From: Evan Cheng [mailto:evan.cheng at apple.com] 
> Sent: Sunday, September 18, 2011 11:22 AM
> To: Ana Pazos
> Cc: rajav at codeaurora.org
> Subject: Re: [llvm-commits] LLVM patch to optimize integer ABS idiom for ARM target
> 
>  
> 
>  
> 
> On Sep 16, 2011, at 6:04 PM, Ana Pazos wrote:
> 
> 
> 
> 
> 
> Hi Evan,
> 
>  
> 
> Thanks for the great feedback.
> 
>  
> 
> I think I can immediately improve the patch by addressing  your points 1-4 and suggestions 1-2 and deliver these changes soon.
> 
>  
> 
> The “bonuses” suggestions imply changes to LLVM target-independent code which I might not be able to address right away.
> 
>  
> 
> Would this plan be ok with you?
> 
>  
> 
> Yes. Thanks.
> 
> 
> 
> 
> 
>  
> 
> Please see my  comments below.
> 
>  
> 
> Talk to you again soon!
> 
>  
> 
> Thanks a lot,
> 
> Ana.
> 
>  
> 
> From: Evan Cheng [mailto:evan.cheng at apple.com] 
> Sent: Friday, September 16, 2011 11:32 AM
> To: Ana Pazos
> Cc: Commit Messages and Patches for LLVM
> Subject: Re: [llvm-commits] LLVM patch to optimize integer ABS idiom for ARM target
> 
>  
> 
> Hi Ana,
> 
>  
> 
> Thanks for working on this. It does look like an important peephole optimization. Unfortunately I think this patch has some problems.
> 
>  
> 
> 1. The instruction selection pattern is looking for a very specific case. Does it still work if the source is not a function argument? For example, if it's the return value of a function call, or a result of a computation, I don't think it will work with this patch. 
> 
> [apazos] Yes, it still works if the source is not a function argument. Maybe the test case confused you.
> 
> The ASR/ADD/XOR pattern that implements integer ABS  is searched for in the DAG and can happen in any place in the function code.
> 
> Once it is found, it is replaced with the pseudoABS node using as source  operand the ADD instruction first source operand (which is the same as SRA instruction first source operand).
> 
> Later when pseudoABS node is lowered, it becomes movs/conditional rsbmi using as source and destination register the pseudoABS instruction destination register.
> 
> So there is nothing in the patch that restricts the operand to be “r0”.
> 
>  
> 
> Ok. I'm just concerned about looking for specific patterns could be fragile which is why I suggest adding a ABS isel opcode.
> 
>  
> 
> 
> 
> 
> 
>  
> 
> 2. MOVrs and RSBccri are not needed. You can expand into MOVr with the optional def set to CPSR and a RSBri with the predicate operand filled in.
> 
> [apazos] Probably my inexperience with *.td file definitions. I will find out how to use the existing MOVr and RSBri instruction definitions.
> 
>  
> 
> Ok.
> 
> 
> 
> 
> 
>  
> 
> 3. We want to avoid pseudo instructions that expands into multiple instructions. As you have noticed, this messes up scheduling. Probably the right solution is to expand the instructions at pre-RA scheduling time.
> 
> [apazos] I understand why you want to avoid pseudo instructions that expand into multiple instructions. It was my concern too.
> 
> I have a version of the patch that implements the optimization at  pre-RA scheduling time.
> 
> For that, I had to add a PHI node in the DAG (to be able to add the conditional execution path that computes RSB).
> 
> The problem with this alternative solution is that LLVM decided to spill both input operands to the PHI nodes, and the final code was very inefficient.
> 
> I was not using the latest LLVM source code. I will try this solution again with the LLVM svn tip. Will let you know if I see the problem again.
> 
>  
> 
> Ok.
> 
>  
> 
>  
> 
> 4. It's a bad idea to have a single instruction that's used for both ARM and Thumb2 mode.
> 
> [apazos] There is nothing special we need to do handle Thumb2 mode.
> 
> I built EEMBC benchmarks in Thumb2 mode and the ABS optimization worked just fine.
> 
> Note that in Thumb2 mode conditional execution is implemented with IT instructions.
> 
> There is no need to explicitly write IT instructions though.
> 
> The assembler/code emitter inserts it for you where necessary changing the pattern movs rsbmi into movs itmi rsbmi.
> 
> So I am not sure we need t2ABSri type of node.
> 
>  
> 
> You still need pseudo nodes for ARM and Thumb2. We do not allow two modes to share an opcode.
> 
> 
> 
> 
> 
>  
> 
>  
> 
> My suggestion:
> 
> 1. ISel should match to an instruction (say ABS, t2ABSri) that's marked with usesCustomInserter = 1.
> 
> 2. Add support to lower ABS to MOVr and RSBri with optional def and predicate operands filled in. Please do the same for the Thumb2 variant.
> 
> [apazos] I think I understood what you meant. Let me give it a try.
> 
>  
> 
> Bonuses:
> 
> 1. Is this sufficient to generate the best code sequence? e.g.
> 
>         movs    r0, r0
> 
>         rsbmi   r0, r0, #0
> 
>  
> 
> This is the best if r0 is a function input. But how about?
> 
>         add       r0, r1
> 
>         movs    r0, r0
> 
>         rsbmi   r0, r0, #0
> 
>  
> 
> Shouldn't we copy propagate the movs?
> 
>         adds     r0, r1
> 
>         rsbmi   r0, r0, #0
> 
>  
> 
> One possibility is the scheduler custom expansion code look for the instruction that defines the source and tack the optional def on that instruction. Any other ideas?
> 
>  
> 
> 2. How to make the instruction selection code match more cases? My suggestion is to add a new target independent opcode ABS. For targets where this node is legal, i.e. ARM, dag combine can form this instruction rather than the sra + sub sequence. This way, you can write a simple pattern to match the instruction instead of C++ selection code.
> 
>  
> 
> [apazos] So instead of ARM/Thumb-specific DAG nodes create a new target independent ABS DAG node…
> 
> SELECT_CC DAG nodes that represent integer ABS patterns are transformed into target independent ABS DAG nodes.
> 
> And then ARM/Thumb target lowers it as movs/rsbmi and the other targets lower it as sra + add + xor sequence.
> 
>  
> 
> No, other targets will not generate ABS dag nodes since it will not be legal.
> 
> 
> 
> 
> 
> The problem with this solution is that it will imply changes to target-independent LLVM code.
> 
> In my current solution I strived to not change any target-independent LLVM code.
> 
>  
> 
> I believe this is a less fragile solution.
> 
> 
> 
> 
> 
>  
> 
> [apazos] Let me give more thought to the 2 bonuses suggestions above.
> 
>  
> 
> [apazos] But immediately I can address the other points.
> 
>  
> 
>  
> 
> Ok. Thanks.
> 
>  
> 
> Evan
> 
> 
> 
> 
> 
>  
> 
> Evan
> 
>  
> 
> On Sep 15, 2011, at 5:32 PM, Ana Pazos wrote:
> 
> 
> 
> 
> 
> 
> Hello,
> 
>  
> 
> I worked on an LLVM patch to optimize integer ABS idiom for the ARM target and would like to submit it to your review.
> 
>  
> 
> I experimented with EEMBC benchmarks, in particular MPEG encoding, and noted integer ABS computation happens frequently. Significant speed up was achieved with the  optimized idiom for ARM (20% for MPEG encoding).
> 
>  
> 
> Patch details:
> 
>  
> 
> LLVM lowers SELECT_CC nodes that represent an integer ABS pattern into ASR/ADD/XOR instructions.
> 
>  
> 
> It is possible to create an optimized machine idiom for integer ABS on ARM formed by MOVs/RSBmi predicated instructions.
> 
>  
> 
> This patch modifies ARM-specific files to implement the above optimized machine idiom for integer ABS.
> 
>  
> 
> Generation of the optimized integer ABS idiom is turned on by default. To turn this feature off set -disable-arm-int-abs feature flag.
> 
>  
> 
> abspatch.diff
> 
> Changes to ARM-specific files to implement optimized integer ABS idiom.
> 
>  
> 
> abstestpatch.diff
> 
> ARM/iabs.ll and Thumb/iabs.ll tests check for the non-optimized integer ABS idiom (ASR/ADD/XOR).
> 
> When applying abspatch.diff these tests fail. So patched the tests to set -disable-arm-int-abs flag to prevent the compiler from generating optimized integer ABS pattern and allow the non-optimized idiom to be checked.
> 
>  
> 
> iabsopt.ll
> 
> Similar to ARM/iabs.ll and Thumb/iabs.ll tests except that it checks for the optimized integer ABS idiom and checks for all possible test conditions.
> 
>  
> 
> failures.txt
> 
> Failure report and explanation from running llvm/test and projects/test-suite on ARM.
> 
> I noted failures running llvm/test (svn version 139318) and projects/test-suite (svn revision 139319) on ARM. Are these failures expected?
> 
>  
> 
> Thank you,
> 
> Ana.
> 
>  
> 
> <abspatch.diff><abstestpatch.diff><failures.txt><iabsopt.ll>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
>  
> 
>  
> 
> <abspatch.diff><abstestpatch.diff><failures.txt><iabsopt.ll>
> 
>  
> 
> <abspatch.diff>
> <abstestpatch.diff>
> <failures.txt>
> <iabsopt.ll>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20110930/41bb585f/attachment.html>