[llvm-dev] Questions about code-size optimizations in ARM backend

Wed Nov 8 06:00:59 PST 2017

Since Momchil already has a patch on Phabricator for the shift
elimination, I think I'm going to proceed with the "pc"-related
addressing in ARMConstantIslands.

Thanks for the advice!

Best regards,
Gabor Ballabas

On 11/07/2017 09:08 PM, Friedman, Eli wrote:
> On 11/7/2017 9:02 AM, Gabor Ballabas wrote:
>>
>> Hi All,
>>
>> I started to work on code-size improvements for the ARM target by 
>> comparing GCC and LLVM generated code.
>> My first candidate was switch-case lowering.
>> I also created a Bugzilla issue for this topic: 
>> https://bugs.llvm.org/show_bug.cgi?id=34902
>> The full example code and the generated assembly for GCC and for LLVM 
>> is in the Bugzilla issue.
>>
>> My first idea was to simplify the following instruction pattern
>>     lsl     r0, r0, #2
>>     ldr     pc, [r0, r1]
>> to this:
>>     ldr     pc, [r1, r0, lsl #2]
>>
>> but then I got really confused when I started to look into the 
>> machine-dependent optimization passes in the backend.
>>
>> I get a dump with the '-print-machineinstrs' option from the 
>> MachineFunctionPass and I can see these instructions at the beginning 
>> of the passes
>>
>>     %vreg2<def> = MOVsi %vreg1, 18, pred:14, pred:%noreg, opt:%noreg; GPR:%vreg2,%vreg1
>>     %vreg3<def> = LEApcrelJT <jt#0>, pred:14, pred:%noreg; GPR:%vreg3
>>     BR_JTm %vreg2<kill>, %vreg3<kill>, 0, <jt#0>; mem:LD4[JumpTable] GPR:%vreg2,%vreg3
>>
>> and these at the end
>>
>>     %R0<def> = MOVsi %R0<kill>, 18, pred:14, pred:%noreg, opt:%noreg
>>     %R1<def> = LEApcrelJT <jt#0>, pred:14, pred:%noreg
>>     BR_JTm %R0<kill>, %R1<kill>, 0, <jt#0>; mem:LD4[JumpTable]
>>
>
> "lsl r0, r0, #2" is an alias for "mov r0, r0, lsl #2", which is the 
> MachineInstr "MOVsi".
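
The immediate operand 18 in the MOVsi dump above can be read as a packed
(shift kind, shift amount) pair. The following is a minimal sketch of that
decoding, assuming the layout used by the ARM_AM helpers in LLVM's
ARMAddressingModes.h (shift kind in the low three bits, amount in the bits
above, with the shift kinds numbered no_shift = 0, asr, lsl, lsr, ror, rrx).
The function names here are hypothetical and are not LLVM APIs.

```python
# Shift-kind numbering is an assumption: it is meant to mirror the
# ARM_AM::ShiftOpc enum in LLVM's ARMAddressingModes.h.
SHIFT_KINDS = ["no_shift", "asr", "lsl", "lsr", "ror", "rrx"]


def decode_so_reg_imm(operand):
    """Split a packed shift operand into (shift kind, shift amount)."""
    kind = SHIFT_KINDS[operand & 0x7]  # low three bits select the shift kind
    amount = operand >> 3              # remaining bits hold the shift amount
    return kind, amount


def encode_so_reg_imm(kind, amount):
    """Inverse of decode_so_reg_imm (hypothetical helper for illustration)."""
    return SHIFT_KINDS.index(kind) | (amount << 3)


print(decode_so_reg_imm(18))  # ('lsl', 2)
```

Under that assumption, "MOVsi %vreg1, 18" is "mov rD, rN, lsl #2", which
matches the "lsl r0, r0, #2" in the final assembly.
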
>
> LEApcrelJT and BR_JTm are pseudo-instructions which correspond to 
> "adr" and "ldr" respectively.  We use a special opcode for the 
> jump-table address because we have to do some extra work in 
> ARMConstantIslands for instructions which use constant pools.  We use 
> a special opcode for the load so we can mark it as a branch (which 
> matters for modeling the CFG).
>
>> So basically I want to catch the pattern with the possible 
>> simplification using the shifter,
>> but I'm not even sure that I am looking into this issue at the right 
>> optimization level.
>> Maybe this idea should be implemented at a higher level, or as a 
>> fixup in ARMConstantIslands,
>> like the Thumb jumptable optimizations mentioned in the Bugzilla issue.
>>
>> I hope someone more familiar with this part of the backend can give 
>> me some pointers about how to proceed with this idea
>> ( or why it is complete rubbish in the first place :) )
>>
>
> If you just want to pull the shift into the load, you can probably get 
> away with just messing with instruction selection for BR_JTm. There's 
> actually a FIXME in ARMInstrInfo.td which is relevant ("FIXME: This 
> shouldn't use the generic addrmode2, but rather be split into i12 and 
> rs suffixed versions.")
>
> If you want to do the fancy version where "pc" is part of the 
> addressing mode, you probably need to do something in 
> ARMConstantIslands (since the transform requires the jump table to be 
> placed directly after the jump.)
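
The placement requirement follows from how "pc" reads in ARM state: an
instruction that uses pc as a base register sees its own address plus 8.
The following is a minimal sketch of the resulting address arithmetic for a
load of the form "ldr pc, [pc, rN, lsl #2]", assuming ARM (not Thumb) state
and 4-byte table entries. The function names and the example address 0x1000
are hypothetical.

```python
ARM_PC_READ_OFFSET = 8  # in ARM state, reading pc yields instruction address + 8
ENTRY_SIZE_SHIFT = 2    # 4-byte jump-table entries, i.e. index << 2


def required_table_start(ldr_address):
    """Address where entry 0 must live for the pc-based load to find it."""
    return ldr_address + ARM_PC_READ_OFFSET


def table_entry_address(ldr_address, index):
    """Address read by `ldr pc, [pc, rN, lsl #2]` placed at ldr_address."""
    return required_table_start(ldr_address) + (index << ENTRY_SIZE_SHIFT)


# With the load at 0x1000, the table has to begin at 0x1008, which leaves
# exactly one 4-byte slot (0x1004) between the load and the table.
print(hex(required_table_start(0x1000)))    # 0x1008
print(hex(table_entry_address(0x1000, 3)))  # 0x1014
```

Because the base is fixed relative to the load, the table cannot be moved
away from the jump, which is why the layout decision would have to be made
where jump tables are placed. The single slot between the load and the table
would need to hold some other instruction (for example a branch taken when
the load is predicated off, which is an assumption about the surrounding
code, not something stated in this thread).
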
>
> -Eli
>
> -- 
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
