[llvm-dev] Questions about code-size optimizations in ARM backend
Gabor Ballabas via llvm-dev
llvm-dev at lists.llvm.org
Wed Nov 8 06:00:59 PST 2017
Since Momchil already has a patch on Phabricator for the shift
elimination, I think I'm going to proceed with the "pc"-related
addressing in ARMConstantIslands.
Thanks for the advice!
Best regards,
Gabor Ballabas
On 11/07/2017 09:08 PM, Friedman, Eli wrote:
> On 11/7/2017 9:02 AM, Gabor Ballabas wrote:
>>
>> Hi All,
>>
>> I started to work on code-size improvements for the ARM target by
>> comparing the code generated by GCC and LLVM.
>> My first candidate was switch-case lowering.
>> I also created a Bugzilla issue for this topic:
>> https://bugs.llvm.org/show_bug.cgi?id=34902
>> The full example code and the generated assembly for GCC and for LLVM
>> are in the Bugzilla issue.
>>
>> My first idea was to simplify the following instruction pattern
>>     lsl r0, r0, #2
>>     ldr pc, [r0, r1]
>> to this:
>>     ldr pc, [r1, r0, lsl #2]
>>
>> but then I got really confused when I started to look into the
>> machine-dependent optimization passes in the backend.
>>
>> Using the '-print-machineinstrs' option I get a dump from the
>> MachineFunctionPasses, and I can see these instructions at the
>> beginning of the pass pipeline:
>>
>>     %vreg2<def> = MOVsi %vreg1, 18, pred:14, pred:%noreg, opt:%noreg;
>>         GPR:%vreg2,%vreg1
>>     %vreg3<def> = LEApcrelJT <jt#0>, pred:14, pred:%noreg; GPR:%vreg3
>>     BR_JTm %vreg2<kill>, %vreg3<kill>, 0, <jt#0>;
>>         mem:LD4[JumpTable] GPR:%vreg2,%vreg3
>>
>> and these at the end:
>>
>>     %R0<def> = MOVsi %R0<kill>, 18, pred:14, pred:%noreg, opt:%noreg
>>     %R1<def> = LEApcrelJT <jt#0>, pred:14, pred:%noreg
>>     BR_JTm %R0<kill>, %R1<kill>, 0, <jt#0>; mem:LD4[JumpTable]
>>
>
> "lsl r0, r0, #2" is an alias for "mov r0, r0, lsl #2", which is the
> MachineInstr "MOVsi".
>
> LEApcrelJT and BR_JTm are pseudo-instructions which correspond to
> "adr" and "ldr" respectively. We use a special opcode for the
> jump-table address because we have to do some extra work in
> ARMConstantIslands for instructions which use constant pools. We use
> a special opcode for the load so we can mark it as a branch (which
> matters for modeling the CFG).
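
The "18" operand on MOVsi in the dumps above packs the shift kind and the shift amount into a single immediate. Here is a minimal sketch of that packing in Python. It assumes the layout mirrors the shifter-operand helpers in LLVM's ARMAddressingModes.h (shift opcode in the low three bits, shift amount in the bits above). The names SHIFT_OPS, encode_so_reg and decode_so_reg are made up for illustration and are not LLVM APIs.

```python
# Assumed ordering of the ARM shift opcodes (an assumption about LLVM's
# ARM_AM::ShiftOpc enum, not taken from the thread).
SHIFT_OPS = ["no_shift", "asr", "lsl", "lsr", "ror", "rrx"]


def encode_so_reg(shift_op: str, amount: int) -> int:
    """Pack a shift kind and shift amount into one immediate operand."""
    return SHIFT_OPS.index(shift_op) | (amount << 3)


def decode_so_reg(operand: int) -> tuple:
    """Split an immediate operand back into (shift kind, shift amount)."""
    return SHIFT_OPS[operand & 7], operand >> 3


if __name__ == "__main__":
    print(decode_so_reg(18))
```

Under that assumption, 18 decodes to an LSL by 2, which matches the "lsl r0, r0, #2" in the generated assembly.
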
>
>> So basically I want to catch the pattern with the possible
>> simplification using the shifter,
>> but I'm not even sure that I am looking into this issue at the right
>> optimization level.
>> Maybe this idea should be implemented at a higher level, or as a
>> fixup in ARMConstantIslands,
>> like the Thumb jumptable optimizations mentioned in the Bugzilla issue.
>>
>> I hope someone more familiar with this part of the backend can give
>> me some pointers about how to proceed with this idea
>> (or why it is complete rubbish in the first place :) ).
>>
>
> If you just want to pull the shift into the load, you can probably get
> away with just messing with instruction selection for BR_JTm. There's
> actually a FIXME in ARMInstrInfo.td which is relevant ("FIXME: This
> shouldn't use the generic addrmode2, but rather be split into i12 and
> rs suffixed versions.")
>
> If you want to do the fancy version where "pc" is part of the
> addressing mode, you probably need to do something in
> ARMConstantIslands (since the transform requires the jump table to be
> placed directly after the jump.)
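
As a rough illustration of why the table placement matters: in ARM state, reading pc yields the address of the current instruction plus 8, so a load of the form "ldr pc, [pc, rN, lsl #2]" can only index a table that starts exactly 8 bytes after the load. The sketch below is Python with hypothetical helper names and assumes 4-byte table entries. It models the address arithmetic only, not anything ARMConstantIslands actually does.

```python
# In ARM (A32) state, an instruction that reads pc sees its own address + 8.
ARM_PC_READ_OFFSET = 8
ENTRY_SIZE = 4  # assumed: one 32-bit absolute address per jump-table entry


def required_table_start(ldr_address: int) -> int:
    """Where the table must begin for `ldr pc, [pc, rN, lsl #2]` to work."""
    return ldr_address + ARM_PC_READ_OFFSET


def entry_address(ldr_address: int, index: int) -> int:
    """Address the load at ldr_address reads for a given case index."""
    if index < 0:
        raise ValueError("case index must be non-negative")
    return ldr_address + ARM_PC_READ_OFFSET + index * ENTRY_SIZE


def placement_is_valid(ldr_address: int, table_address: int) -> bool:
    """True if a table at table_address can be indexed through pc."""
    return table_address == required_table_start(ldr_address)


if __name__ == "__main__":
    print(hex(entry_address(0x1000, 3)))
```

With 4-byte instructions this leaves exactly one instruction slot between the load and the first table entry, so any pass that moves the table or inserts code in between would invalidate the pc-based form.
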
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project