<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Since Momchil already has a patch on Phabricator for the shift
    elimination, I think I will
    proceed with the "pc"-based addressing in ARMConstantIslands.<br>

    <br>

    Thanks for the advice!<br>

    <br>

    Best regards,<br>

    Gabor Ballabas<br>

     <br>

    <div class="moz-cite-prefix">On 11/07/2017 09:08 PM, Friedman, Eli

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:13295982-bca7-0e71-2505-5e501dcf9731@codeaurora.org">


      <div class="moz-cite-prefix">On 11/7/2017 9:02 AM, Gabor Ballabas

        wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:a941995e-dba8-5e11-13cd-607bd44ee8f6@inf.u-szeged.hu">


        <p>Hi All,<br>

        </p>

        <p>I have started working on code-size improvements for the ARM
          target by comparing the code generated by GCC and LLVM.<br>

          My first candidate was switch-case lowering.<br>

          I also created a Bugzilla issue for this topic: <a

            class="moz-txt-link-freetext"

            href="https://bugs.llvm.org/show_bug.cgi?id=34902"

            moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=34902</a><br>

          The full example code and the generated assembly for GCC and
          for LLVM are in the Bugzilla issue.<br>

          <br>

          My first idea was to simplify the following instruction

          pattern<br>

                  <b>lsl     r0, r0, #2</b><b><br>

          </b><b>       ldr     pc, [r0, r1]</b><br>

          to this:<br>

                  <b>ldr     pc, [r1, r0, lsl #2]</b><br>

          <br>

          but then I got really confused when I started to look into the

          machine-dependent optimization passes in the backend.<br>

          <br>

          With the '-print-machineinstrs' option I get a dump from the
          MachineFunctionPasses, and I can see these instructions at the
          beginning of the pass pipeline:<br>

          <br>

          &nbsp;&nbsp;&nbsp;&nbsp;<b>%vreg2&lt;def&gt; = MOVsi %vreg1, 18, pred:14,
            pred:%noreg, opt:%noreg; GPR:%vreg2,%vreg1</b><b><br>
          </b><b>&nbsp;&nbsp;&nbsp;&nbsp;%vreg3&lt;def&gt; = LEApcrelJT &lt;jt#0&gt;,
            pred:14, pred:%noreg; GPR:%vreg3</b><b><br>
          </b><b>&nbsp;&nbsp;&nbsp;&nbsp;BR_JTm %vreg2&lt;kill&gt;, %vreg3&lt;kill&gt;, 0,
            &lt;jt#0&gt;; mem:LD4[JumpTable] GPR:%vreg2,%vreg3</b><br>

          <br>

          and these at the end<br>

          <br>

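          &nbsp;&nbsp;&nbsp;&nbsp;<b>%R0&lt;def&gt; = MOVsi %R0&lt;kill&gt;, 18, pred:14,
            pred:%noreg, opt:%noreg</b><b><br>
          </b><b>&nbsp;&nbsp;&nbsp;&nbsp;%R1&lt;def&gt; = LEApcrelJT &lt;jt#0&gt;, pred:14,
            pred:%noreg</b><b><br>
          </b><b>&nbsp;&nbsp;&nbsp;&nbsp;BR_JTm %R0&lt;kill&gt;, %R1&lt;kill&gt;, 0,
            &lt;jt#0&gt;; mem:LD4[JumpTable]</b><br>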

        </p>

      </blockquote>

      <br>

      "lsl r0, r0, #2" is an alias for "mov r0, r0, lsl #2", which is

      the MachineInstr "MOVsi".<br>
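      <p>A minimal sketch of how the "18" operand in the MOVsi dump maps
        to "lsl #2", assuming the shifter operand is packed the way
        ARM_AM::getSORegOpc in ARMAddressingModes.h does it (shift kind
        in the low three bits, shift amount above them). The Python names
        are illustrative only and are not LLVM API:</p>
      <pre>
```python
# Toy model of the packed shifter-operand immediate.
# Assumed layout: bits 0-2 hold the shift kind, higher bits the amount.
SHIFT_KINDS = ["no_shift", "asr", "lsl", "lsr", "ror", "rrx"]


def encode_so_reg(kind, amount):
    """Pack a shift kind and a shift amount into one integer."""
    return SHIFT_KINDS.index(kind) | (amount << 3)


def decode_so_reg(op):
    """Unpack the integer back into (kind, amount)."""
    return SHIFT_KINDS[op & 7], op >> 3


# The operand seen in the MOVsi line of the dump.
print(decode_so_reg(18))
```
</pre>
      <p>Under that assumption, decoding 18 gives the "lsl" kind with an
        amount of 2, which matches the "lsl r0, r0, #2" in the assembly.</p>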

      <br>

      LEApcrelJT and BR_JTm are pseudo-instructions which correspond to

      "adr" and "ldr" respectively.  We use a special opcode for the

      jump-table address because we have to do some extra work in

      ARMConstantIslands for instructions which use constant pools.  We

      use a special opcode for the load so we can mark it as a branch

      (which matters for modeling the CFG).<br>

      <br>

      <blockquote type="cite"

        cite="mid:a941995e-dba8-5e11-13cd-607bd44ee8f6@inf.u-szeged.hu">

        <p> So basically I want to catch this pattern and fold the shift
          into the load using the shifter operand,<br>
          but I'm not even sure that I am looking into this issue at the
          right optimization level.<br>
          Maybe this idea should be implemented at a higher level, or as
          a fixup in ARMConstantIslands,<br>
          like the Thumb jump-table optimizations mentioned in the
          Bugzilla issue.<br>
          <br>
          I hope someone more familiar with this part of the backend can
          give me some pointers about how to proceed with this idea<br>
          (or why it is complete rubbish in the first place :) )<br>

          <br>

        </p>

      </blockquote>

      <br>

      If you just want to pull the shift into the load, you can probably

      get away with just messing with instruction selection for BR_JTm. 

      There's actually a FIXME in ARMInstrInfo.td which is relevant

      ("FIXME: This shouldn't use the generic addrmode2, but rather be

      split into i12 and rs suffixed versions.")<br>
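      <p>As a rough illustration of the rewrite being discussed (not of
        how instruction selection would implement it), here is a toy
        peephole over tuples. The tuple shapes and the names
        "ldr_pc_rr" / "ldr_pc_rs" are made up for this sketch, and it
        assumes the shifted register is dead after the jump, as the
        &lt;kill&gt; flags in the dump suggest:</p>
      <pre>
```python
# Toy peephole, not LLVM code. Instruction shapes (all hypothetical):
#   ("lsl", rd, rm, n)              lsl rd, rm, #n
#   ("ldr_pc_rr", ra, rb)           ldr pc, [ra, rb]
#   ("ldr_pc_rs", base, index, n)   ldr pc, [base, index, lsl #n]
def fold_shift_into_jump(insns):
    """Fold an in-place 'lsl' into the register-offset 'ldr pc' after it.

    Assumes the shifted register is not read after the jump.
    """
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        foldable = (
            nxt is not None
            and cur[0] == "lsl"
            and cur[1] == cur[2]          # shift is done in place
            and nxt[0] == "ldr_pc_rr"
            and cur[1] in (nxt[1], nxt[2])  # shifted register feeds the load
            and nxt[1] != nxt[2]          # other operand is a different register
        )
        if foldable:
            index = cur[1]
            base = nxt[2] if nxt[1] == index else nxt[1]
            out.append(("ldr_pc_rs", base, index, cur[3]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


prog = [("lsl", "r0", "r0", 2), ("ldr_pc_rr", "r0", "r1")]
print(fold_shift_into_jump(prog))
```
</pre>
      <p>The register addition in the address is commutative, so the
        table address can move to the base position while the index
        takes the shifted-offset position.</p>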

      <br>

      If you want to do the fancy version where "pc" is part of the

      addressing mode, you probably need to do something in

      ARMConstantIslands (since the transform requires the jump table to

      be placed directly after the jump.)<br>
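      <p>To illustrate why the table placement matters: in ARM (A32)
        state, reading "pc" as an operand yields the address of the
        current instruction plus 8, so a load of the form
        "ldr pc, [pc, r0, lsl #2]" indexes a table whose position is
        fixed relative to the load itself. A small sketch of that
        arithmetic, with a made-up load address:</p>
      <pre>
```python
# In ARM (A32) state, pc read as an operand is the instruction address + 8.
ARM_PC_READ_OFFSET = 8


def jump_table_slot(load_addr, index):
    """Address of the word fetched by 'ldr pc, [pc, rI, lsl #2]'.

    load_addr is the address of the ldr itself; index is the value in rI.
    """
    return load_addr + ARM_PC_READ_OFFSET + (index << 2)


load_addr = 0x8000  # hypothetical address, for illustration only
for i in range(3):
    print(hex(jump_table_slot(load_addr, i)))
```
</pre>
      <p>Entry 0 therefore sits at the load address plus 8, with exactly
        one 4-byte instruction slot between the load and the first table
        entry, so the layout cannot be left to a later pass to move
        around freely.</p>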

      <p>-Eli<br>

      </p>

      <pre class="moz-signature" cols="72">-- 

Employee of Qualcomm Innovation Center, Inc.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>

    </blockquote>

    <br>

  </body>

</html>