[PATCH] D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.

Sun Sep 9 13:10:51 PDT 2018

dmgreen added a comment.

Good stuff. The downstream version of this we have uses Subtarget->isMClass() && Subtarget->hasThumb2(). (Dont ask me why it's downstream, I'm not sure. It's from before my time). I think you are right about the M7 though, the results there are just different, not necessarily better.

We should also be aligning functions?

And we should include the M33 and its derivatives (but like you said, not M0+/M23)

================
Comment at: llvm/lib/CodeGen/MachineBlockPlacement.cpp:2500
   // loop rotations done during this layout pass.
-  if (F->getFunction().optForSize())
+  if (F->getFunction().optForMinSize() ||
+      (F->getFunction().optForSize() && !TLI->alignLoopsWithOptSize()))
----------------
This seems like an odd change to make at Os. It, by definition, increases code size.

Like you said, this might be an llvm quirk though. Do you have a link to llvm's definition of -Os?

================
Comment at: llvm/lib/Target/ARM/ARM.td:946
 def : ProcessorModel<"cortex-m4", CortexM3Model,        [ARMv7em,
+                                                         ProcM3,
                                                          FeatureVFP4,
----------------
What do you think of using FeatureAlignBranchTargets or something like it?

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:1204

   setMinFunctionAlignment(Subtarget->isThumb() ? 1 : 2);
 }
----------------
Here with this gubbins may be a better place for setting the loop alignment.

Repository:
  rL LLVM

https://reviews.llvm.org/D51780