[PATCH] D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.

Mon Sep 10 05:09:22 PDT 2018

t.p.northover added a comment.

> We should also be aligning functions?

Interesting question. There are a couple of reasons to think the tradeoffs are different there. First, I'd expect (purely on intuition rather than data) a loop to be executed more times than most functions are called. Second, when r7 is the FP, a function is almost guaranteed to start with a 16-bit instruction (`push {..., r7, lr}`).

I do have a captive embedded developer who cares very much about individual cycles at the moment though, so I'll ask him to benchmark it.

Anyway, thanks for commenting. I'll get started on the obvious changes.

================
Comment at: llvm/lib/CodeGen/MachineBlockPlacement.cpp:2500
   // loop rotations done during this layout pass.
-  if (F->getFunction().optForSize())
+  if (F->getFunction().optForMinSize() ||
+      (F->getFunction().optForSize() && !TLI->alignLoopsWithOptSize()))
----------------
dmgreen wrote:
> This seems like an odd change to make at Os. It, by definition, increases code size.
> 
> Like you said, this might be an llvm quirk though. Do you have a link to llvm's definition of -Os?
It's mostly in old threads, since our manuals are not great and a bit generic:

http://clang-developers.42468.n3.nabble.com/RFC-Codifying-but-not-formalizing-the-optimization-levels-in-LLVM-and-Clang-tt4029685.html#a4029741
http://clang-developers.42468.n3.nabble.com/Meaning-of-LLVM-optimization-levels-td4032493.html#a4032496

The first, in particular, matches my understanding.
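
For anyone reading along, the condition in the hunk above boils down to the predicate sketched here. This is only an illustration: `skipLoopAlignment` and its three boolean parameters are hypothetical stand-ins for `optForMinSize()`, `optForSize()` and the proposed `alignLoopsWithOptSize()` hook, not code from the patch.

```cpp
// Hypothetical model of the condition in the MachineBlockPlacement hunk.
// Returns true when loop alignment should be skipped for the function.
constexpr bool skipLoopAlignment(bool minSize, bool optSize,
                                 bool targetAlignsAtOptSize) {
  // minsize always skips; optsize skips unless the target opts in.
  return minSize || (optSize && !targetAlignsAtOptSize);
}

// minsize: never align, whatever the target says.
static_assert(skipLoopAlignment(true, true, true), "minsize skips");
// optsize on a target that opts in: loops are still aligned.
static_assert(!skipLoopAlignment(false, true, true), "optsize, opted in");
// optsize on a target that does not opt in: previous behaviour.
static_assert(skipLoopAlignment(false, true, false), "optsize, not opted in");
// Neither attribute: align as before.
static_assert(!skipLoopAlignment(false, false, false), "no size attribute");
```

The only behavioural change is the second case: an optsize function on a target that opts in now gets aligned loops.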

================
Comment at: llvm/lib/Target/ARM/ARM.td:946
 def : ProcessorModel<"cortex-m4", CortexM3Model,        [ARMv7em,
+                                                         ProcM3,
                                                          FeatureVFP4,
----------------
dmgreen wrote:
> What do you think of using FeatureAlignBranchTargets or something like it?
An excellent idea.

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:1204

   setMinFunctionAlignment(Subtarget->isThumb() ? 1 : 2);
 }
----------------
dmgreen wrote:
> Here with this gubbins may be a better place for setting the loop alignment.
Yep.
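
As a reminder of what the numbers in that hunk mean, here is a small sketch. It assumes the alignment setters take a log2 value (so `1` means 2 bytes and `2` means 4 bytes), which is my understanding of the interface at the time of this review. The helper names `alignmentInBytes` and `paddingTo` are hypothetical and used only for illustration.

```cpp
// Assumption: alignments are expressed as log2 of the byte alignment.
constexpr unsigned alignmentInBytes(unsigned log2Align) {
  return 1u << log2Align;
}

// Bytes of padding needed to bring `offset` up to the next aligned address.
constexpr unsigned paddingTo(unsigned offset, unsigned log2Align) {
  return (alignmentInBytes(log2Align) -
          offset % alignmentInBytes(log2Align)) %
         alignmentInBytes(log2Align);
}

static_assert(alignmentInBytes(1) == 2, "Thumb functions: 2-byte alignment");
static_assert(alignmentInBytes(2) == 4, "ARM functions: 4-byte alignment");
// A loop header at offset 6 needs 2 bytes of padding to reach offset 8.
static_assert(paddingTo(6, 2) == 2, "pad 6 up to 8");
// An already aligned header costs nothing.
static_assert(paddingTo(8, 2) == 0, "8 is already 4-byte aligned");
```

Under that assumption, aligning a loop to 4 bytes in Thumb code costs at most 2 bytes of padding per loop header.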

Repository:
  rL LLVM

https://reviews.llvm.org/D51780