[PATCH] D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.

Wed Sep 12 10:41:28 PDT 2018

dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

> The benchmarks came back as about a 0.2% difference in cycle count, and (crucially) there's no way when deciding function alignment to check for OptSize, so we'd inevitably pessimize some cases.

I think this is what getPrefAlignment is for, as opposed to MinFunctionAlignment? I agree that that's a different issue though, and it doesn't need to be done with this patch.
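
To make the distinction concrete, here is a minimal standalone sketch of the idea: the minimum alignment is always honoured, and the preferred alignment is only applied when the function is not being optimized for size. TargetInfo, FunctionInfo and chooseFunctionAlignLog2 are hypothetical names invented for this illustration, not LLVM API, and I haven't checked how closely this matches what the code generator actually does.

```cpp
#include <algorithm>

// Hypothetical stand-ins for illustration only; these are not LLVM classes.
struct TargetInfo {
  unsigned MinFunctionAlignLog2;  // alignment required for correctness
  unsigned PrefFunctionAlignLog2; // alignment that is merely good for speed
};

struct FunctionInfo {
  bool OptForSize; // true when the function is being optimized for size
};

// Returns log2 of the alignment to use for the function.
// The minimum is always respected; the preferred value is only
// taken into account when we are not optimizing for size.
unsigned chooseFunctionAlignLog2(const TargetInfo &TI, const FunctionInfo &F) {
  unsigned Align = TI.MinFunctionAlignLog2;
  if (!F.OptForSize)
    Align = std::max(Align, TI.PrefFunctionAlignLog2);
  return Align;
}
```

With an assumed minimum of 2 bytes (log2 = 1) and an assumed preference of 4 bytes (log2 = 2), a size-optimized function would stay at 2-byte alignment and everything else would be raised to 4.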

Although I personally think the definition of Os is a bit odd, and this is increasing code size, everyone else I've talked to agreed with you that this is fine. Like you said, in (almost) all cases we get a performance win for the bytes we spend.

So LGTM!

Repository:
  rL LLVM

https://reviews.llvm.org/D51780