[PATCH] D79916: Map -O to -O1 instead of -O2

Brandon Bergren via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 17 11:10:50 PDT 2020


Bdragon28 added a comment.

In D79916#2279816 <https://reviews.llvm.org/D79916#2279816>, @jrtc27 wrote:

> In D79916#2279812 <https://reviews.llvm.org/D79916#2279812>, @Bdragon28 wrote:
>
>> In D79916#2279045 <https://reviews.llvm.org/D79916#2279045>, @jrtc27 wrote:
>>
>>> This has significantly regressed FreeBSD's performance with the new version of Clang. It seems Clang does not inline functions at -O1, unlike GCC, and since FreeBSD currently compiles its kernel with -O whenever debug symbols are enabled[1] (which, of course, is almost always true), this results in all its `static inline` helper functions not being inlined at all, a pattern that is common in the kernel, used for things like `get_curthread` and the atomics implementations.
>>>
>>> [1] This is a dubious decision made in r140400 in 2005 to provide "truer debugger stack traces" (well, before then there was ping-ponging between -O and -O2 based on concerns around correctness vs performance, but amd64 is an exception that has always used -O2 since r127180 it seems). Given that GCC will inline at -O, at least these days, the motivation seems to no longer exist, and compiling a kernel at anything other than -O2 (or maybe -O3) seems like a silly thing to do, but nevertheless it's what is currently done.
>>>
>>> Cc: @dim @trasz
>>
>> This is actually SUCH a bad idea that a kernel built with -O will *not work at all* on 32 bit powerpc platforms (presumably due to allocating stack frames in the middle of assembly fragments in the memory management that are supposed to be inlined at all times.) I had to hack kern.pre.mk to rquest -O2 at all times just to get a functioning kernel.
>
> Well, -O0, -O1, -O2 and -O should all produce working kernels, and any cases where they don't are compiler bugs (or kernel bugs if they rely on UB) that should be fixed, not worked around by tweaking the compiler flags in a fragile way until you get the behaviour relied on. Correctness and performance are very different issues here.

As an example:

  static __inline void
  mtsrin(vm_offset_t va, register_t value)
  {
  
          __asm __volatile ("mtsrin %0,%1; isync" :: "r"(value), "r"(va));
  }

This code is used in the mmu when bootstrapping the cpu like so:

  for (i = 0; i < 16; i++)
          mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]);
  powerpc_sync();
  
  sdr = (u_int)moea_pteg_table | (moea_pteg_mask >> 10);
  __asm __volatile("mtsdr1 %0" :: "r"(sdr));
  isync();
  
  tlbia();

During the loop there, we are in the middle of programming the MMU segment registers in real mode, and is supposed to be doing all work out of registers. (and powerpc_sync() and isync() should be expanded to their single assembly instruction, not a function call. The whole point of calling those is that we are in an inconsistent hardware state and need to sync up before continuing execution)

If there isn't a way to force inlining, we will have to change to using preprocessor macros in cpufunc.h.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79916/new/

https://reviews.llvm.org/D79916



More information about the cfe-commits mailing list