[PATCH] D129107: [BOLT][HUGIFY] adds huge pages support of PIE/no-PIE binaries

Rafael Auler via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 27 14:56:25 PDT 2022


rafauler added a comment.

Thanks for working on this! Let's sync our understanding of the hugifyForOldKernel implementation one last time. Sorry if this is repetitive, but it is important that we are on the same page about what happens at runtime in both the PIC and no-PIC cases. See if you agree with my reading of the AlignedFrom/AlignedTo/AlignedSize usages in the suggestions below, and please point out any issues in my understanding.

If we look at hugifyForOldKernel(), the code suggested here currently copies only the part of the region determined by From and To. We now know that "From" may not be aligned, because ASLR ignores our alignment requirements. Now suppose it lands in the middle of a page, and that "To" (the end of the hot code section) lands in the middle of the next page.

2MB huge page virtual memory map:

page 1 - 0x400000:
  hot start: 0x500000
page 2 - 0x600000:
  hot end:   0x700000
page 3 - 0x800000

In this case, according to lines 146-149, you will align "hot start" down to 0x400000 and "hot end" up to 0x800000, and ask the kernel to unmap those pages, i.e. 4MB of code. However, the code will be memcpy-ing only 2MB of code, from 0x500000 to 0x700000, and copying it back after the kernel successfully mmaps the requested region into two huge pages.

Now, because you inserted extra padding in RewriteInstance.cpp:103 and 308, the fact that the 1MB before hot start and the 1MB after hot end are left uncopied is not really a problem.

However, in the no-PIC case, we are not inserting any extra padding. After hot end, at address 0x700000, we will have a large amount of code (the cold code of hot functions, those that were split). In some cases we will also have a bunch of extra code, including the hugify runtime library itself.

If we memcpy from 0x400000 to 0x800000 instead of the original 0x500000 to 0x700000, we err on the safe side by always copying any memory contents that are essentially erased once we ask the kernel to unmap them. That's why, when using these mmap calls, we typically copy the entire page contents instead of just a subset of the (hot) bytes. It's also safe to read these memory addresses (from 0x700000 to 0x800000) without the risk of segfaulting, because BOLT always pads the last code section in no-PIE. The padding won't be correct for PIE, because the ASLR loader will misalign the start, but luckily we insert one extra page at the end in those cases, so the addresses from 0x700000 to 0x800000 will be filled with zeroes and won't segfault.

What you did in the last iteration was to expand hot_end by one extra 4KB page, but that is not enough, since at line 115 we ask the kernel to unmap whole 2MB regions of text.

Does that sound reasonable? Anything I'm missing?



================
Comment at: bolt/lib/Rewrite/RewriteInstance.cpp:102

----------------
Also add && opts::Hugify


================
Comment at: bolt/lib/Rewrite/RewriteInstance.cpp:306

----------------
Also add && opts::Hugify


================
Comment at: bolt/runtime/hugify.cpp:90
 
-static void hugify_for_old_kernel(uint8_t *from, uint8_t *to) {
-  size_t size = to - from;
+static void hugifyForOldKernel(uint8_t *From, uint8_t *To,
+                               uint8_t *AlignedFrom,
----------------
Here we can pass AlignedFrom and AlignedTo only.


================
Comment at: bolt/runtime/hugify.cpp:97
+  uint8_t *Mem = reinterpret_cast<uint8_t *>(
+      __mmap(0, Size, 0x3 /* PROT_READ | PROT_WRITE */,
+             0x22 /* MAP_PRIVATE | MAP_ANONYMOUS */, -1, 0));
----------------
Here, use AlignedSize.


================
Comment at: bolt/runtime/hugify.cpp:111
+  // Copy the hot code to a temporary location.
+  memcpy(Mem, From, Size);
 
----------------
Here, use AlignedFrom and AlignedSize.


================
Comment at: bolt/runtime/hugify.cpp:115
   // Maps out the existing hot code.
-  if (__mmap(reinterpret_cast<uint64_t>(from), size,
-             PROT_READ | PROT_WRITE | PROT_EXEC,
-             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1,
-             0) == (void *)MAP_FAILED) {
-    char msg[] = "failed to mmap memory for large page move terminating\n";
-    reportError(msg, sizeof(msg));
+  if (__mmap(reinterpret_cast<uint64_t>(AlignedFrom), AlignedSize,
+             0x3 /* PROT_READ | PROT_WRITE */,
----------------
...because here you are using AlignedSize.


================
Comment at: bolt/runtime/hugify.cpp:144
+  uint8_t *HotEnd = (uint8_t *)&__hot_end + (Page4KB - 1);
+  HotEnd -= (intptr_t)HotEnd & (Page4KB - 1);
   // Make sure the start and end are aligned with huge page address
----------------
I'll suggest doing something else at line 97.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129107/new/

https://reviews.llvm.org/D129107


