[PATCH] D129107: [BOLT][HUGIFY] adds huge pages support of PIE/no-PIE binaries

Mon Aug 8 00:14:40 PDT 2022

yavtuk marked an inline comment as done.
yavtuk added a comment.

@rafauler Hi Rafael, let me know if you need more details.

================
Comment at: bolt/lib/Rewrite/RewriteInstance.cpp:494-496
+  // Hugify: Additional huge page from left side
+  if (opts::Hugify)
+    NextAvailableAddress += BC->PageAlign;
----------------
rafauler wrote:
> Why is that needed?
It's needed because of the HUGEPAGE allocation policy, and also because of a bug in old kernels where the dynamic loader doesn't take the p_align field into account.
The dynamic loader allocates and maps the segments sequentially with 4KB address alignment. To get a HUGEPAGE from the OS, the page address has to be 2MB-aligned. For that reason I add padding on the left and right sides so that the segments don't overlap.
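
To illustrate the alignment arithmetic described above, here is a minimal sketch. It is not the code from this patch: HugePageSize, alignUp, PaddedRange and reserveHugeRange are hypothetical names, and the 2MB page size is an assumption (the patch itself uses BC->PageAlign).

```cpp
#include <cassert>
#include <cstdint>

// Assumed huge page size (2MB); the patch itself relies on BC->PageAlign.
constexpr uint64_t HugePageSize = 2 * 1024 * 1024;

// Round Addr up to the next multiple of Align (Align must be a power of two).
constexpr uint64_t alignUp(uint64_t Addr, uint64_t Align) {
  return (Addr + Align - 1) & ~(Align - 1);
}

// Hypothetical helper type: a range that starts and ends on huge page
// boundaries.
struct PaddedRange {
  uint64_t Start;
  uint64_t End;
};

// Hypothetical helper: pad on the left so the range starts on a 2MB boundary,
// and pad on the right so the next segment cannot share the last huge page.
PaddedRange reserveHugeRange(uint64_t NextAvailable, uint64_t Size) {
  assert(Size > 0 && "an empty range needs no huge page");
  uint64_t Start = alignUp(NextAvailable, HugePageSize);
  uint64_t End = alignUp(Start + Size, HugePageSize);
  return {Start, End};
}
```

With these assumptions, a segment requested at 0x401000 would start at 0x600000, and the next free address after it would again be 2MB-aligned.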

================
Comment at: bolt/runtime/common.h:82
 extern "C" {
-void *memcpy(void *Dest, const void *Src, size_t Len) {
+void __attribute__((noinline)) *
+    memcpy(void *Dest, const void *Src, size_t Len) {
----------------
rafauler wrote:
> Why is that needed?
Good question :-) The user-func-reorder test fails, and the cause was hard to reproduce locally
since it's related to the compiler.

with this attribute we have the following assembly for memcpy:

.Loop:
    ...
    movzbl (%rsi,%rdi,1),%ecx
    mov    %cl,(%rax,%rdi,1)
    add    $0x1,%rdi
    cmp    %rdi,%r9
    jne    a004a0 <_fini+0x2c4>
    ...

    mov    %r14,%rdi
    mov    %r15,%rsi
    mov    %rbx,%rdx
    callq  .Loop

Copying is performed one byte at a time, with the loop condition checked after each byte.

without this attribute I see the following:

.Loop:
    ... 
    movzbl 0x0(%r13,%rax,1),%edx
    mov    %dl,(%rbx,%rax,1)
    movzbl 0x1(%r13,%rax,1),%edx
    mov    %dl,0x1(%rbx,%rax,1)
    movzbl 0x2(%r13,%rax,1),%edx
    mov    %dl,0x2(%rbx,%rax,1)
    movzbl 0x3(%r13,%rax,1),%edx
    mov    %dl,0x3(%rbx,%rax,1)
    movzbl 0x4(%r13,%rax,1),%edx
    mov    %dl,0x4(%rbx,%rax,1)
    movzbl 0x5(%r13,%rax,1),%edx
    mov    %dl,0x5(%rbx,%rax,1)
    movzbl 0x6(%r13,%rax,1),%edx
    mov    %dl,0x6(%rbx,%rax,1)
    movzbl 0x7(%r13,%rax,1),%edx
    mov    %dl,0x7(%rbx,%rax,1)
    add    $0x8,%rax
    cmp    %rax,%rcx
    jne    a007f0 <_fini+0x614>

Copying is performed with unrolling (eight bytes per iteration), and the test fails due to overlapping dst and src addresses when the size is not a multiple of 8 bytes.
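
As a general illustration of why overlapping dst and src ranges are a hazard for a memcpy-style copy, here is a small sketch. It does not reproduce the compiler-specific failure above and is not code from this patch: forwardCopy, overlapSafeCopy and shiftRightBy2 are hypothetical names. A plain forward copy re-reads bytes it has already overwritten when dst lies inside the source range, while a direction-aware copy (the memmove approach) does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: plain forward byte-by-byte copy, as a freestanding
// memcpy might be written.
void forwardCopy(char *Dest, const char *Src, size_t Len) {
  assert(Dest && Src && "null pointers are not supported in this sketch");
  for (size_t I = 0; I < Len; ++I)
    Dest[I] = Src[I];
}

// Hypothetical helper: copy backwards when Dest starts inside [Src, Src+Len),
// so that source bytes are read before they are overwritten.
void overlapSafeCopy(char *Dest, const char *Src, size_t Len) {
  uintptr_t D = reinterpret_cast<uintptr_t>(Dest);
  uintptr_t S = reinterpret_cast<uintptr_t>(Src);
  if (D <= S || D >= S + Len) {
    forwardCopy(Dest, Src, Len);
    return;
  }
  for (size_t I = Len; I > 0; --I)
    Dest[I - 1] = Src[I - 1];
}

// Shift the first four bytes of "abcdef" right by two positions, in place.
std::string shiftRightBy2(bool Safe) {
  char Buf[] = "abcdef";
  if (Safe)
    overlapSafeCopy(Buf + 2, Buf, 4);
  else
    forwardCopy(Buf + 2, Buf, 4);
  return Buf;
}
```

In this sketch the forward copy turns "abcdef" into "ababab", whereas the overlap-safe variant produces "ababcd".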

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129107/new/

https://reviews.llvm.org/D129107