[PATCH] D129107: [BOLT][HUGIFY] adds huge pages support of PIE/no-PIE binaries
Alexey Moksyakov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 8 00:14:40 PDT 2022
yavtuk marked an inline comment as done.
yavtuk added a comment.
@rafauler Hi Rafael, let me know if you need more details
================
Comment at: bolt/lib/Rewrite/RewriteInstance.cpp:494-496
+ // Hugify: Additional huge page from left side
+ if (opts::Hugify)
+ NextAvailableAddress += BC->PageAlign;
----------------
rafauler wrote:
> Why is that needed?
It is needed because of the huge page allocation policy, and also because of a bug in old kernels where the dynamic loader does not take the p_align field into account.
The dynamic loader allocates and maps the segments sequentially with 4KB address alignment. To get a huge page from the OS, the page address must be 2MB-aligned. For that reason I add padding on the left and right sides so that the segments do not overlap.
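
To illustrate the alignment arithmetic behind this padding, here is a minimal sketch. The constant and helper names (`HugePageSize`, `alignUp`, `paddingTo`) are assumptions made for illustration and are not taken from the patch; only the 4KB and 2MB values come from the explanation above.

```cpp
#include <cstdint>

// Illustrative constants; the names are hypothetical, not from the patch.
constexpr uint64_t SmallPageSize = 4ULL * 1024;       // 4KB loader alignment
constexpr uint64_t HugePageSize = 2ULL * 1024 * 1024; // 2MB huge page

// Round Addr up to the next multiple of Align (Align must be a power of two).
constexpr uint64_t alignUp(uint64_t Addr, uint64_t Align) {
  return (Addr + Align - 1) & ~(Align - 1);
}

// Number of bytes between Addr and the next Align boundary.
constexpr uint64_t paddingTo(uint64_t Addr, uint64_t Align) {
  return alignUp(Addr, Align) - Addr;
}

// A 4KB-aligned address is usually not 2MB-aligned and has to be moved up.
static_assert(0x401000 % SmallPageSize == 0, "4KB-aligned input");
static_assert(alignUp(0x401000, HugePageSize) == 0x600000, "rounded up to 2MB");
static_assert(alignUp(0x400000, HugePageSize) == 0x400000, "already aligned");
// The gap is always smaller than one huge page.
static_assert(paddingTo(0x401000, HugePageSize) == 0x1ff000, "gap below 2MB");
```

Since the gap to the next 2MB boundary is always smaller than 2MB, reserving one extra page of `PageAlign` bytes is enough to leave room for it.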
================
Comment at: bolt/runtime/common.h:82
extern "C" {
-void *memcpy(void *Dest, const void *Src, size_t Len) {
+void __attribute__((noinline)) *
+ memcpy(void *Dest, const void *Src, size_t Len) {
----------------
rafauler wrote:
> Why is that needed?
Good question :-) The user-func-reorder test fails, and the cause was hard to reproduce locally
since it depends on the compiler.
With this attribute we get the following assembly for memcpy:
.Loop:
...
movzbl (%rsi,%rdi,1),%ecx
mov %cl,(%rax,%rdi,1)
add $0x1,%rdi
cmp %rdi,%r9
jne a004a0 <_fini+0x2c4>
...
mov %r14,%rdi
mov %r15,%rsi
mov %rbx,%rdx
callq .Loop
Copying is performed byte by byte, with the loop condition checked after each byte.
Without this attribute I see the following:
.Loop:
...
movzbl 0x0(%r13,%rax,1),%edx
mov %dl,(%rbx,%rax,1)
movzbl 0x1(%r13,%rax,1),%edx
mov %dl,0x1(%rbx,%rax,1)
movzbl 0x2(%r13,%rax,1),%edx
mov %dl,0x2(%rbx,%rax,1)
movzbl 0x3(%r13,%rax,1),%edx
mov %dl,0x3(%rbx,%rax,1)
movzbl 0x4(%r13,%rax,1),%edx
mov %dl,0x4(%rbx,%rax,1)
movzbl 0x5(%r13,%rax,1),%edx
mov %dl,0x5(%rbx,%rax,1)
movzbl 0x6(%r13,%rax,1),%edx
mov %dl,0x6(%rbx,%rax,1)
movzbl 0x7(%r13,%rax,1),%edx
mov %dl,0x7(%rbx,%rax,1)
add $0x8,%rax
cmp %rax,%rcx
jne a007f0 <_fini+0x614>
Copying is performed with the loop unrolled by 8, and the test fails because the dst and src addresses overlap for a size that is not a multiple of 8 bytes.
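
For reference, a minimal sketch of a byte-wise copy routine of the kind discussed above, marked `noinline` for GCC/Clang. The name `byteCopy` and the helper `checkByteCopy` are hypothetical stand-ins, used so that the example does not clash with the libc `memcpy`; this is not the code from bolt/runtime/common.h. The check only shows what a forward byte-by-byte copy produces when the destination overlaps the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the runtime copy routine; copies one byte per
// iteration, in increasing address order.
__attribute__((noinline)) void *byteCopy(void *Dest, const void *Src,
                                         size_t Len) {
  char *D = static_cast<char *>(Dest);
  const char *S = static_cast<const char *>(Src);
  for (size_t I = 0; I < Len; ++I)
    D[I] = S[I];
  return Dest;
}

// With overlapping ranges, a forward byte copy re-reads bytes it has already
// written: "abcdef" copied onto itself at offset 2 becomes "ababab".
void checkByteCopy() {
  char Buf[] = "abcdef";
  byteCopy(Buf + 2, Buf, 4);
  assert(std::strcmp(Buf, "ababab") == 0);
}
```

The standard `memcpy` leaves overlapping copies undefined, so any caller that relies on a particular result for overlapping ranges depends on how the copy loop is actually implemented.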
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D129107/new/
https://reviews.llvm.org/D129107