<div dir="ltr"><div>I got a chance to look at this today in RamMap. Looks like they do have some crazy hack in the kernel to handle images specially: <a href="http://i.imgur.com/BeDov07.png">http://i.imgur.com/BeDov07.png</a><br></div><div>They have a special "image" flag column to indicate this.<br></div><div><br></div>-- Sean Silva<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 14, 2015 at 1:19 AM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Thu, Aug 13, 2015 at 10:53 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Fri, Aug 14, 2015 at 8:04 AM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Thu, Aug 13, 2015 at 12:51 AM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I think I do understand how the paging mechanism works. :) We are talking about different things. My question is why you think a file offset must be at a 4K boundary in order to map it efficiently to memory. To me you seems to be claiming that mmap(0, /*length*/4096, PROT_READ|PROT_EXEC, 0, SomeFD, /*file offset*/5120) is much inefficient than mmap(0, /*length*/4096, PROT_READ|PROT_EXEC, 0, SomeFD, /*file offset*/4096) because of the file offset of the former mmap call is not a multiple of 4096. And I'm saying that that's not true.</div></blockquote><div><br></div><div><br></div></span><div><div>The paging layer generally has the file already in physical memory before you ask it to map it. When it originally read the file off disk, how would it know what alignment the file should have in physical memory? The reality is that the alignment in the file is the alignment in physical memory. If the paging system had to be aware of having different parts of the file mapped in at arbitrary alignments it would be much more complicated (in practice, it will either be a hard error, or will force the kernel to explicitly make a copy, but the paging system won't do this automatically).</div></div></div></div></div></blockquote><div><br></div></span><div>The loader is able to (and I presume it does) read only the header of an executable first and then map each sections to memory. I don't think that the entire executables files are "generally" already mapped to memory.</div></div></div></div></blockquote><div><br></div></span><div>I say "generally" because the loader cannot know or not (layering violation in the kernel).</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> Executable files are usually read only by the loader, and as long as the loader is consistent in how it maps each section to memory, no memcpy is needed.<br></div></div></div></div></blockquote><div><br></div></span><div>It would require a pretty serious layering violation for something as high-level as the loader to control at what offset modulo the page size a piece of a file is read into. It would require reaching through so many layers of the kernel in order for the loader to control that. In both Linux and FreeBSD (as examples of something in general), there is simply no API for mapping files that operates at sub-page granularity. In Linux vm_mmap literally rejects anything that is not page aligned (<a href="http://lxr.free-electrons.com/source/mm/util.c#L306" target="_blank">http://lxr.free-electrons.com/source/mm/util.c#L306</a>), and immediately calls into vm_mmap_pgoff which operates in terms of pages only. In FreeBSD, in the link I gave you can see that the copying process is external to the virtual memory system / fs -- the loader has to do it itself.</div><div><br></div><div>Like I said, it is possible for Windows to have a hack to do this. But it seems unlikely since fixing the linker is so much easier. I can believe that link.exe might set the FileAlignment to 512 bytes, but surely it must be actually page aligning the sections in the file (which is correct to do with a setting of FileAlignment == 512).</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div><br></div><div>Also, keep in mind that 512 byte "sector size" has almost nothing to do with how modern kernels/hardware do IO. It is a historical thing.</div><span><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span></div><div><div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 13, 2015 at 4:10 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Wed, Aug 12, 2015 at 10:59 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>What I don't understand is that why the offset from the beginning of the file must be multiple of page size in order to avoid full copy. Windows requires all sections to be aligned at least 4K in memory and 512 bytes on file, and I don't see any problem there.</div><div><br></div><div>Let's say we have two sections, A and B, whose sizes are 1024B and 4096B, respectively. We also assume that A's offset from the beginning of file is 4096, and B's 5120. The loader can map offset 4096 to 8192 of the file to some page, and 5120 to 9216 to other page. Why can't that?</div></div></blockquote><div><br></div></span><div>From the kernel's perspective of mapping memory (on x86), memory is divided into aligned 4K pieces. 5120 % 4096 == 1024, so in order to map it at an address that is 4K aligned, it must do a full memmove in order to move all the memory by 1024 bytes so that it is 4K aligned. This image maybe helps to understand how a 32-bit x86 CPU understands a virtual memory address:</div><div><a href="https://upload.wikimedia.org/wikipedia/commons/8/8e/X86_Paging_4K.svg" target="_blank">https://upload.wikimedia.org/wikipedia/commons/8/8e/X86_Paging_4K.svg</a></div><div><br></div><div>IIRC the resources I learned from are:</div><div><a href="http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/" target="_blank">http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/</a><br></div><div><a href="http://duartes.org/gustavo/blog/post/the-thing-king/" target="_blank">http://duartes.org/gustavo/blog/post/the-thing-king/</a><br></div><div>(that web page has many other very, *very* good posts. A list can be seen at: <a href="http://duartes.org/gustavo/blog/category/internals/" target="_blank">http://duartes.org/gustavo/blog/category/internals/</a>)</div><div><br></div><div>I think you will find that understanding virtual memory (and TLB) will greatly help you optimize LLD, since many operations in LLD have very high pressure on the virtual memory system.</div><span><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 13, 2015 at 2:44 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Wed, Aug 12, 2015 at 7:17 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I didn't test to see if this change ever has a negative impact on memory usage, but my guess is that's very unlikely because this layout is the same as what MSVC linker creates. If this is inefficient, virtually all Windows executables are being suffered by that, which is unlikely. My understanding is that the kernel maps each section separately to a memory address, so file offset of each section can be given independently from other sections.</div></blockquote><div><br></div></span><div>The mapping is done at the granularity of aligned 4K pages minimum (this is just how the x86 hardware page table mechanism works). A piece of the file cannot be moved by an amount that is not a multiple of 4K without a full copy.</div><div><br></div><div>The only way this could (in the usual case) not have a large overhead is for the kernel to do a crazy hack like have special paging semantics for files that are executables. This means that when LLD finishes working on a memory mapped file, if a section is not 4K aligned at least, then the kernel has to then do a copy to make the file conform the the actual memory layout it needs to have in the paging subsystem.</div><div><br></div><div>Or does windows make full copies of sections always? In other words processes don't share e.g. readonly text?</div><div><br></div><div>In ELF, the offset in the file and the offset in memory are required to be congruent modulo the alignment (see the documentation of p_align in <a href="http://www.sco.com/developers/gabi/latest/ch5.pheader.html" target="_blank">http://www.sco.com/developers/gabi/latest/ch5.pheader.html</a>), precisely to avoid the need to do crazy hacks like this when loading the program.</div><div><br></div><div>You can see that Linux will reject the binary:</div><div><a href="http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L664" target="_blank">http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L664</a> (load_elf_binary)<br></div><div>-> <a href="http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L336" target="_blank">http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L336</a> (elf_map)<br></div><div>-> <a href="http://lxr.free-electrons.com/source/mm/util.c#L306" target="_blank">http://lxr.free-electrons.com/source/mm/util.c#L306</a> (vm_mmap)</div><div>Notice:</div><div><div>312 if (unlikely(offset & ~PAGE_MASK))</div><div>313 return -EINVAL;</div></div><div><br></div><div>FreeBSD is more lenient, but you can see that the kernel does not like the situation when this is violated:</div><div><a href="http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#593" target="_blank">http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#593</a> (__elfN(load_file))<br></div><div>--> <a href="http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#467" target="_blank">http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#467</a> (__elfN(load_section))<br></div><div>--> <a href="http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#398" target="_blank">http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#398</a> (__elfN(map_insert))</div><div><div>423<span style="white-space:pre-wrap"> </span>/*</div><div>424<span style="white-space:pre-wrap"> </span> * The mapping is not page aligned. This means we have</div><div>425<span style="white-space:pre-wrap"> </span> * to copy the data. Sigh.</div><div>426<span style="white-space:pre-wrap"> </span> */</div></div><span><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><br></div><div><div class="gmail_extra"><div class="gmail_quote">On Wed, Aug 12, 2015 at 5:31 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Tue, Aug 11, 2015 at 4:09 PM, Rui Ueyama via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: ruiu<br>
Date: Tue Aug 11 18:09:00 2015<br>
New Revision: 244691<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=244691&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=244691&view=rev</a><br>
Log:<br>
COFF: Align sections to 512-byte boundaries on disk.<br>
<br>
Sections must start at page boundaries in memory, but they<br>
can be aligned to sector boundaries (512-bytes) on disk.<br>
We aligned them to 4096-byte boundaries even on disk, so we<br>
wasted disk space a bit.<br></blockquote><div><br></div></span><div>This will likely force the kernel to copy or otherwise do unnecessary work when loading. Are you sure that isn't happening? The kernel ideally wants to just create a couple page table entries. But if it needs to move things around at <4K granularity to make them properly aligned to their load address when loading (like this patch I think causes) then it will need to do copies.</div><div><br></div><div>This can likely be checked by looking for an increase in real memory usage for the system when the new binaries are loaded (vs. the old page-aligned ones), since the kernel will have a copy sitting in page cache and a copy for alignment mapped into the process address space; alternatively, you can check for the slowdown from the kernel copies when faulting the memory into the process's address space (or (less likely) it may do the copies eagerly which should be easy to measure too).</div><span><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Modified:<br>
lld/trunk/COFF/Writer.cpp<br>
lld/trunk/test/COFF/baserel.test<br>
lld/trunk/test/COFF/hello32.test<br>
<br>
Modified: lld/trunk/COFF/Writer.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/COFF/Writer.cpp?rev=244691&r1=244690&r2=244691&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/COFF/Writer.cpp?rev=244691&r1=244690&r2=244691&view=diff</a><br>
==============================================================================<br>
--- lld/trunk/COFF/Writer.cpp (original)<br>
+++ lld/trunk/COFF/Writer.cpp Tue Aug 11 18:09:00 2015<br>
@@ -37,8 +37,7 @@ using namespace lld;<br>
using namespace lld::coff;<br>
<br>
static const int PageSize = 4096;<br>
-static const int FileAlignment = 512;<br>
-static const int SectionAlignment = 4096;<br>
+static const int SectorSize = 512;<br>
static const int DOSStubSize = 64;<br>
static const int NumberfOfDataDirectory = 16;<br>
<br>
@@ -174,7 +173,7 @@ void OutputSection::addChunk(Chunk *C) {<br>
Off += C->getSize();<br>
Header.VirtualSize = Off;<br>
if (C->hasData())<br>
- Header.SizeOfRawData = RoundUpToAlignment(Off, FileAlignment);<br>
+ Header.SizeOfRawData = RoundUpToAlignment(Off, SectorSize);<br>
}<br>
<br>
void OutputSection::addPermissions(uint32_t C) {<br>
@@ -507,15 +506,14 @@ void Writer::createSymbolAndStringTable(<br>
// We position the symbol table to be adjacent to the end of the last section.<br>
uint64_t FileOff =<br>
LastSection->getFileOff() +<br>
- RoundUpToAlignment(LastSection->getRawSize(), FileAlignment);<br>
+ RoundUpToAlignment(LastSection->getRawSize(), SectorSize);<br>
if (!OutputSymtab.empty()) {<br>
PointerToSymbolTable = FileOff;<br>
FileOff += OutputSymtab.size() * sizeof(coff_symbol16);<br>
}<br>
if (!Strtab.empty())<br>
FileOff += Strtab.size() + 4;<br>
- FileSize = SizeOfHeaders +<br>
- RoundUpToAlignment(FileOff - SizeOfHeaders, FileAlignment);<br>
+ FileSize = RoundUpToAlignment(FileOff, SectorSize);<br>
}<br>
<br>
// Visits all sections to assign incremental, non-overlapping RVAs and<br>
@@ -526,9 +524,9 @@ void Writer::assignAddresses() {<br>
sizeof(coff_section) * OutputSections.size();<br>
SizeOfHeaders +=<br>
Config->is64() ? sizeof(pe32plus_header) : sizeof(pe32_header);<br>
- SizeOfHeaders = RoundUpToAlignment(SizeOfHeaders, PageSize);<br>
+ SizeOfHeaders = RoundUpToAlignment(SizeOfHeaders, SectorSize);<br>
uint64_t RVA = 0x1000; // The first page is kept unmapped.<br>
- uint64_t FileOff = SizeOfHeaders;<br>
+ FileSize = SizeOfHeaders;<br>
// Move DISCARDABLE (or non-memory-mapped) sections to the end of file because<br>
// the loader cannot handle holes.<br>
std::stable_partition(<br>
@@ -539,13 +537,11 @@ void Writer::assignAddresses() {<br>
if (Sec->getName() == ".reloc")<br>
addBaserels(Sec);<br>
Sec->setRVA(RVA);<br>
- Sec->setFileOffset(FileOff);<br>
+ Sec->setFileOffset(FileSize);<br>
RVA += RoundUpToAlignment(Sec->getVirtualSize(), PageSize);<br>
- FileOff += RoundUpToAlignment(Sec->getRawSize(), FileAlignment);<br>
+ FileSize += RoundUpToAlignment(Sec->getRawSize(), SectorSize);<br>
}<br>
SizeOfImage = SizeOfHeaders + RoundUpToAlignment(RVA - 0x1000, PageSize);<br>
- FileSize = SizeOfHeaders +<br>
- RoundUpToAlignment(FileOff - SizeOfHeaders, FileAlignment);<br>
}<br>
<br>
template <typename PEHeaderTy> void Writer::writeHeader() {<br>
@@ -584,8 +580,8 @@ template <typename PEHeaderTy> void Writ<br>
Buf += sizeof(*PE);<br>
PE->Magic = Config->is64() ? PE32Header::PE32_PLUS : PE32Header::PE32;<br>
PE->ImageBase = Config->ImageBase;<br>
- PE->SectionAlignment = SectionAlignment;<br>
- PE->FileAlignment = FileAlignment;<br>
+ PE->SectionAlignment = PageSize;<br>
+ PE->FileAlignment = SectorSize;<br>
PE->MajorImageVersion = Config->MajorImageVersion;<br>
PE->MinorImageVersion = Config->MinorImageVersion;<br>
PE->MajorOperatingSystemVersion = Config->MajorOSVersion;<br>
<br>
Modified: lld/trunk/test/COFF/baserel.test<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/baserel.test?rev=244691&r1=244690&r2=244691&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/baserel.test?rev=244691&r1=244690&r2=244691&view=diff</a><br>
==============================================================================<br>
--- lld/trunk/test/COFF/baserel.test (original)<br>
+++ lld/trunk/test/COFF/baserel.test Tue Aug 11 18:09:00 2015<br>
@@ -61,7 +61,7 @@<br>
# BASEREL-HEADER-NEXT: VirtualSize: 0x20<br>
# BASEREL-HEADER-NEXT: VirtualAddress: 0x5000<br>
# BASEREL-HEADER-NEXT: RawDataSize: 512<br>
-# BASEREL-HEADER-NEXT: PointerToRawData: 0x1800<br>
+# BASEREL-HEADER-NEXT: PointerToRawData: 0xC00<br>
# BASEREL-HEADER-NEXT: PointerToRelocations: 0x0<br>
# BASEREL-HEADER-NEXT: PointerToLineNumbers: 0x0<br>
# BASEREL-HEADER-NEXT: RelocationCount: 0<br>
<br>
Modified: lld/trunk/test/COFF/hello32.test<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/hello32.test?rev=244691&r1=244690&r2=244691&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/hello32.test?rev=244691&r1=244690&r2=244691&view=diff</a><br>
==============================================================================<br>
--- lld/trunk/test/COFF/hello32.test (original)<br>
+++ lld/trunk/test/COFF/hello32.test Tue Aug 11 18:09:00 2015<br>
@@ -38,8 +38,8 @@ HEADER-NEXT: MajorImageVersion: 0<br>
HEADER-NEXT: MinorImageVersion: 0<br>
HEADER-NEXT: MajorSubsystemVersion: 6<br>
HEADER-NEXT: MinorSubsystemVersion: 0<br>
-HEADER-NEXT: SizeOfImage: 20480<br>
-HEADER-NEXT: SizeOfHeaders: 4096<br>
+HEADER-NEXT: SizeOfImage: 16896<br>
+HEADER-NEXT: SizeOfHeaders: 512<br>
HEADER-NEXT: Subsystem: IMAGE_SUBSYSTEM_WINDOWS_CUI (0x3)<br>
HEADER-NEXT: Characteristics [ (0x8140)<br>
HEADER-NEXT: IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE (0x40)<br>
<br>
<br>
_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div></div></div></div></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div></div></div><br></div></div>
</blockquote></div><br></div></div></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div>