[lld] r244691 - COFF: Align sections to 512-byte boundaries on disk.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 9 20:49:41 PDT 2015


Thank you for doing this, but I cannot decipher these values myself. Does
that mean the kernel does some crazy stuff for non-4K-aligned sections?

On Thu, Sep 10, 2015 at 11:17 AM, Sean Silva <chisophugis at gmail.com> wrote:

> I got a chance to look at this today in RamMap. Looks like they do have
> some crazy hack in the kernel to handle images specially:
> http://i.imgur.com/BeDov07.png
> They have a special "image" flag column to indicate this.
>
> -- Sean Silva
>
> On Fri, Aug 14, 2015 at 1:19 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Thu, Aug 13, 2015 at 10:53 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Fri, Aug 14, 2015 at 8:04 AM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Aug 13, 2015 at 12:51 AM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> I think I do understand how the paging mechanism works. :)  We are
>>>>> talking about different things. My question is why you think a file offset
>>>>> must be at a 4K boundary in order to map it efficiently to memory. To me
>>>>> you seem to be claiming that mmap(0, /*length*/4096, PROT_READ|PROT_EXEC,
>>>>> 0, SomeFD, /*file offset*/5120) is much less efficient than mmap(0,
>>>>> /*length*/4096, PROT_READ|PROT_EXEC, 0, SomeFD, /*file
>>>>> offset*/4096) because the file offset of the former mmap call is not a
>>>>> multiple of 4096. And I'm saying that that's not true.
>>>>>
>>>>
>>>>
>>>> The paging layer generally has the file already in physical memory
>>>> before you ask it to map it. When it originally read the file off disk, how
>>>> would it know what alignment the file should have in physical memory? The
>>>> reality is that the alignment in the file is the alignment in physical
>>>> memory. If the paging system had to be aware of having different parts of
>>>> the file mapped in at arbitrary alignments it would be much more
>>>> complicated (in practice, it will either be a hard error, or will force the
>>>> kernel to explicitly make a copy, but the paging system won't do this
>>>> automatically).
>>>>
>>>
>>> The loader is able to (and I presume it does) read only the header of an
>>> executable first and then map each section to memory. I don't think that
>>> entire executable files are "generally" already mapped to memory.
>>>
>>
>> I say "generally" because the loader cannot know whether it is or not
>> (knowing that would be a layering violation in the kernel).
>>
>>
>>> Executable files are usually read only by the loader, and as long as the
>>> loader is consistent in how it maps each section to memory, no memcpy is
>>> needed.
>>>
>>
>> It would require a pretty serious layering violation for something as
>> high-level as the loader to control at what offset modulo the page size a
>> piece of a file is read into. It would require reaching through so many
>> layers of the kernel in order for the loader to control that. In both Linux
>> and FreeBSD (as examples of something in general), there is simply no API
>> for mapping files that operates at sub-page granularity. In Linux vm_mmap
>> literally rejects anything that is not page aligned (
>> http://lxr.free-electrons.com/source/mm/util.c#L306), and immediately
>> calls into vm_mmap_pgoff which operates in terms of pages only. In FreeBSD,
>> in the link I gave you can see that the copying process is external to the
>> virtual memory system / fs -- the loader has to do it itself.
>>
>> Like I said, it is possible for Windows to have a hack to do this. But it
>> seems unlikely since fixing the linker is so much easier. I can believe
>> that link.exe might set the FileAlignment to 512 bytes, but surely it must
>> be actually page aligning the sections in the file (which is correct to do
>> with a setting of FileAlignment == 512).
>>
>> -- Sean Silva
>>
>>
>>>
>>>
>>>> Also, keep in mind that 512 byte "sector size" has almost nothing to do
>>>> with how modern kernels/hardware do IO. It is a historical thing.
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>
>>>>>
>>>>> On Thu, Aug 13, 2015 at 4:10 PM, Sean Silva <chisophugis at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 12, 2015 at 10:59 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>
>>>>>>> What I don't understand is why the offset from the beginning of
>>>>>>> the file must be a multiple of the page size in order to avoid a full
>>>>>>> copy. Windows requires all sections to be aligned to at least 4K in
>>>>>>> memory and 512 bytes in the file, and I don't see any problem there.
>>>>>>>
>>>>>>> Let's say we have two sections, A and B, whose sizes are 1024B and
>>>>>>> 4096B, respectively. We also assume that A's offset from the beginning of
>>>>>>> the file is 4096, and B's is 5120. The loader can map offsets 4096 to 8192
>>>>>>> of the file to one page, and 5120 to 9216 to another page. Why can't it
>>>>>>> do that?
>>>>>>>
>>>>>>
>>>>>> From the kernel's perspective of mapping memory (on x86), memory is
>>>>>> divided into aligned 4K pieces. 5120 % 4096 == 1024, so in order to map it
>>>>>> at an address that is 4K aligned, it must do a full memmove in order to
>>>>>> move all the memory by 1024 bytes so that it is 4K aligned. This image
>>>>>> may help to show how a 32-bit x86 CPU interprets a virtual memory
>>>>>> address:
>>>>>> https://upload.wikimedia.org/wikipedia/commons/8/8e/X86_Paging_4K.svg
>>>>>>
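A minimal sketch of the arithmetic in the paragraph above, assuming 4 KiB
pages and assuming (as that paragraph does) that the page cache holds the
file at file-offset alignment. The names kPageSize, offsetInPage and
mappableWithoutCopy are hypothetical, chosen for illustration only; this is
not how any particular kernel implements the check.

```cpp
#include <cstdint>

// Assumed page size: 4 KiB, as on x86 with ordinary (non-huge) pages.
constexpr std::uint64_t kPageSize = 4096;

// Offset of a byte position within its 4K page (pos % 4096).
constexpr std::uint64_t offsetInPage(std::uint64_t pos) {
  return pos & (kPageSize - 1);
}

// True if the bytes at file offset fileOff could appear at virtual address
// vaddr purely by pointing page table entries at cached file pages, i.e.
// both positions sit at the same offset inside their pages.
constexpr bool mappableWithoutCopy(std::uint64_t fileOff, std::uint64_t vaddr) {
  return offsetInPage(fileOff) == offsetInPage(vaddr);
}

// Section B from the example: file offset 5120, loaded at a page boundary.
static_assert(offsetInPage(5120) == 1024, "5120 % 4096 == 1024");
static_assert(!mappableWithoutCopy(5120, 0x2000), "needs a 1024-byte shift");
// Section A: file offset 4096 is already page aligned.
static_assert(mappableWithoutCopy(4096, 0x1000), "page aligned on both sides");
```

Under these assumptions only section A can be mapped by page table entries
alone. Whether Windows actually behaves this way for images is exactly the
open question in this thread.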
>>>>>> IIRC the resources I learned from are:
>>>>>>
>>>>>> http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/
>>>>>> http://duartes.org/gustavo/blog/post/the-thing-king/
>>>>>> (that web page has many other very, *very* good posts. A list can be
>>>>>> seen at: http://duartes.org/gustavo/blog/category/internals/)
>>>>>>
>>>>>> I think you will find that understanding virtual memory (and TLB)
>>>>>> will greatly help you optimize LLD, since many operations in LLD have very
>>>>>> high pressure on the virtual memory system.
>>>>>>
>>>>>> -- Sean Silva
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 13, 2015 at 2:44 PM, Sean Silva <chisophugis at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 12, 2015 at 7:17 PM, Rui Ueyama <ruiu at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I didn't test to see if this change ever has a negative impact on
>>>>>>>>> memory usage, but my guess is that's very unlikely because this layout is
>>>>>>>>> the same as what the MSVC linker creates. If this is inefficient, virtually
>>>>>>>>> all Windows executables are suffering from that, which is unlikely. My
>>>>>>>>> understanding is that the kernel maps each section separately to a memory
>>>>>>>>> address, so file offset of each section can be given independently from
>>>>>>>>> other sections.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The mapping is done at the granularity of aligned 4K pages minimum
>>>>>>>> (this is just how the x86 hardware page table mechanism works). A piece of
>>>>>>>> the file cannot be moved by an amount that is not a multiple of 4K without
>>>>>>>> a full copy.
>>>>>>>>
>>>>>>>> The only way this could (in the usual case) not have a large
>>>>>>>> overhead is for the kernel to do a crazy hack like have special paging
>>>>>>>> semantics for files that are executables. This means that when LLD finishes
>>>>>>>> working on a memory mapped file, if a section is not 4K aligned at least,
>>>>>>>> then the kernel has to do a copy to make the file conform to the
>>>>>>>> actual memory layout it needs to have in the paging subsystem.
>>>>>>>>
>>>>>>>> Or does Windows always make full copies of sections? In other words,
>>>>>>>> processes don't share e.g. read-only text?
>>>>>>>>
>>>>>>>> In ELF, the offset in the file and the offset in memory are
>>>>>>>> required to be congruent modulo the alignment (see the documentation of
>>>>>>>> p_align in
>>>>>>>> http://www.sco.com/developers/gabi/latest/ch5.pheader.html),
>>>>>>>> precisely to avoid the need to do crazy hacks like this when loading the
>>>>>>>> program.
>>>>>>>>
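The ELF congruence rule mentioned above can be sketched as a small check.
The field names p_offset, p_vaddr and p_align come from the gABI program
header; the ProgramHeader struct (a three-field subset) and the isCongruent
helper are hypothetical and only illustrate the rule.

```cpp
#include <cstdint>

// Three fields of an ELF program header; names follow the gABI.
struct ProgramHeader {
  std::uint64_t p_offset; // file offset of the segment
  std::uint64_t p_vaddr;  // virtual address of the segment
  std::uint64_t p_align;  // alignment; 0 or 1 means none is required
};

// gABI rule: p_vaddr should equal p_offset, modulo p_align.
constexpr bool isCongruent(ProgramHeader PH) {
  return PH.p_align <= 1 ||
         (PH.p_offset % PH.p_align) == (PH.p_vaddr % PH.p_align);
}

// File offset 0x1000 and address 0x401000 agree modulo 0x1000.
static_assert(isCongruent(ProgramHeader{0x1000, 0x401000, 0x1000}), "");
// File offset 0x1400 (5120) cannot land on a page boundary.
static_assert(!isCongruent(ProgramHeader{0x1400, 0x402000, 0x1000}), "");
```

A segment that passes this check can be mapped with the file bytes at the
same in-page offset they have on disk, which is the property the PE layout
with 512-byte file alignment does not guarantee.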
>>>>>>>> You can see that Linux will reject the binary:
>>>>>>>> http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L664
>>>>>>>> (load_elf_binary)
>>>>>>>> -> http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L336
>>>>>>>> (elf_map)
>>>>>>>> -> http://lxr.free-electrons.com/source/mm/util.c#L306 (vm_mmap)
>>>>>>>> Notice:
>>>>>>>> 312         if (unlikely(offset & ~PAGE_MASK))
>>>>>>>> 313                 return -EINVAL;
>>>>>>>>
>>>>>>>> FreeBSD is more lenient, but you can see that the kernel does not
>>>>>>>> like the situation when this is violated:
>>>>>>>>
>>>>>>>> http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#593
>>>>>>>> (__elfN(load_file))
>>>>>>>> -->
>>>>>>>> http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#467
>>>>>>>> (__elfN(load_section))
>>>>>>>> -->
>>>>>>>> http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#398
>>>>>>>> (__elfN(map_insert))
>>>>>>>> 423 /*
>>>>>>>> 424 * The mapping is not page aligned. This means we have
>>>>>>>> 425 * to copy the data. Sigh.
>>>>>>>> 426 */
>>>>>>>>
>>>>>>>> -- Sean Silva
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 12, 2015 at 5:31 PM, Sean Silva <chisophugis at gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 11, 2015 at 4:09 PM, Rui Ueyama via llvm-commits <
>>>>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Author: ruiu
>>>>>>>>>>> Date: Tue Aug 11 18:09:00 2015
>>>>>>>>>>> New Revision: 244691
>>>>>>>>>>>
>>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=244691&view=rev
>>>>>>>>>>> Log:
>>>>>>>>>>> COFF: Align sections to 512-byte boundaries on disk.
>>>>>>>>>>>
>>>>>>>>>>> Sections must start at page boundaries in memory, but they
>>>>>>>>>>> can be aligned to sector boundaries (512-bytes) on disk.
>>>>>>>>>>> We aligned them to 4096-byte boundaries even on disk, so we
>>>>>>>>>>> wasted disk space a bit.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This will likely force the kernel to copy or otherwise do
>>>>>>>>>> unnecessary work when loading. Are you sure that isn't happening? The
>>>>>>>>>> kernel ideally wants to just create a couple page table entries. But if it
>>>>>>>>>> needs to move things around at <4K granularity to make them properly
>>>>>>>>>> aligned to their load address when loading (as I think this patch causes)
>>>>>>>>>> then it will need to do copies.
>>>>>>>>>>
>>>>>>>>>> This can likely be checked by looking for an increase in real
>>>>>>>>>> memory usage for the system when the new binaries are loaded (vs. the old
>>>>>>>>>> page-aligned ones), since the kernel will have a copy sitting in page cache
>>>>>>>>>> and a copy for alignment mapped into the process address space;
>>>>>>>>>> alternatively, you can check for the slowdown from the kernel copies when
>>>>>>>>>> faulting the memory into the process's address space (or (less likely) it
>>>>>>>>>> may do the copies eagerly which should be easy to measure too).
>>>>>>>>>>
>>>>>>>>>> -- Sean Silva
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Modified:
>>>>>>>>>>>     lld/trunk/COFF/Writer.cpp
>>>>>>>>>>>     lld/trunk/test/COFF/baserel.test
>>>>>>>>>>>     lld/trunk/test/COFF/hello32.test
>>>>>>>>>>>
>>>>>>>>>>> Modified: lld/trunk/COFF/Writer.cpp
>>>>>>>>>>> URL:
>>>>>>>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/COFF/Writer.cpp?rev=244691&r1=244690&r2=244691&view=diff
>>>>>>>>>>>
>>>>>>>>>>> ==============================================================================
>>>>>>>>>>> --- lld/trunk/COFF/Writer.cpp (original)
>>>>>>>>>>> +++ lld/trunk/COFF/Writer.cpp Tue Aug 11 18:09:00 2015
>>>>>>>>>>> @@ -37,8 +37,7 @@ using namespace lld;
>>>>>>>>>>>  using namespace lld::coff;
>>>>>>>>>>>
>>>>>>>>>>>  static const int PageSize = 4096;
>>>>>>>>>>> -static const int FileAlignment = 512;
>>>>>>>>>>> -static const int SectionAlignment = 4096;
>>>>>>>>>>> +static const int SectorSize = 512;
>>>>>>>>>>>  static const int DOSStubSize = 64;
>>>>>>>>>>>  static const int NumberfOfDataDirectory = 16;
>>>>>>>>>>>
>>>>>>>>>>> @@ -174,7 +173,7 @@ void OutputSection::addChunk(Chunk *C) {
>>>>>>>>>>>    Off += C->getSize();
>>>>>>>>>>>    Header.VirtualSize = Off;
>>>>>>>>>>>    if (C->hasData())
>>>>>>>>>>> -    Header.SizeOfRawData = RoundUpToAlignment(Off,
>>>>>>>>>>> FileAlignment);
>>>>>>>>>>> +    Header.SizeOfRawData = RoundUpToAlignment(Off, SectorSize);
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>>  void OutputSection::addPermissions(uint32_t C) {
>>>>>>>>>>> @@ -507,15 +506,14 @@ void Writer::createSymbolAndStringTable(
>>>>>>>>>>>    // We position the symbol table to be adjacent to the end of
>>>>>>>>>>> the last section.
>>>>>>>>>>>    uint64_t FileOff =
>>>>>>>>>>>        LastSection->getFileOff() +
>>>>>>>>>>> -      RoundUpToAlignment(LastSection->getRawSize(),
>>>>>>>>>>> FileAlignment);
>>>>>>>>>>> +      RoundUpToAlignment(LastSection->getRawSize(), SectorSize);
>>>>>>>>>>>    if (!OutputSymtab.empty()) {
>>>>>>>>>>>      PointerToSymbolTable = FileOff;
>>>>>>>>>>>      FileOff += OutputSymtab.size() * sizeof(coff_symbol16);
>>>>>>>>>>>    }
>>>>>>>>>>>    if (!Strtab.empty())
>>>>>>>>>>>      FileOff += Strtab.size() + 4;
>>>>>>>>>>> -  FileSize = SizeOfHeaders +
>>>>>>>>>>> -             RoundUpToAlignment(FileOff - SizeOfHeaders,
>>>>>>>>>>> FileAlignment);
>>>>>>>>>>> +  FileSize = RoundUpToAlignment(FileOff, SectorSize);
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>>  // Visits all sections to assign incremental, non-overlapping
>>>>>>>>>>> RVAs and
>>>>>>>>>>> @@ -526,9 +524,9 @@ void Writer::assignAddresses() {
>>>>>>>>>>>                    sizeof(coff_section) * OutputSections.size();
>>>>>>>>>>>    SizeOfHeaders +=
>>>>>>>>>>>        Config->is64() ? sizeof(pe32plus_header) :
>>>>>>>>>>> sizeof(pe32_header);
>>>>>>>>>>> -  SizeOfHeaders = RoundUpToAlignment(SizeOfHeaders, PageSize);
>>>>>>>>>>> +  SizeOfHeaders = RoundUpToAlignment(SizeOfHeaders, SectorSize);
>>>>>>>>>>>    uint64_t RVA = 0x1000; // The first page is kept unmapped.
>>>>>>>>>>> -  uint64_t FileOff = SizeOfHeaders;
>>>>>>>>>>> +  FileSize = SizeOfHeaders;
>>>>>>>>>>>    // Move DISCARDABLE (or non-memory-mapped) sections to the
>>>>>>>>>>> end of file because
>>>>>>>>>>>    // the loader cannot handle holes.
>>>>>>>>>>>    std::stable_partition(
>>>>>>>>>>> @@ -539,13 +537,11 @@ void Writer::assignAddresses() {
>>>>>>>>>>>      if (Sec->getName() == ".reloc")
>>>>>>>>>>>        addBaserels(Sec);
>>>>>>>>>>>      Sec->setRVA(RVA);
>>>>>>>>>>> -    Sec->setFileOffset(FileOff);
>>>>>>>>>>> +    Sec->setFileOffset(FileSize);
>>>>>>>>>>>      RVA += RoundUpToAlignment(Sec->getVirtualSize(), PageSize);
>>>>>>>>>>> -    FileOff += RoundUpToAlignment(Sec->getRawSize(),
>>>>>>>>>>> FileAlignment);
>>>>>>>>>>> +    FileSize += RoundUpToAlignment(Sec->getRawSize(),
>>>>>>>>>>> SectorSize);
>>>>>>>>>>>    }
>>>>>>>>>>>    SizeOfImage = SizeOfHeaders + RoundUpToAlignment(RVA -
>>>>>>>>>>> 0x1000, PageSize);
>>>>>>>>>>> -  FileSize = SizeOfHeaders +
>>>>>>>>>>> -             RoundUpToAlignment(FileOff - SizeOfHeaders,
>>>>>>>>>>> FileAlignment);
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>>  template <typename PEHeaderTy> void Writer::writeHeader() {
>>>>>>>>>>> @@ -584,8 +580,8 @@ template <typename PEHeaderTy> void Writ
>>>>>>>>>>>    Buf += sizeof(*PE);
>>>>>>>>>>>    PE->Magic = Config->is64() ? PE32Header::PE32_PLUS :
>>>>>>>>>>> PE32Header::PE32;
>>>>>>>>>>>    PE->ImageBase = Config->ImageBase;
>>>>>>>>>>> -  PE->SectionAlignment = SectionAlignment;
>>>>>>>>>>> -  PE->FileAlignment = FileAlignment;
>>>>>>>>>>> +  PE->SectionAlignment = PageSize;
>>>>>>>>>>> +  PE->FileAlignment = SectorSize;
>>>>>>>>>>>    PE->MajorImageVersion = Config->MajorImageVersion;
>>>>>>>>>>>    PE->MinorImageVersion = Config->MinorImageVersion;
>>>>>>>>>>>    PE->MajorOperatingSystemVersion = Config->MajorOSVersion;
>>>>>>>>>>>
>>>>>>>>>>> Modified: lld/trunk/test/COFF/baserel.test
>>>>>>>>>>> URL:
>>>>>>>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/baserel.test?rev=244691&r1=244690&r2=244691&view=diff
>>>>>>>>>>>
>>>>>>>>>>> ==============================================================================
>>>>>>>>>>> --- lld/trunk/test/COFF/baserel.test (original)
>>>>>>>>>>> +++ lld/trunk/test/COFF/baserel.test Tue Aug 11 18:09:00 2015
>>>>>>>>>>> @@ -61,7 +61,7 @@
>>>>>>>>>>>  # BASEREL-HEADER-NEXT: VirtualSize: 0x20
>>>>>>>>>>>  # BASEREL-HEADER-NEXT: VirtualAddress: 0x5000
>>>>>>>>>>>  # BASEREL-HEADER-NEXT: RawDataSize: 512
>>>>>>>>>>> -# BASEREL-HEADER-NEXT: PointerToRawData: 0x1800
>>>>>>>>>>> +# BASEREL-HEADER-NEXT: PointerToRawData: 0xC00
>>>>>>>>>>>  # BASEREL-HEADER-NEXT: PointerToRelocations: 0x0
>>>>>>>>>>>  # BASEREL-HEADER-NEXT: PointerToLineNumbers: 0x0
>>>>>>>>>>>  # BASEREL-HEADER-NEXT: RelocationCount: 0
>>>>>>>>>>>
>>>>>>>>>>> Modified: lld/trunk/test/COFF/hello32.test
>>>>>>>>>>> URL:
>>>>>>>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/hello32.test?rev=244691&r1=244690&r2=244691&view=diff
>>>>>>>>>>>
>>>>>>>>>>> ==============================================================================
>>>>>>>>>>> --- lld/trunk/test/COFF/hello32.test (original)
>>>>>>>>>>> +++ lld/trunk/test/COFF/hello32.test Tue Aug 11 18:09:00 2015
>>>>>>>>>>> @@ -38,8 +38,8 @@ HEADER-NEXT:   MajorImageVersion: 0
>>>>>>>>>>>  HEADER-NEXT:   MinorImageVersion: 0
>>>>>>>>>>>  HEADER-NEXT:   MajorSubsystemVersion: 6
>>>>>>>>>>>  HEADER-NEXT:   MinorSubsystemVersion: 0
>>>>>>>>>>> -HEADER-NEXT:   SizeOfImage: 20480
>>>>>>>>>>> -HEADER-NEXT:   SizeOfHeaders: 4096
>>>>>>>>>>> +HEADER-NEXT:   SizeOfImage: 16896
>>>>>>>>>>> +HEADER-NEXT:   SizeOfHeaders: 512
>>>>>>>>>>>  HEADER-NEXT:   Subsystem: IMAGE_SUBSYSTEM_WINDOWS_CUI (0x3)
>>>>>>>>>>>  HEADER-NEXT:   Characteristics [ (0x8140)
>>>>>>>>>>>  HEADER-NEXT:     IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE (0x40)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>> llvm-commits at lists.llvm.org
>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
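The layout rule in the quoted commit (sections page-aligned in memory,
sector-aligned on disk) can be sketched as follows. This is a simplified
stand-in, not the code from Writer.cpp: roundUpToAlignment here is a local
reimplementation, and Cursor and advance are hypothetical names. PageSize
and SectorSize use the values from the patch.

```cpp
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be a power of two).
constexpr std::uint64_t roundUpToAlignment(std::uint64_t Value,
                                           std::uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

constexpr std::uint64_t PageSize = 4096;  // section alignment in memory
constexpr std::uint64_t SectorSize = 512; // section alignment on disk

// Running position while assigning addresses to consecutive sections.
struct Cursor {
  std::uint64_t RVA;     // next free relative virtual address
  std::uint64_t FileOff; // next free file offset
};

// Advance past one section: the RVA moves in page steps, the file offset
// only in sector steps, so the two drift apart modulo the page size.
constexpr Cursor advance(Cursor C, std::uint64_t VirtualSize,
                         std::uint64_t RawSize) {
  return Cursor{C.RVA + roundUpToAlignment(VirtualSize, PageSize),
                C.FileOff + roundUpToAlignment(RawSize, SectorSize)};
}

// A 0x20-byte section with 512 bytes of raw data, placed after 512-byte
// headers: it takes a whole page in memory but a single sector on disk.
static_assert(advance(Cursor{0x1000, 512}, 0x20, 512).RVA == 0x2000, "");
static_assert(advance(Cursor{0x1000, 512}, 0x20, 512).FileOff == 1024, "");
```

With these inputs the next section starts at RVA 0x2000 but at file offset
1024, which is the kind of mismatch between file and memory alignment that
the thread is debating.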