[lld] r244691 - COFF: Align sections to 512-byte boundaries on disk.

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 13 16:11:46 PDT 2015


On Thu, Aug 13, 2015 at 8:48 AM, Rui Ueyama <ruiu at google.com> wrote:

> Noticed that, at least on Linux, the file offset must be a multiple of the
> page size, or the kernel returns an error to the system call. I still don't
> think this has a negative impact on performance, though, because this file
> structure is the same as the MSVC linker's (which aligns sections to
> 512-byte boundaries on disk).
>
They may have a hack in their kernel that makes this not a problem. Still,
I would highly recommend measuring this to be sure.

For example, the kernel is *guaranteed* to have to make a copy after LLD
outputs an unaligned file, since LLD has the section mapped at a different
offset modulo the page size than the process will when it is executing (not
that this matters very much since LLD only outputs the file on the build
machine).
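
The "offset modulo the page size" condition can be written down as a small
check. This is only a sketch: the helper names are hypothetical, the 4K page
size is the x86 value used in this thread, and the RVAs 0x1000 and 0x2000 are
assumed load addresses for the two-section example quoted further down.

```cpp
#include <cstdint>

// Page size assumed throughout this thread (x86, 4K pages).
constexpr uint64_t kPageSize = 4096;

// Hypothetical helper: position of an offset or address within its page.
constexpr uint64_t offsetInPage(uint64_t value, uint64_t pageSize = kPageSize) {
  return value % pageSize;
}

// Hypothetical helper: true if the file offset and the virtual address are
// congruent modulo the page size, so no sub-page shift of the data is needed.
constexpr bool isCongruent(uint64_t fileOff, uint64_t vaddr,
                           uint64_t pageSize = kPageSize) {
  return offsetInPage(fileOff, pageSize) == offsetInPage(vaddr, pageSize);
}

// Section A: file offset 4096, assumed RVA 0x1000.
static_assert(isCongruent(4096, 0x1000), "A is congruent");
// Section B: file offset 5120, assumed RVA 0x2000.
static_assert(!isCongruent(5120, 0x2000), "B is off by a sub-page amount");
static_assert(offsetInPage(5120) == 1024, "B sits 1024 bytes into its page");
```

With 512-byte file alignment, any section whose file offset is not also a
multiple of 4096 fails this check against a page-aligned load address.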

-- Sean Silva
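
The Linux behaviour mentioned in the message quoted at the top (mmap rejecting
a file offset that is not a multiple of the page size) can be checked with a
small POSIX test. This is a minimal sketch, not a benchmark: the helper name
is hypothetical, and it uses a scratch file from tmpfile() rather than a real
executable.

```cpp
#include <cerrno>
#include <cstdio>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: maps one page of a scratch file starting `extra` bytes
// past the first page boundary. Returns 0 on success, otherwise the errno
// reported by mmap (or by the setup calls).
int mapScratchFileAt(long extra) {
  long pageSize = sysconf(_SC_PAGESIZE);
  FILE *f = std::tmpfile(); // removed automatically when closed
  if (!f)
    return errno;
  int fd = fileno(f);
  int result = 0;
  if (ftruncate(fd, 4 * pageSize) != 0) {
    result = errno;
  } else {
    void *p = mmap(nullptr, pageSize, PROT_READ, MAP_PRIVATE, fd,
                   static_cast<off_t>(pageSize + extra));
    if (p == MAP_FAILED)
      result = errno; // capture before any other call can overwrite it
    else
      munmap(p, pageSize);
  }
  std::fclose(f);
  return result;
}
```

On Linux, mapScratchFileAt(0) should return 0 and mapScratchFileAt(512)
should return EINVAL. This only shows that the system call refuses the
offset; it says nothing about what the Windows loader does with such a file.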



> 2015/08/13 16:51 "Rui Ueyama" <ruiu at google.com>:
>
>> I think I do understand how the paging mechanism works. :) We are talking
>> about different things. My question is why you think a file offset must be
>> at a 4K boundary in order to map it efficiently to memory. To me you seem
>> to be claiming that mmap(0, /*length*/4096, PROT_READ|PROT_EXEC,
>> MAP_PRIVATE, SomeFD, /*file offset*/5120) is much less efficient than
>> mmap(0, /*length*/4096, PROT_READ|PROT_EXEC, MAP_PRIVATE, SomeFD,
>> /*file offset*/4096) because the file offset of the former mmap call is
>> not a multiple of 4096. And I'm saying that that's not true.
>>
>> On Thu, Aug 13, 2015 at 4:10 PM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Aug 12, 2015 at 10:59 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> What I don't understand is why the offset from the beginning of the
>>>> file must be a multiple of the page size in order to avoid a full copy.
>>>> Windows requires all sections to be aligned to at least 4K in memory and
>>>> 512 bytes in the file, and I don't see any problem there.
>>>>
>>>> Let's say we have two sections, A and B, whose sizes are 1024B and
>>>> 4096B, respectively. We also assume that A's offset from the beginning of
>>>> the file is 4096, and B's is 5120. The loader can map offsets 4096 to 8192
>>>> of the file to one page, and 5120 to 9216 to another page. Why can't it do
>>>> that?
>>>>
>>>
>>> From the kernel's perspective of mapping memory (on x86), memory is
>>> divided into aligned 4K pieces. 5120 % 4096 == 1024, so in order to map it
>>> at an address that is 4K aligned, it must do a full memmove in order to
>>> move all the memory by 1024 bytes so that it is 4K aligned. This image
>>> may help in understanding how a 32-bit x86 CPU interprets a virtual memory
>>> address:
>>> https://upload.wikimedia.org/wikipedia/commons/8/8e/X86_Paging_4K.svg
>>>
>>> IIRC the resources I learned from are:
>>> http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/
>>> http://duartes.org/gustavo/blog/post/the-thing-king/
>>> (that site has many other very, *very* good posts. A list can be
>>> seen at: http://duartes.org/gustavo/blog/category/internals/)
>>>
>>> I think you will find that understanding virtual memory (and TLB) will
>>> greatly help you optimize LLD, since many operations in LLD put very high
>>> pressure on the virtual memory system.
>>>
>>> -- Sean Silva
>>>
>>>
>>>>
>>>> On Thu, Aug 13, 2015 at 2:44 PM, Sean Silva <chisophugis at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 12, 2015 at 7:17 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>
>>>>>> I didn't test to see if this change has a negative impact on memory
>>>>>> usage, but my guess is that's very unlikely because this layout is the
>>>>>> same as what the MSVC linker creates. If this were inefficient, virtually
>>>>>> all Windows executables would be suffering from it, which is unlikely. My
>>>>>> understanding is that the kernel maps each section separately to a memory
>>>>>> address, so the file offset of each section can be chosen independently
>>>>>> of the other sections.
>>>>>>
>>>>>
>>>>> The mapping is done at the granularity of aligned 4K pages minimum
>>>>> (this is just how the x86 hardware page table mechanism works). A piece of
>>>>> the file cannot be moved by an amount that is not a multiple of 4K without
>>>>> a full copy.
>>>>>
>>>>> The only way this could (in the usual case) not have a large overhead
>>>>> is for the kernel to do a crazy hack like have special paging semantics for
>>>>> files that are executables. This means that when LLD finishes working on a
>>>>> memory mapped file, if a section is not at least 4K aligned, then the
>>>>> kernel has to do a copy to make the file conform to the actual memory
>>>>> layout it needs to have in the paging subsystem.
>>>>>
>>>>> Or does Windows always make full copies of sections? In other words,
>>>>> do processes not share e.g. read-only text?
>>>>>
>>>>> In ELF, the offset in the file and the offset in memory are required
>>>>> to be congruent modulo the alignment (see the documentation of p_align in
>>>>> http://www.sco.com/developers/gabi/latest/ch5.pheader.html),
>>>>> precisely to avoid the need to do crazy hacks like this when loading the
>>>>> program.
>>>>>
>>>>> You can see that Linux will reject the binary:
>>>>> http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L664
>>>>> (load_elf_binary)
>>>>> -> http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L336 (elf_map)
>>>>> -> http://lxr.free-electrons.com/source/mm/util.c#L306 (vm_mmap)
>>>>> Notice:
>>>>>         if (unlikely(offset & ~PAGE_MASK))
>>>>>                 return -EINVAL;
>>>>>
>>>>> FreeBSD is more lenient, but you can see that the kernel does not like
>>>>> the situation when this is violated:
>>>>>
>>>>> http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#593
>>>>> (__elfN(load_file))
>>>>> -->
>>>>> http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#467
>>>>> (__elfN(load_section))
>>>>> -->
>>>>> http://src.illumos.org/source/xref/freebsd-head/sys/kern/imgact_elf.c#398
>>>>> (__elfN(map_insert))
>>>>> /*
>>>>>  * The mapping is not page aligned. This means we have
>>>>>  * to copy the data. Sigh.
>>>>>  */
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>>>
>>>>>> On Wed, Aug 12, 2015 at 5:31 PM, Sean Silva <chisophugis at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 11, 2015 at 4:09 PM, Rui Ueyama via llvm-commits <
>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Author: ruiu
>>>>>>>> Date: Tue Aug 11 18:09:00 2015
>>>>>>>> New Revision: 244691
>>>>>>>>
>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=244691&view=rev
>>>>>>>> Log:
>>>>>>>> COFF: Align sections to 512-byte boundaries on disk.
>>>>>>>>
>>>>>>>> Sections must start at page boundaries in memory, but they
>>>>>>>> can be aligned to sector boundaries (512-bytes) on disk.
>>>>>>>> We aligned them to 4096-byte boundaries even on disk, so we
>>>>>>>> wasted disk space a bit.
>>>>>>>>
>>>>>>>
>>>>>>> This will likely force the kernel to copy or otherwise do
>>>>>>> unnecessary work when loading. Are you sure that isn't happening? The
>>>>>>> kernel ideally wants to just create a couple of page table entries. But if
>>>>>>> it needs to move things around at <4K granularity to make them properly
>>>>>>> aligned to their load address when loading (as I think this patch causes),
>>>>>>> then it will need to do copies.
>>>>>>>
>>>>>>> This can likely be checked by looking for an increase in real memory
>>>>>>> usage for the system when the new binaries are loaded (vs. the old
>>>>>>> page-aligned ones), since the kernel will have a copy sitting in page cache
>>>>>>> and a copy for alignment mapped into the process address space;
>>>>>>> alternatively, you can check for the slowdown from the kernel copies when
>>>>>>> faulting the memory into the process's address space (or (less likely) it
>>>>>>> may do the copies eagerly which should be easy to measure too).
>>>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Modified:
>>>>>>>>     lld/trunk/COFF/Writer.cpp
>>>>>>>>     lld/trunk/test/COFF/baserel.test
>>>>>>>>     lld/trunk/test/COFF/hello32.test
>>>>>>>>
>>>>>>>> Modified: lld/trunk/COFF/Writer.cpp
>>>>>>>> URL:
>>>>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/COFF/Writer.cpp?rev=244691&r1=244690&r2=244691&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- lld/trunk/COFF/Writer.cpp (original)
>>>>>>>> +++ lld/trunk/COFF/Writer.cpp Tue Aug 11 18:09:00 2015
>>>>>>>> @@ -37,8 +37,7 @@ using namespace lld;
>>>>>>>>  using namespace lld::coff;
>>>>>>>>
>>>>>>>>  static const int PageSize = 4096;
>>>>>>>> -static const int FileAlignment = 512;
>>>>>>>> -static const int SectionAlignment = 4096;
>>>>>>>> +static const int SectorSize = 512;
>>>>>>>>  static const int DOSStubSize = 64;
>>>>>>>>  static const int NumberfOfDataDirectory = 16;
>>>>>>>>
>>>>>>>> @@ -174,7 +173,7 @@ void OutputSection::addChunk(Chunk *C) {
>>>>>>>>    Off += C->getSize();
>>>>>>>>    Header.VirtualSize = Off;
>>>>>>>>    if (C->hasData())
>>>>>>>> -    Header.SizeOfRawData = RoundUpToAlignment(Off, FileAlignment);
>>>>>>>> +    Header.SizeOfRawData = RoundUpToAlignment(Off, SectorSize);
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  void OutputSection::addPermissions(uint32_t C) {
>>>>>>>> @@ -507,15 +506,14 @@ void Writer::createSymbolAndStringTable(
>>>>>>>>    // We position the symbol table to be adjacent to the end of the last section.
>>>>>>>>    uint64_t FileOff =
>>>>>>>>        LastSection->getFileOff() +
>>>>>>>> -      RoundUpToAlignment(LastSection->getRawSize(), FileAlignment);
>>>>>>>> +      RoundUpToAlignment(LastSection->getRawSize(), SectorSize);
>>>>>>>>    if (!OutputSymtab.empty()) {
>>>>>>>>      PointerToSymbolTable = FileOff;
>>>>>>>>      FileOff += OutputSymtab.size() * sizeof(coff_symbol16);
>>>>>>>>    }
>>>>>>>>    if (!Strtab.empty())
>>>>>>>>      FileOff += Strtab.size() + 4;
>>>>>>>> -  FileSize = SizeOfHeaders +
>>>>>>>> -             RoundUpToAlignment(FileOff - SizeOfHeaders, FileAlignment);
>>>>>>>> +  FileSize = RoundUpToAlignment(FileOff, SectorSize);
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  // Visits all sections to assign incremental, non-overlapping RVAs and
>>>>>>>> @@ -526,9 +524,9 @@ void Writer::assignAddresses() {
>>>>>>>>                    sizeof(coff_section) * OutputSections.size();
>>>>>>>>    SizeOfHeaders +=
>>>>>>>>        Config->is64() ? sizeof(pe32plus_header) : sizeof(pe32_header);
>>>>>>>> -  SizeOfHeaders = RoundUpToAlignment(SizeOfHeaders, PageSize);
>>>>>>>> +  SizeOfHeaders = RoundUpToAlignment(SizeOfHeaders, SectorSize);
>>>>>>>>    uint64_t RVA = 0x1000; // The first page is kept unmapped.
>>>>>>>> -  uint64_t FileOff = SizeOfHeaders;
>>>>>>>> +  FileSize = SizeOfHeaders;
>>>>>>>>    // Move DISCARDABLE (or non-memory-mapped) sections to the end of file because
>>>>>>>>    // the loader cannot handle holes.
>>>>>>>>    std::stable_partition(
>>>>>>>> @@ -539,13 +537,11 @@ void Writer::assignAddresses() {
>>>>>>>>      if (Sec->getName() == ".reloc")
>>>>>>>>        addBaserels(Sec);
>>>>>>>>      Sec->setRVA(RVA);
>>>>>>>> -    Sec->setFileOffset(FileOff);
>>>>>>>> +    Sec->setFileOffset(FileSize);
>>>>>>>>      RVA += RoundUpToAlignment(Sec->getVirtualSize(), PageSize);
>>>>>>>> -    FileOff += RoundUpToAlignment(Sec->getRawSize(), FileAlignment);
>>>>>>>> +    FileSize += RoundUpToAlignment(Sec->getRawSize(), SectorSize);
>>>>>>>>    }
>>>>>>>>    SizeOfImage = SizeOfHeaders + RoundUpToAlignment(RVA - 0x1000, PageSize);
>>>>>>>> -  FileSize = SizeOfHeaders +
>>>>>>>> -             RoundUpToAlignment(FileOff - SizeOfHeaders, FileAlignment);
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  template <typename PEHeaderTy> void Writer::writeHeader() {
>>>>>>>> @@ -584,8 +580,8 @@ template <typename PEHeaderTy> void Writ
>>>>>>>>    Buf += sizeof(*PE);
>>>>>>>>    PE->Magic = Config->is64() ? PE32Header::PE32_PLUS :PE32Header::PE32;
>>>>>>>>    PE->ImageBase = Config->ImageBase;
>>>>>>>> -  PE->SectionAlignment = SectionAlignment;
>>>>>>>> -  PE->FileAlignment = FileAlignment;
>>>>>>>> +  PE->SectionAlignment = PageSize;
>>>>>>>> +  PE->FileAlignment = SectorSize;
>>>>>>>>    PE->MajorImageVersion = Config->MajorImageVersion;
>>>>>>>>    PE->MinorImageVersion = Config->MinorImageVersion;
>>>>>>>>    PE->MajorOperatingSystemVersion = Config->MajorOSVersion;
>>>>>>>>
>>>>>>>> Modified: lld/trunk/test/COFF/baserel.test
>>>>>>>> URL:
>>>>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/baserel.test?rev=244691&r1=244690&r2=244691&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- lld/trunk/test/COFF/baserel.test (original)
>>>>>>>> +++ lld/trunk/test/COFF/baserel.test Tue Aug 11 18:09:00 2015
>>>>>>>> @@ -61,7 +61,7 @@
>>>>>>>>  # BASEREL-HEADER-NEXT: VirtualSize: 0x20
>>>>>>>>  # BASEREL-HEADER-NEXT: VirtualAddress: 0x5000
>>>>>>>>  # BASEREL-HEADER-NEXT: RawDataSize: 512
>>>>>>>> -# BASEREL-HEADER-NEXT: PointerToRawData: 0x1800
>>>>>>>> +# BASEREL-HEADER-NEXT: PointerToRawData: 0xC00
>>>>>>>>  # BASEREL-HEADER-NEXT: PointerToRelocations: 0x0
>>>>>>>>  # BASEREL-HEADER-NEXT: PointerToLineNumbers: 0x0
>>>>>>>>  # BASEREL-HEADER-NEXT: RelocationCount: 0
>>>>>>>>
>>>>>>>> Modified: lld/trunk/test/COFF/hello32.test
>>>>>>>> URL:
>>>>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/test/COFF/hello32.test?rev=244691&r1=244690&r2=244691&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- lld/trunk/test/COFF/hello32.test (original)
>>>>>>>> +++ lld/trunk/test/COFF/hello32.test Tue Aug 11 18:09:00 2015
>>>>>>>> @@ -38,8 +38,8 @@ HEADER-NEXT:   MajorImageVersion: 0
>>>>>>>>  HEADER-NEXT:   MinorImageVersion: 0
>>>>>>>>  HEADER-NEXT:   MajorSubsystemVersion: 6
>>>>>>>>  HEADER-NEXT:   MinorSubsystemVersion: 0
>>>>>>>> -HEADER-NEXT:   SizeOfImage: 20480
>>>>>>>> -HEADER-NEXT:   SizeOfHeaders: 4096
>>>>>>>> +HEADER-NEXT:   SizeOfImage: 16896
>>>>>>>> +HEADER-NEXT:   SizeOfHeaders: 512
>>>>>>>>  HEADER-NEXT:   Subsystem: IMAGE_SUBSYSTEM_WINDOWS_CUI (0x3)
>>>>>>>>  HEADER-NEXT:   Characteristics [ (0x8140)
>>>>>>>>  HEADER-NEXT:     IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE (0x40)
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
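
The header arithmetic in the quoted patch can be checked against the updated
hello32.test expectations. This is a sketch under stated assumptions: roundUp
is a stand-in for LLVM's RoundUpToAlignment, the raw header size of 400 bytes
is a made-up value below 512, and the end RVA of 0x5000 is inferred from the
expected SizeOfImage values rather than taken from the test input.

```cpp
#include <cstdint>

constexpr uint64_t kPageSize = 4096;  // PageSize in Writer.cpp
constexpr uint64_t kSectorSize = 512; // SectorSize in Writer.cpp

// Stand-in for RoundUpToAlignment; `align` must be non-zero.
constexpr uint64_t roundUp(uint64_t value, uint64_t align) {
  return (value + align - 1) / align * align;
}

// Mirrors the SizeOfImage expression in assignAddresses(): the rounded header
// size plus the page-rounded span of RVAs starting at 0x1000.
constexpr uint64_t sizeOfImage(uint64_t sizeOfHeaders, uint64_t endRVA) {
  return sizeOfHeaders + roundUp(endRVA - 0x1000, kPageSize);
}

constexpr uint64_t kRawHeaders = 400; // hypothetical unrounded header size
constexpr uint64_t kEndRVA = 0x5000;  // assumed RVA after the last section

// Before the patch: headers rounded up to a page.
static_assert(roundUp(kRawHeaders, kPageSize) == 4096, "old SizeOfHeaders");
static_assert(sizeOfImage(4096, kEndRVA) == 20480, "old SizeOfImage");
// After the patch: headers rounded up to a sector.
static_assert(roundUp(kRawHeaders, kSectorSize) == 512, "new SizeOfHeaders");
static_assert(sizeOfImage(512, kEndRVA) == 16896, "new SizeOfImage");
```

Under these assumptions the 3584-byte drop in SizeOfImage comes entirely from
rounding the headers to 512 instead of 4096; the virtual span of the sections
is unchanged.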