[PATCH] D55585: RFC: [LLD][COFF] Parallel GHASH generation at link-time

Fri Jan 11 09:33:21 PST 2019

aganea marked an inline comment as done.
aganea added inline comments.
Herald added a subscriber: rupprecht.

================
Comment at: llvm/trunk/include/llvm/DebugInfo/CodeView/GlobalTypeDenseMap.h:342
+                                                     sys::Memory::MF_WRITE |
+                                                     llvm::sys::Memory::MF_HUGE,
+        EC);
----------------
rnk wrote:
> How much do huge pages matter relative to the custom hash table?
It's quite significant; type merging takes about 26% less time with 2MB pages:
| without 2MB pages | Type Merging: 6588 ms (23.0%) |
| with 2MB pages | Type Merging: 4856 ms (19.3%) |
The only change between the two runs was removing the `sys::Memory::MF_HUGE` flag.

Here are some stats for the data used:
```
                                    Summary
--------------------------------------------------------------------------------
            156 Input OBJ files (expanded from all cmd-line inputs)
              0 Dependent PDB files
              1 Dependent PCH OBJ files
       81556098 Input type records (across all OBJ and dependencies)
     5108516032 Input type records bytes (across all OBJ and dependencies)
        4588516 Output merged type records
       10067321 Output merged symbol records
          23157 Output PDB strings
```
This is the perfect use case for large pages: a large contiguous structure, accessed randomly in a tight loop. The `GlobalTypeDenseMap` is a very good fit for 2MB pages. In this particular test case, the hash table is 64MB (32x 2MB pages), which also happens to exactly fill the maximum number of DTLB slots on [[ https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server) | modern Intel CPUs ]]. If my understanding is correct, the TLB slots for large pages come in addition to those for 4KB pages (at least for the L1 DTLB).
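To make the page-count arithmetic concrete, here is a minimal sketch. `pagesNeeded` and the constants are hypothetical names used only for illustration; they are not part of the patch or of LLVM.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): number of pages of `pageSize`
// bytes needed to back `bytes` of memory, rounded up.
constexpr std::uint64_t pagesNeeded(std::uint64_t bytes,
                                    std::uint64_t pageSize) {
  return (bytes + pageSize - 1) / pageSize;
}

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;

// Hash table size quoted in the comment above.
constexpr std::uint64_t TableBytes = 64 * MiB;

// With 2MB pages the whole table is covered by 32 translations.
static_assert(pagesNeeded(TableBytes, 2 * MiB) == 32, "32 large pages");

// With regular 4KB pages the same table needs 16384 translations.
static_assert(pagesNeeded(TableBytes, 4 * KiB) == 16384, "16384 small pages");
```

The 512x difference in the number of translations is why random accesses over the table put much less pressure on the TLB when large pages are used.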

I think I'll make this `sys::Memory::MF_HUGE` flag a **hint**. On many OSes (at least Windows 10 and Linux), large pages must be enabled manually, so they might not be available by default. Even then, on Windows at least, large pages are physical-only (not swappable). When this flag is specified, `Memory::allocateMappedMemory` should only "try" to use large pages, and fall back to regular (4KB) pages otherwise.
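A rough sketch of the "try, then fall back" behaviour, written directly against POSIX `mmap` rather than LLVM's `sys::Memory` API. It assumes a Linux-like system where `MAP_HUGETLB` exists. `MappedBlock` and `allocateWithHugeHint` are hypothetical names for illustration only, not the implementation in this patch.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical result type, for illustration only.
struct MappedBlock {
  void *Base;
  std::size_t Size;
  bool UsedHugePages;
};

// Sketch: treat the huge-page request as a hint. Try large pages first and
// silently fall back to regular pages if the OS refuses.
// Assumption: Bytes is a multiple of the huge page size (e.g. 2MB), so the
// same length can later be passed to munmap().
inline MappedBlock allocateWithHugeHint(std::size_t Bytes) {
  const int Prot = PROT_READ | PROT_WRITE;
  const int Flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_HUGETLB
  // Fails when no huge pages are reserved or the feature is disabled.
  void *Huge = mmap(nullptr, Bytes, Prot, Flags | MAP_HUGETLB, -1, 0);
  if (Huge != MAP_FAILED)
    return MappedBlock{Huge, Bytes, true};
#endif
  // Fallback: regular (typically 4KB) pages.
  void *Regular = mmap(nullptr, Bytes, Prot, Flags, -1, 0);
  if (Regular == MAP_FAILED)
    return MappedBlock{nullptr, 0, false};
  return MappedBlock{Regular, Bytes, false};
}
```

A caller would check `Base` for `nullptr`, can inspect `UsedHugePages` for statistics, and releases the block with `munmap(Base, Size)`. On Windows the analogous attempt would go through `VirtualAlloc` with `MEM_LARGE_PAGES`, which additionally requires the "Lock pages in memory" privilege.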

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55585/new/

https://reviews.llvm.org/D55585