[llvm] [llvm][Support][Memory] Add memfd based fallback for strict W^X Linux systems (PR #98538)

Sat Sep 14 02:05:16 PDT 2024

minipli-oss wrote:

> > What actually _requires_ implementing the fallback to memfds at this level is, there are quite a few "raw" calls to `sys::Memory::protectMappedMemory()`, e.g. in `orc::LocalTrampolinePool`, `orc::LocalIndirectStubsInfo`, `orc:: InProcessMemoryMapper`, `jitlink::InProcessMemoryManager::IPInFlightAlloc`, `orc::rt_bootstrap::SimpleExecutorMemoryManager`, `llvm-rtdyld` and the unit tests.
> > So, unfortunately, implementing a new mapper or memory manager won't help for these. That's why I implemented the handling directly at the lowest level that needs it.
> 
> They'll take a little work to adapt, but I believe we can fix these to use the `SimpleSegmentAlloc` utility, which uses a `JITLinkMemoryManager` for the underlying implementation.

Hmm, feels a little odd to switch classes like `jitlink::InProcessMemoryManager::IPInFlightAlloc` over to use yet another `JITLinkMemoryManager` to implement... a `JITLinkMemoryManager`.

Going through the code base and getting rid of `sys::Memory::protectMappedMemory()` calls also feels like an orthogonal issue, as support for a strict W^X policy can very well be implemented at the `sys::Memory` layer without having to switch all users over to a memory manager based scheme. I'd rather defer that work to a later stage, as it's not strictly required to solve the W^X support issue and can be done one user at a time which should not only ease review but also prevents this PR becoming just one big patch with lots of loosely related changes.

> > > The JIT linker only ever reads / writes memory, so the idea would be to create dual mappings -- one with `RW-` protections that can be used as the working memory for the linker, and a second with the requested permissions (which would simply have to be compliant with the system's policies) for use in JIT'd code.
> > 
> > 
> > I explicitly tried to avoid the dual-mapping approach as it causes way too much trouble for no real gain. Having two mappings for the same data implies having different addresses for the writable and executable parts, which, in turn, makes generating proper references for generated code more complicated than it needs to be (relative addresses used in the writable mapping will refer to a different memory address when executed from the other mapping, likely causing access faults).
> 
> ORC already handles this: it's required to support out-of-process JITing. Dual-mapping in the same process shouldn't require any additional effort.

There are several other issues with dual mappings. The probably obvious one is that one has to keep them in sync (what gets written to one mapping needs to be read back when executed from the other). The easiest way to achieve this is to created `MAP_SHARED` file-based mappings. The file can also be a memfd which simply gets mapped twice. However, using `MAP_SHARED` has the unfortunate side-effect that these mappings will not only be kept in-sync for the creating process but also for forked child processes which is actually bad as a child can tamper with mappings in the parent (`MFD_CLOEXEC` for the memfd won't help, as the mapping has a reference to the underlying file that won't be invalidated). One may try to work around that by calling `madvise(MADV_DONTFORK)` on the mappings, making them disappear in the child. However, that would be bad as well as the code would vanish, possibly breaking existing use-cases.

Another reason to avoid dual-mappings is that with the writable mapping one can always tamper with the code, i.e. inject new code at arbitrary times. And, in fact, there is no need to do that, as current behaviour clearly shows. There's only a need to have a `RW-` → `R-X` transition -- no need to modify once generated and _finalized_ code later on. That simply vanishes the need to have two mappings as only one is used at a given time, allowing to simply replace the writable mapping with an executable one _in place_.

Using a dual-mapping approach also requires more syscalls, slowing down JIT code generation further. As the mapping has to be file-based, the size needs to be set in advance (unlike in my current version where it's implicitly updated by simply calling `write()` on the memfd).

A dual-mapping scheme therefore provides no benefits over the replacement-mapping approach, which would even better mirror the _visible semantics_ of the current `mprotect()` based implementation -- simply change the protection bits for a given virtual address range.

I therefore would vote against a dual-mapping approach and keep the replacement-mapping scheme, reusing the same VA range.

https://github.com/llvm/llvm-project/pull/98538