[llvm] [llvm][Support][Memory] Add memfd based fallback for strict W^X Linux systems (PR #98538)

Lang Hames via llvm-commits llvm-commits at lists.llvm.org
Sat Sep 28 23:41:55 PDT 2024


lhames wrote:

> > > What actually _requires_ implementing the fallback to memfds at this level is, there are quite a few "raw" calls to `sys::Memory::protectMappedMemory()`, e.g. in `orc::LocalTrampolinePool`, `orc::LocalIndirectStubsInfo`, `orc::InProcessMemoryMapper`, `jitlink::InProcessMemoryManager::IPInFlightAlloc`, `orc::rt_bootstrap::SimpleExecutorMemoryManager`, `llvm-rtdyld` and the unit tests.
> > > So, unfortunately, implementing a new mapper or memory manager won't help for these. That's why I implemented the handling directly at the lowest level that needs it.
> > 
> > 
> > They'll take a little work to adapt, but I believe we can fix these to use the `SimpleSegmentAlloc` utility, which uses a `JITLinkMemoryManager` for the underlying implementation.
> 
> Hmm, feels a little odd to switch classes like `jitlink::InProcessMemoryManager::IPInFlightAlloc` over to use yet another `JITLinkMemoryManager` to implement... a `JITLinkMemoryManager`.

I'm suggesting use of a `SimpleSegmentAlloc` for users who need executable code and are currently using `sys::Memory` directly.

Support for the memfd approach should be restricted to a `JITLinkMemoryManager` or `Mapper`.

> Going through the code base and getting rid of `sys::Memory::protectMappedMemory()` calls also feels like an orthogonal issue, as support for a strict W^X policy can very well be implemented at the `sys::Memory` layer without having to switch all users over to a memory-manager-based scheme. I'd rather defer that work to a later stage, as it's not strictly required to solve the W^X support issue and can be done one user at a time, which should not only ease review but also prevent this PR from becoming one big patch with lots of loosely related changes.
> 
> > > > The JIT linker only ever reads / writes memory, so the idea would be to create dual mappings -- one with `RW-` protections that can be used as the working memory for the linker, and a second with the requested permissions (which would simply have to be compliant with the system's policies) for use in JIT'd code.
> > > 
> > > 
> > > I explicitly tried to avoid the dual-mapping approach as it causes way too much trouble for no real gain. Having two mappings for the same data implies having different addresses for the writable and executable parts, which, in turn, makes generating proper references for generated code more complicated than it needs to be (relative addresses used in the writable mapping will refer to a different memory address when executed from the other mapping, likely causing access faults).
> > 
> > 
> > ORC already handles this: it's required to support out-of-process JITing. Dual-mapping in the same process shouldn't require any additional effort.
> 
> There are several other issues with dual mappings. Probably the most obvious one is that they have to be kept in sync (what gets written to one mapping needs to be read back when executed from the other). The easiest way to achieve this is to create `MAP_SHARED` file-based mappings. The file can also be a memfd which simply gets mapped twice. However, using `MAP_SHARED` has the unfortunate side-effect that these mappings will be kept in sync not only for the creating process but also for forked child processes, which is actually bad: a child can tamper with mappings in the parent (`MFD_CLOEXEC` for the memfd won't help, as the mapping has a reference to the underlying file that won't be invalidated).

What do you mean by "tamper with mappings in the parent"? If you just mean that writes to shared memory in the child will be visible in the parent process, I would consider that desirable behavior. The existing `SharedMemoryMapper` already works this way.

> One may try to work around that by calling `madvise(MADV_DONTFORK)` on the mappings, making them disappear in the child. However, that would be bad as well, as the code would vanish, possibly breaking existing use-cases.
> 
> Another reason to avoid dual mappings is that the writable mapping always allows tampering with the code, i.e. injecting new code at arbitrary times. And, in fact, there is no need to do that, as current behaviour clearly shows. There's only a need for an `RW-` → `R-X` transition -- no need to later modify code that has been generated and _finalized_. That eliminates the need for two mappings, as only one is used at any given time, allowing us to simply replace the writable mapping with an executable one _in place_.

Being able to rewrite code opens up the possibility of improving performance, e.g. rewriting jumps to bypass stubs and go directly to function bodies once a function has been compiled.

ORC's memory manager interface was designed to be flexible enough to accommodate all of these options. Keeping dual RW- / R-X mappings as an option would be a really good outcome here.

> Using a dual-mapping approach also requires more syscalls, slowing down JIT code generation further. As the mapping has to be file-based, the size needs to be set in advance (unlike in my current version where it's implicitly updated by simply calling `write()` on the memfd).

I'm suggesting that it should be reasonably easy to make the dual-mapping optional, and that some clients will be willing to pay that cost for the combination of security and runtime flexibility that it offers.

> A dual-mapping scheme therefore provides no benefits over the replacement-mapping approach, which also more closely mirrors the _visible semantics_ of the current `mprotect()`-based implementation -- simply change the protection bits for a given virtual address range.

I'd say that the current semantics of `Memory` aren't especially helpful, given that LLVM's JIT aims to work transparently out-of-process: the working assumption is that the address range being written isn't the same as the one being executed. That's why implementing this as a mapper makes it so much more useful.

> I therefore would vote against a dual-mapping approach and for keeping the replacement-mapping scheme, reusing the same VA range.

It's possible that there's something that I'm missing, but so far I haven't heard a compelling reason not to do this with a `JITLinkMemoryManager` or `Mapper`, and there are several benefits to doing it at that level.

https://github.com/llvm/llvm-project/pull/98538

