[cfe-dev] Sharing MemoryBuffers between front ends and LLVM

Thu Mar 16 10:29:02 PDT 2017

Hi all,

I'm implementing interleaved source in assembly output. Early reviews raised the concern that the current implementation opens files (using an llvm::MemoryBuffer) that are likely already in the memory of the front end (commonly clang, but I think we want this to be front-end agnostic).

I'm now exploring ideas to avoid reopening files and to let LLVM reuse the files the FE has already opened.

I am assuming that the front end uses llvm::MemoryBuffer (e.g. clang does so indirectly through clang::SourceManager).

So for buffers related to named files (including stdin, which does not have a name and is handled in a special way), we could have a MemoryBufferRegistry in the LLVM context. The idea is to add new creators of MemoryBuffer (the ones that work on named files and stdin) that can be passed a reference to that llvm::MemoryBufferRegistry. MemoryBuffer objects would register themselves at creation and deregister themselves at destruction. The registry can then be used as a cache of already opened files, from which to retrieve a reference to the MemoryBuffer itself using the file path. These new interfaces would be opt-in for all users of MemoryBuffer.
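To make the registration scheme concrete, here is a minimal sketch in plain C++. `Buffer` and `BufferRegistry` are hypothetical stand-ins for llvm::MemoryBuffer and the proposed llvm::MemoryBufferRegistry (neither stand-in is an existing LLVM class), and the registry is assumed to hold non-owning pointers keyed by file path. Buffers register in their constructor and deregister in their destructor, so a lookup only ever returns a live buffer.

```cpp
#include <map>
#include <string>
#include <utility>

class BufferRegistry;

// Hypothetical stand-in for llvm::MemoryBuffer: owns the file contents and,
// if given a registry, stays registered for exactly its own lifetime.
class Buffer {
public:
  Buffer(std::string Name, std::string Contents,
         BufferRegistry *Registry = nullptr);
  ~Buffer();
  Buffer(const Buffer &) = delete;
  Buffer &operator=(const Buffer &) = delete;

  const std::string &getName() const { return Name; }
  const std::string &getContents() const { return Contents; }

private:
  std::string Name;
  std::string Contents;
  BufferRegistry *Registry; // non-owning; may be null (opt-in)
};

// Hypothetical stand-in for the proposed MemoryBufferRegistry: a cache of
// live buffers keyed by path. It never owns the buffers it tracks.
class BufferRegistry {
public:
  // Returns the live buffer registered for Path, or nullptr if none.
  const Buffer *lookup(const std::string &Path) const {
    auto It = Buffers.find(Path);
    return It == Buffers.end() ? nullptr : It->second;
  }

private:
  friend class Buffer;
  void add(const Buffer &B) { Buffers[B.getName()] = &B; }
  void remove(const Buffer &B) {
    // Only erase the entry if it still refers to this buffer; a newer buffer
    // for the same path may have replaced it.
    auto It = Buffers.find(B.getName());
    if (It != Buffers.end() && It->second == &B)
      Buffers.erase(It);
  }
  std::map<std::string, const Buffer *> Buffers;
};

Buffer::Buffer(std::string N, std::string C, BufferRegistry *R)
    : Name(std::move(N)), Contents(std::move(C)), Registry(R) {
  if (Registry)
    Registry->add(*this);
}

Buffer::~Buffer() {
  if (Registry)
    Registry->remove(*this);
}
```

Keeping the registry non-owning means it never extends a buffer's lifetime: once the front end destroys its buffer, consumers simply fall back to opening the file again.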

Back to my case, the new AsmPrinterHandler could now use the MemoryBufferRegistry of the LLVM context. If there is none, or the memory buffer associated with a file path has already been deregistered (or was never registered, e.g. because we are using a .ll file directly), it would open the file as usual; otherwise it would reuse the registered MemoryBuffer.

I see a few downsides of this approach, though.

It overlaps a bit with the existing SourceManager in clang, which already does some caching through the clang::ContentCache class. At first glance the cache seems hard to abstract away, as it uses clang::FileEntry and looks tailored to clang's needs.

Also, assuming that the front end uses a MemoryBuffer may be too strong a requirement, in particular for FEs that are mostly unaware of LLVM except for a final LLVM codegen pass. For such FEs, the files would still be reopened even though they are already in the FE's memory.

Finally, the file path may not even be a good key for identifying reusable MemoryBuffer objects.

Thoughts?

Thank you,
Roger