[cfe-dev] [RFC] Moving (parts of) the Cling REPL in Clang

Thu Jul 9 19:25:21 PDT 2020

I think that it would be great to have infrastructure for incremental 
C++ compilation, supporting interactive use, just-in-time compilation, 
and so on. I think that the best way to deal with the patches, etc., as 
well as IncrementalAction, is to first send an RFC explaining the 
overall design.

  -Hal

On 7/9/20 3:46 PM, Vassil Vassilev via cfe-dev wrote:
> Motivation
> ===
>
> Over the last decade we have developed an interactive C++ interpreter
> (aka REPL) as part of the high-energy physics (HEP) data analysis
> project -- ROOT [1-2]. We invested significant effort to replace
> the CINT C++ interpreter with a newly implemented REPL based on llvm
> -- cling [3]. The cling infrastructure is a core component of the data
> analysis framework of ROOT and has been running in production for
> approximately 5 years.
>
> Cling is also a standalone tool, which has a growing community
> outside of our field. Cling’s user community includes users in
> finance, biology and in a few companies with proprietary software. For
> example, there is a xeus-cling Jupyter kernel [4]. One of the major
> challenges we face in fostering that community is that our
> cling-related patches live in llvm and clang forks. The benefits of
> using the LLVM community standards for code reviews, release cycles
> and integration have been mentioned a number of times by our
> "external" users.
>
> Last year we were awarded an NSF grant to improve cling's 
> sustainability and make it a standalone tool. We thank the LLVM 
> Foundation Board for supporting us with a non-binding letter of 
> collaboration which was essential for getting this grant.
>
>
> Background
> ===
>
> Cling is a C++ interpreter built on top of clang and llvm. In a
> nutshell, it uses clang's incremental compilation facilities to
> process code chunk-by-chunk by assuming an ever-growing translation
> unit [5]. The code is then lowered into llvm IR and run by the llvm
> jit. Cling has implemented some language "extensions" such as
> executing statements at global scope and error recovery. Cling is at
> the core of HEP -- it is heavily used during data analysis of exabytes
> of particle physics data coming from the Large Hadron Collider (LHC)
> and other particle physics experiments.
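
A minimal sketch of the "ever-growing translation unit" idea described
above, written as a toy model. It does not use clang or cling: the
class name, its methods, and the brace-balancing check that stands in
for parsing are all hypothetical and only illustrate how accepted
chunks accumulate while a failing chunk is discarded (error recovery).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: this is NOT cling's or clang's API. All names are hypothetical.
class GrowingTranslationUnit {
public:
    // Stand-in for parsing and semantic analysis. A real implementation
    // would call into clang; here a chunk counts as ill-formed when its
    // braces do not balance.
    static bool isWellFormed(const std::string& chunk) {
        int depth = 0;
        for (char c : chunk) {
            if (c == '{') ++depth;
            if (c == '}' && --depth < 0) return false;
        }
        return depth == 0;
    }

    // Appends the chunk to the translation unit if it is well-formed.
    // On failure the unit is left unchanged, so later chunks still see
    // exactly what was accepted before.
    bool process(const std::string& chunk) {
        if (!isWellFormed(chunk)) return false;
        chunks_.push_back(chunk);
        return true;
    }

    // Number of chunks accepted so far.
    std::size_t size() const { return chunks_.size(); }

    // Everything accepted so far, i.e. what a later chunk can refer to.
    std::string source() const {
        std::string all;
        for (const auto& c : chunks_) {
            all += c;
            all += '\n';
        }
        return all;
    }

private:
    std::vector<std::string> chunks_;
};

// Usage example: two good chunks are kept, one broken chunk is dropped.
inline std::size_t demoSession() {
    GrowingTranslationUnit tu;
    bool first = tu.process("int square(int x) { return x * x; }");
    bool second = tu.process("int nine = square(3);");
    bool broken = tu.process("int broken() { return 1;");
    assert(first && second && !broken);
    return (first && second && !broken) ? tu.size() : 0;
}
```

In this sketch the second chunk can refer to `square` only because the
first chunk is still part of the unit; the rejected chunk leaves no
trace behind.
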
>
>
> Plans
> ===
>
> The project foresees three main directions -- move parts of cling 
> upstream along with the clang and llvm features that enable them; 
> extend and generalize the language interoperability layer around 
> cling; and extend and generalize the OpenCL/CUDA support in cling. We
> are at the early stages of the project and this email is intended as
> an RFC for the first part -- upstreaming parts of cling. Please do
> share your thoughts on the rest, too.
>
>
> Moving Parts of Cling Upstream
> ---
>
> Over the years we have slowly moved some patches upstream. However, we
> still have around 100 patches in the clang fork. Most of them are in
> the context of extending the incremental compilation support for
> clang. Incremental compilation poses some challenges for the clang
> infrastructure. For example, we need to tune CodeGen to work with
> multiple llvm::Module instances, and to finalize at each
> end-of-translation-unit (we have multiple of them). Other changes
> include small adjustments in the FileManager's caching mechanism, and 
> bug fixes in the SourceManager (code which can be reached mostly from 
> within our setup). One conclusion we can draw from our research is 
> that the clang infrastructure fits amazingly well to something which 
> was not its main use case. The grand total of our diffs against 
> clang-9 is: `62 files changed, 1294 insertions(+), 231 deletions(-)`. 
> Cling is currently being upgraded from llvm-5 to llvm-9.
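
To make the "multiple llvm::Module instances" point above more
concrete, here is a toy sketch, not code from cling or clang. It
assumes one module per chunk, finalized at each end-of-translation-unit
and then handed to the JIT. `ToyModule`, `ToyJit` and `ToyCodeGen` are
hypothetical stand-ins; no LLVM API is used.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Module: a named set of symbol definitions.
struct ToyModule {
    std::string name;
    std::map<std::string, int> symbols;
};

// Hypothetical stand-in for the JIT: it owns every finalized module and
// resolves a symbol by searching the modules in the order they were added.
class ToyJit {
public:
    void add(ToyModule m) { modules_.push_back(std::move(m)); }

    bool lookup(const std::string& sym, int& value) const {
        for (const auto& m : modules_) {
            auto found = m.symbols.find(sym);
            if (found != m.symbols.end()) {
                value = found->second;
                return true;
            }
        }
        return false;
    }

    std::size_t moduleCount() const { return modules_.size(); }

private:
    std::vector<ToyModule> modules_;
};

// Hypothetical code generator that emits into one module per chunk.
class ToyCodeGen {
public:
    ToyCodeGen() { startNewModule(); }

    // Record a definition produced while processing the current chunk.
    void define(const std::string& sym, int value) {
        current_.symbols[sym] = value;
    }

    // Called at each end-of-translation-unit: hand the finished module
    // to the JIT and open a fresh one for the next chunk.
    void finalizeChunk(ToyJit& jit) {
        jit.add(std::move(current_));
        startNewModule();
    }

private:
    void startNewModule() {
        current_ = ToyModule{"chunk" + std::to_string(next_id_++), {}};
    }

    ToyModule current_;
    int next_id_ = 0;
};

// Usage example: two chunks end up in two separate modules, and symbols
// from both remain resolvable through the JIT.
inline int demoLookup() {
    ToyJit jit;
    ToyCodeGen cg;
    cg.define("x", 1);
    cg.finalizeChunk(jit);  // end of first chunk
    cg.define("y", 2);
    cg.finalizeChunk(jit);  // end of second chunk
    int x = 0, y = 0;
    bool ok = jit.lookup("x", x) && jit.lookup("y", y);
    assert(ok && jit.moduleCount() == 2);
    return ok ? x + y : -1;
}
```

The design point the sketch tries to capture is that the code
generator never goes back to a module once it has been finalized;
every later chunk gets a fresh one.
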
>
> A major weakness of cling's infrastructure is that it does not work
> with the clang Action infrastructure due to the lack of an
> IncrementalAction. A possible way forward would be to implement a
> clang::IncrementalAction as a starting point. This way we should be
> able to reduce the amount of setup necessary to use the incremental
> infrastructure in clang. However, this will be a bit of a testing
> challenge -- cling lives downstream and some of the new code may be
> impossible to pick up and use straight away. Building a mainline
> example tool such as clang-repl, which would give us a way to test the
> incremental case, or repurposing the already existing
> clang-interpreter may address the issue. The major risk of the task is
> ending up with code in the clang mainline which is untested by its HEP
> production environment.
>
> There are several other types of patches to the ROOT fork of Clang,
> including ones in the context of performance, towards C++ modules
> support (D41416), and storage (does not have a patch yet but has an
> open projects entry and somebody working on it). These patches can be
> considered in parallel, independently of the rest.
>
> Extend and Generalize the Language Interoperability Layer Around Cling
> ---
>
> HEP has extensive experience with on-demand python interoperability
> using cppyy [6], which is built around the type information provided
> by cling. Unlike tools with custom parsers such as swig and sip and
> tools built on top of C-APIs such as boost.python and pybind11, cling
> can provide information about memory management patterns (e.g.
> refcounting) and instantiate templates on the fly. We feel that
> functionality may not be of general interest to the llvm community but
> we will prepare another RFC and send it here later on to gather
> feedback.
>
>
> Extend and Generalize the OpenCL/CUDA Support in Cling
> ---
>
> Cling can incrementally compile CUDA code [7-8], allowing easier setup
> and enabling some interesting use cases. There are a number of planned
> improvements including talking to HIP [9] and SYCL to support more 
> hardware architectures.
>
>
>
> The primary focus of our work is to upstream the functionality
> required to build an incremental compiler and to rework cling to build
> against vanilla clang and llvm. The last two points are to give the
> scope of the work which we will be doing over the next 2-3 years. We
> will send RFCs for both of them here to trigger technical discussion
> if there is interest in pursuing this direction.
>
>
> Collaboration
> ===
>
> Open source development nowadays relies on reviewers. LLVM is no
> different and we will probably disturb a good number of people in the
> community ;) We would like to invite anybody interested in joining our
> incremental C++ activities to our open calls, held every second week.
> Announcements will be made via the Google group
> compiler-research-announce
> (https://groups.google.com/g/compiler-research-announce).
>
>
>
> Many thanks!
>
>
> David & Vassil
>
> References
> ===
> [1] ROOT GitHub https://github.com/root-project/root
> [2] ROOT https://root.cern
> [3] Cling https://github.com/root-project/cling
> [4] Xeus-Cling 
> https://blog.jupyter.org/xeus-is-now-a-jupyter-subproject-c4ec5a1bf30b
> [5] Cling – The New Interactive Interpreter for ROOT 6, 
> https://iopscience.iop.org/article/10.1088/1742-6596/396/5/052071
> [6] High-performance Python-C++ bindings with PyPy and Cling, 
> https://dl.acm.org/doi/10.5555/3019083.3019087
> [7] 
> https://indico.cern.ch/event/697389/contributions/3085538/attachments/1712698/2761717/2018_09_10_cling_CUDA.pdf
> [8] CUDA C++ in Jupyter: Adding CUDA Runtime Support to Cling,
> https://zenodo.org/record/3713753#.Xu8jqvJRXxU
> [9] HIP Programming Guide 
> https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-GUIDE.html
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory