[cfe-dev] [RFC] Moving (parts of) the Cling REPL in Clang

Fri Jul 10 13:55:44 PDT 2020

On 7/10/20 1:57 PM, Vassil Vassilev wrote:
> On 7/10/20 6:43 AM, JF Bastien wrote:
>> I like cling, and having it integrated with the rest of the project 
>> would be neat. I agree with Hal’s suggestion to explain the design of 
>> what remains. It sounds like a pretty small amount of code.
>
>
>   JF, Hal, did you mean you want a design document of how cling works 
> in general or a design RFC for the patches we have? A design document 
> for cling would be quite large and will take us some time to write up. 
> OTOH, we could relatively easily give a rationale for each patch.

I had in mind something that's probably in between. Something that 
explains the patches and enough about how they fit into a larger system 
that we can reason about the context.

  -Hal

>
>
>>
>>
>>> On Jul 9, 2020, at 7:25 PM, Hal Finkel via cfe-dev 
>>> <cfe-dev at lists.llvm.org> wrote:
>>>
>>> I think that it would be great to have infrastructure for 
>>> incremental C++ compilation, supporting interactive use, 
>>> just-in-time compilation, and so on. I think that the best way to 
>>> deal with the patches, etc., as well as IncrementalAction, is to 
>>> first send an RFC explaining the overall design.
>>>
>>>   -Hal
>>>
>>> On 7/9/20 3:46 PM, Vassil Vassilev via cfe-dev wrote:
>>>> Motivation
>>>> ===
>>>>
>>>> Over the last decade we have developed an interactive C++ 
>>>> interpreter (aka REPL) as part of the high-energy physics 
>>>> (HEP) data analysis project -- ROOT [1-2]. We invested 
>>>> significant effort to replace the CINT C++ interpreter with a 
>>>> newly implemented REPL based on llvm -- cling [3]. The cling 
>>>> infrastructure is a core component of the data analysis framework 
>>>> of ROOT and has been running in production for approximately 5 
>>>> years.
>>>>
>>>> Cling is also a standalone tool, which has a growing community 
>>>> outside of our field. Cling’s user community includes users in 
>>>> finance, biology and in a few companies with proprietary software. 
>>>> For example, there is a xeus-cling jupyter kernel [4]. One of the 
>>>> major challenges we face in fostering that community is our 
>>>> cling-related patches in llvm and clang forks. The benefits of 
>>>> using the LLVM community standards for code reviews, release cycles 
>>>> and integration have been mentioned a number of times by our 
>>>> "external" users.
>>>>
>>>> Last year we were awarded an NSF grant to improve cling's 
>>>> sustainability and make it a standalone tool. We thank the LLVM 
>>>> Foundation Board for supporting us with a non-binding letter of 
>>>> collaboration which was essential for getting this grant.
>>>>
>>>>
>>>> Background
>>>> ===
>>>>
>>>> Cling is a C++ interpreter built on top of clang and llvm. In a 
>>>> nutshell, it uses clang's incremental compilation facilities to 
>>>> process code chunk-by-chunk by assuming an ever-growing translation 
>>>> unit [5]. Then code is lowered into llvm IR and run by the llvm 
>>>> jit. Cling has implemented some language "extensions" such as 
>>>> executing statements at the global scope and error recovery. Cling 
>>>> is at the core of HEP -- it is heavily used during data analysis of 
>>>> exabytes of particle physics data coming from the Large Hadron 
>>>> Collider (LHC) and other particle physics experiments.
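To make the "ever-growing translation unit" idea above concrete, here is a minimal toy sketch. It is not cling's implementation: the real tool drives clang's parser and the llvm jit, whereas this only models the bookkeeping with strings. The class name ToyRepl, the wrapper name toy_wrapper_N and the caller-supplied isStatement flag are all hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: real cling uses clang's parser and the LLVM JIT.
// ToyRepl and toy_wrapper_N are hypothetical names for illustration.
class ToyRepl {
public:
  // Accept one chunk of input and return the text that would be handed
  // to the compiler for this increment. The caller says whether the
  // chunk is a statement; a real REPL would decide this by parsing.
  std::string process(const std::string &chunk, bool isStatement) {
    std::string increment = chunk;
    if (isStatement) {
      // C++ does not allow statements at namespace scope, so wrap the
      // statement in a uniquely named function that a REPL could call
      // once the increment has been compiled.
      increment = "void toy_wrapper_" + std::to_string(wrappers_++) +
                  "() { " + chunk + " }";
    }
    increments_.push_back(increment);
    return increment;
  }

  // The "ever-growing translation unit": every earlier increment stays
  // visible to all later ones.
  std::string translationUnit() const {
    std::string tu;
    for (const std::string &inc : increments_) tu += inc + "\n";
    return tu;
  }

  std::size_t size() const { return increments_.size(); }

private:
  std::vector<std::string> increments_;
  std::size_t wrappers_ = 0;
};
```

Feeding it the declaration `int x = 42;` followed by the statement `x += 1;` yields a two-increment translation unit in which the statement appears as the body of `toy_wrapper_0`, so the later chunk can still refer to `x` from the earlier one.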
>>>>
>>>>
>>>> Plans
>>>> ===
>>>>
>>>> The project foresees three main directions -- move parts of cling 
>>>> upstream along with the clang and llvm features that enable them; 
>>>> extend and generalize the language interoperability layer around 
>>>> cling; and extend and generalize the OpenCL/CUDA support in cling. 
>>>> We are at the early stages of the project and this email intends to 
>>>> be an RFC for the first part -- upstreaming parts of cling. Please 
>>>> do share your thoughts on the rest, too.
>>>>
>>>>
>>>> Moving Parts of Cling Upstream
>>>> ---
>>>>
>>>> Over the years we have slowly moved some patches upstream. However, 
>>>> we still have around 100 patches in the clang fork. Most of them 
>>>> are in the context of extending the incremental compilation support 
>>>> for clang. Incremental compilation poses some challenges for the 
>>>> clang infrastructure. For example, we need to tune CodeGen to work 
>>>> with multiple llvm::Module instances, and to finalize at each 
>>>> end of translation unit (we have multiple of them). Other changes 
>>>> include small adjustments in the FileManager's caching mechanism, 
>>>> and bug fixes in the SourceManager (code which can be reached 
>>>> mostly from within our setup). One conclusion we can draw from our 
>>>> research is that the clang infrastructure is amazingly well suited 
>>>> to something which was not its main use case. The grand total of our 
>>>> diffs against clang-9 is: `62 files changed, 1294 insertions(+), 
>>>> 231 deletions(-)`. Cling is currently being upgraded from llvm-5 to 
>>>> llvm-9.
>>>>
>>>> A major weakness of cling's infrastructure is that it does not work 
>>>> with the clang Action infrastructure due to the lack of an 
>>>> IncrementalAction. A possible way forward would be to implement a 
>>>> clang::IncrementalAction as a starting point. This way we should be 
>>>> able to reduce the amount of setup necessary to use the incremental 
>>>> infrastructure in clang. However, this will be a bit of a testing 
>>>> challenge -- cling lives downstream and some of the new code may be 
>>>> impossible to pick up straight away and use. Building a mainline 
>>>> example tool such as clang-repl, which gives us a way to test the 
>>>> incremental case, or repurposing the already existing 
>>>> clang-interpreter may be able to address the issue. The major risk 
>>>> of the task is having code in the clang mainline which is 
>>>> untested by its HEP production environment.
>>>> There are several other types of patches to the ROOT fork of Clang, 
>>>> including ones in the context of performance, towards C++ modules 
>>>> support (D41416), and storage (does not have a patch yet but has an 
>>>> open projects entry and somebody working on it). These patches can 
>>>> be considered in parallel, independently of the rest.
>>>>
>>>> Extend and Generalize the Language Interoperability Layer Around Cling
>>>> ---
>>>>
>>>> HEP has extensive experience with on-demand python interoperability 
>>>> using cppyy[6], which is built around the type information provided 
>>>> by cling. Unlike tools with custom parsers such as swig and sip and 
>>>> tools built on top of C-APIs such as boost.python and pybind11, 
>>>> cling can provide information about memory management patterns 
>>>> (e.g. refcounting) and instantiate templates on the fly. We feel 
>>>> that this functionality may not be of general interest to the llvm 
>>>> community, but we will prepare another RFC and send it here later 
>>>> on to gather feedback.
>>>>
>>>>
>>>> Extend and Generalize the OpenCL/CUDA Support in Cling
>>>> ---
>>>>
>>>> Cling can incrementally compile CUDA code [7-8], allowing easier 
>>>> setup and enabling some interesting use cases. There are a number of 
>>>> planned improvements including talking to HIP [9] and SYCL to 
>>>> support more hardware architectures.
>>>>
>>>>
>>>>
>>>> The primary focus of our work is upstreaming the functionality 
>>>> required to build an incremental compiler and reworking cling to 
>>>> build against vanilla clang and llvm. The last two points are to 
>>>> give the scope of the work which we will be doing over the next 2-3 
>>>> years. We will send RFCs for both of them here to trigger technical 
>>>> discussion if there is interest in pursuing this direction.
>>>>
>>>>
>>>> Collaboration
>>>> ===
>>>>
>>>> Open source development nowadays relies on reviewers. LLVM is no 
>>>> different and we will probably disturb a good number of people in 
>>>> the community ;) We would like to invite anybody interested in 
>>>> joining our incremental C++ activities to our open calls, held 
>>>> every second week. Announcements will be made via the google group 
>>>> compiler-research-announce 
>>>> (https://groups.google.com/g/compiler-research-announce).
>>>>
>>>>
>>>>
>>>> Many thanks!
>>>>
>>>>
>>>> David & Vassil
>>>>
>>>> References
>>>> ===
>>>> [1] ROOT GitHub https://github.com/root-project/root
>>>> [2] ROOT https://root.cern
>>>> [3] Cling https://github.com/root-project/cling
>>>> [4] Xeus-Cling 
>>>> https://blog.jupyter.org/xeus-is-now-a-jupyter-subproject-c4ec5a1bf30b
>>>> [5] Cling – The New Interactive Interpreter for ROOT 6, 
>>>> https://iopscience.iop.org/article/10.1088/1742-6596/396/5/052071
>>>> [6] High-performance Python-C++ bindings with PyPy and Cling, 
>>>> https://dl.acm.org/doi/10.5555/3019083.3019087
>>>> [7] 
>>>> https://indico.cern.ch/event/697389/contributions/3085538/attachments/1712698/2761717/2018_09_10_cling_CUDA.pdf
>>>> [8] CUDA C++ in Jupyter: Adding CUDA Runtime Support to Cling, 
>>>> https://zenodo.org/record/3713753#.Xu8jqvJRXxU
>>>> [9] HIP Programming Guide 
>>>> https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-GUIDE.html
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>> -- 
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory