[cfe-dev] [RFC] Moving (parts of) the Cling REPL in Clang
Vassil Vassilev via cfe-dev
cfe-dev at lists.llvm.org
Thu Jul 9 13:46:00 PDT 2020
Motivation
===
Over the last decade we have developed an interactive, interpreted C++
environment (a REPL) as part of the high-energy physics (HEP) data
analysis project -- ROOT [1-2]. We invested significant effort in
replacing the CINT C++ interpreter with a newly implemented REPL based
on llvm -- cling [3]. The cling infrastructure is a core component of
the data analysis framework of ROOT and has been running in production
for approximately 5 years.
Cling is also a standalone tool with a growing community outside of our
field. Cling’s user community includes users in finance, biology and a
few companies with proprietary software. For example, there is a
xeus-cling Jupyter kernel [4]. One of the major challenges we face in
fostering that community is that our cling-related patches live in
forks of llvm and clang. The benefits of adopting the LLVM community
standards for code reviews, release cycles and integration have been
mentioned a number of times by our "external" users.
Last year we were awarded an NSF grant to improve cling's sustainability
and make it a standalone tool. We thank the LLVM Foundation Board for
supporting us with a non-binding letter of collaboration which was
essential for getting this grant.
Background
===
Cling is a C++ interpreter built on top of clang and llvm. In a
nutshell, it uses clang's incremental compilation facilities to process
code chunk-by-chunk by assuming an ever-growing translation unit [5].
The code is then lowered into llvm IR and run by the llvm jit. Cling
implements some language "extensions" such as executing statements at
global scope and error recovery. Cling is at the core of HEP -- it is
heavily used during data analysis of exabytes of particle physics data
coming from the Large Hadron Collider (LHC) and other particle physics
experiments.
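As a rough illustration of the ever-growing translation unit and of
error recovery, here is a toy C++ model that accepts input chunk by
chunk and leaves its state untouched when a chunk fails. It is only a
sketch: ToyTranslationUnit, toyParses and the brace-balance "parser" are
hypothetical stand-ins made up for this example, not cling or clang
APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a parser: a chunk "parses" if its braces balance.
// (Assumption for illustration only; a real REPL would invoke clang.)
inline bool toyParses(const std::string &chunk) {
  int depth = 0;
  for (char c : chunk) {
    if (c == '{') ++depth;
    if (c == '}' && --depth < 0) return false;
  }
  return depth == 0;
}

// One translation unit that only ever grows, one chunk at a time.
class ToyTranslationUnit {
public:
  // Appends the chunk and returns true if it parses; otherwise the
  // accumulated state stays exactly as it was (error recovery).
  bool process(const std::string &chunk) {
    if (!toyParses(chunk)) return false;
    chunks_.push_back(chunk);
    // Invariant: the accumulated unit as a whole is still well-formed.
    assert(toyParses(text()));
    return true;
  }

  // Number of chunks accepted so far.
  std::size_t size() const { return chunks_.size(); }

  // Everything accepted so far, one chunk per line.
  std::string text() const {
    std::string all;
    for (const std::string &c : chunks_) all += c + "\n";
    return all;
  }

private:
  std::vector<std::string> chunks_;
};
```

In this model, feeding `int x = 1;` and then `void broken() {` leaves
exactly one accepted chunk: the failed input does not poison what came
before, which is the property a REPL needs from error recovery.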
Plans
===
The project foresees three main directions -- move parts of cling
upstream along with the clang and llvm features that enable them; extend
and generalize the language interoperability layer around cling; and
extend and generalize the OpenCL/CUDA support in cling. We are at the
early stages of the project and this email intends to be an RFC for the
first part -- upstreaming parts of cling. Please do share your thoughts
on the rest, too.
Moving Parts of Cling Upstream
---
Over the years we have slowly moved some patches upstream. However, we
still have around 100 patches in the clang fork. Most of them extend
clang's incremental compilation support. Incremental compilation poses
some challenges for the clang infrastructure. For example, we need to
tune CodeGen to work with multiple llvm::Module instances and to
finalize at each end-of-translation-unit point (we have several of
them). Other changes include small adjustments to the FileManager's
caching mechanism and bug fixes in the SourceManager (in code which is
reached mostly from within our setup). One conclusion we can draw from
our research is that the clang infrastructure adapts amazingly well to
a use case that was not its main one. The grand total of our diffs
against clang-9 is: `62 files changed, 1294 insertions(+), 231
deletions(-)`. Cling is currently being upgraded from llvm-5 to llvm-9.
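To make the CodeGen point more concrete, here is a toy model, not clang
code, in which every increment gets its own module that is finalized at
its end-of-translation-unit point while earlier modules stay available.
ToyModule, ToyCodeGen and all member names are hypothetical; they only
mimic the shape of the problem and say nothing about how the actual
patches are written.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an llvm::Module: a named bag of symbols.
struct ToyModule {
  std::string name;
  std::vector<std::string> symbols;
  bool finalized = false;
};

// Toy code generator that opens a fresh module for every increment.
class ToyCodeGen {
public:
  // Begin a new increment; the previous one must have been finalized.
  void startIncrement() {
    assert(!current_ && "previous increment was not finalized");
    current_ = std::make_unique<ToyModule>();
    current_->name = "incr_module_" + std::to_string(done_.size());
  }

  // Emit a definition into the module of the current increment.
  void emit(const std::string &symbol) {
    assert(current_ && "no increment in progress");
    current_->symbols.push_back(symbol);
  }

  // End-of-translation-unit for this increment: seal the module and
  // keep it (a real REPL would hand it to the JIT at this point).
  const ToyModule &finalizeIncrement() {
    assert(current_ && "no increment in progress");
    current_->finalized = true;
    done_.push_back(std::move(current_)); // current_ is null afterwards
    return *done_.back();
  }

  // Number of modules finalized so far.
  std::size_t moduleCount() const { return done_.size(); }

private:
  std::unique_ptr<ToyModule> current_;
  std::vector<std::unique_ptr<ToyModule>> done_;
};
```

The design point the sketch tries to capture is that "end of
translation unit" happens many times in one session, so any state that
a batch compiler would finalize once has to be finalized per increment.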
A major weakness of cling's infrastructure is that it does not work with
the clang Action infrastructure due to the lack of an
IncrementalAction. A possible way forward would be to implement a
clang::IncrementalAction as a starting point. This way we should be able
to reduce the amount of setup necessary to use the incremental
infrastructure in clang. However, this will be a bit of a testing
challenge -- cling lives downstream and may not be able to pick up and
use some of the new code straight away. Building a mainline example
tool such as clang-repl which gives us a way to test the incremental
case, or repurposing the already existing clang-interpreter, may
address the issue. The major risk of the task is landing code in the
clang mainline which is not exercised by the HEP production environment.
There are several other types of patches in the ROOT fork of Clang,
including ones for performance, for C++ modules support (D41416), and
for storage (no patch yet, but there is an open projects entry and
somebody is working on it). These patches can be considered in
parallel, independently of the rest.
Extend and Generalize the Language Interoperability Layer Around Cling
---
HEP has extensive experience with on-demand python interoperability
using cppyy [6], which is built around the type information provided by
cling. Unlike tools with custom parsers such as swig and sip and tools
built on top of C-APIs such as boost.python and pybind11, cling can
provide information about memory management patterns (e.g. refcounting)
and instantiate templates on the fly. We feel that this functionality
may not be of general interest to the llvm community but we will
prepare another RFC and send it here later on to gather feedback.
Extend and Generalize the OpenCL/CUDA Support in Cling
---
Cling can incrementally compile CUDA code [7-8], allowing easier setup
and enabling some interesting use cases. There are a number of planned
improvements, including talking to HIP [9] and SYCL to support more
hardware architectures.
The primary focus of our work is upstreaming the functionality required
to build an incremental compiler and reworking cling to build against
vanilla clang and llvm. The last two points are meant to give the scope
of the work which we will be doing over the next 2-3 years. We will send
RFCs for both of them here to trigger technical discussion if there is
interest in pursuing this direction.
Collaboration
===
Open source development nowadays relies on reviewers. LLVM is no
different and we will probably disturb a good number of people in the
community ;) We would like to invite anybody interested in joining our
incremental C++ activities to our open calls, held every second week.
Announcements will be made via the google group
compiler-research-announce
(https://groups.google.com/g/compiler-research-announce).
Many thanks!
David & Vassil
References
===
[1] ROOT GitHub https://github.com/root-project/root
[2] ROOT https://root.cern
[3] Cling https://github.com/root-project/cling
[4] Xeus-Cling
https://blog.jupyter.org/xeus-is-now-a-jupyter-subproject-c4ec5a1bf30b
[5] Cling – The New Interactive Interpreter for ROOT 6,
https://iopscience.iop.org/article/10.1088/1742-6596/396/5/052071
[6] High-performance Python-C++ bindings with PyPy and Cling,
https://dl.acm.org/doi/10.5555/3019083.3019087
[7]
https://indico.cern.ch/event/697389/contributions/3085538/attachments/1712698/2761717/2018_09_10_cling_CUDA.pdf
[8] CUDA C++ in Jupyter: Adding CUDA Runtime Support to Cling,
https://zenodo.org/record/3713753#.Xu8jqvJRXxU
[9] HIP Programming Guide
https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-GUIDE.html