[clangd-dev] New LSP language service supporting Swift and C-family languages, using clangd

Tue Oct 23 11:34:11 PDT 2018

> On Oct 23, 2018, at 1:01 AM, Ilya Biryukov <iu.biryukov at gmail.com> wrote:
> 
> Is it fair to say the design is driven by the fact that the LSP protocol itself has no way of doing cross-language interactions without baking all of the involved languages into the same language server?
> Another non-language-specific reason that I see is the lack of index-while-build support on our side (and lack of control over the build and the compiler in general).

I’ll add some more details that make it clearer why the design I described is advantageous for us. Xcode can use multiple different toolchains within one workspace at the same time. Internal OS developers can use platform-specific toolchains, and externally you can also load a Swift OSS toolchain and use it for everything or just for a subset of the buildable products in your workspace. There was also a past release of Xcode where the two language versions of Swift we supported were actually separate Swift toolchains, and you could choose which version you wanted to use for a product.

This is handled by our internal language service (not LSP-based) that Xcode talks to. Xcode tells it what compiler arguments and toolchain to use for a file, and the service loads the right clang/libclang for that file. Once we replace libclang with clangd, that means we may have multiple clangd processes running concurrently. Now add Swift/sourcekitd to this setup (an XPC service process giving access to the Swift compiler ASTs and related functionality) for Swift support, and consider that there may also be multiple sourcekitd processes running. And consider a hypothetical future where we also add support for Metal files, meaning yet another language-specific service.
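
To make the process fan-out concrete, here is a minimal sketch of the bookkeeping involved. It is illustrative only and not our actual service: `ServiceKey`, `ServiceRegistry`, and the toolchain paths are hypothetical names, and a plain string stands in for a launched clangd or sourcekitd process.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ServiceKey:
    """Hypothetical key: one backing service per language and toolchain."""
    language: str        # e.g. "c-family" or "swift"
    toolchain_path: str  # root of the toolchain selected for the file


@dataclass
class ServiceRegistry:
    """Hypothetical registry that lazily creates one service per key."""
    _services: Dict[ServiceKey, str] = field(default_factory=dict)

    def service_for(self, language: str, toolchain_path: str) -> str:
        key = ServiceKey(language, toolchain_path)
        if key not in self._services:
            # A real service would launch clangd or sourcekitd from this
            # toolchain here; the sketch only records a descriptive label.
            binary = "clangd" if language == "c-family" else "sourcekitd"
            self._services[key] = f"{binary}@{toolchain_path}"
        return self._services[key]

    def running(self) -> int:
        """Number of distinct backing services created so far."""
        return len(self._services)
```

With two toolchains and two languages in one workspace, such a registry can end up holding up to four separate services, which is the situation described above.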

You can probably guess from the above that it doesn’t make practical sense for us to have each individual language service process (per language and per toolchain) maintain its own index. It makes more sense for us to have a language-independent layer that handles all the mixed-language aspects, like being able to provide cross-language caller hierarchy, while delegating to the specific language/toolchain service when needing to access a compiler AST or language-specific functionality.
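
As a rough sketch of what a language-independent layer buys you, the following keeps call edges keyed only by symbol USR, so a caller hierarchy can cross language boundaries without consulting any compiler. The class name `CrossLanguageIndex` and the USR strings used with it are made up for illustration; this is not the API of our library.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class CrossLanguageIndex:
    """Hypothetical shared index of call edges, independent of language."""

    def __init__(self) -> None:
        # callee USR -> set of (caller USR, language of the caller)
        self._callers: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_call(self, caller_usr: str, caller_language: str,
                 callee_usr: str) -> None:
        self._callers[callee_usr].add((caller_usr, caller_language))

    def caller_hierarchy(self, usr: str) -> List[Tuple[int, str, str]]:
        """Transitive callers as (depth, caller USR, language), breadth-first."""
        result: List[Tuple[int, str, str]] = []
        seen = {usr}
        frontier = [usr]
        depth = 0
        while frontier:
            depth += 1
            next_frontier: List[str] = []
            for callee in frontier:
                # Sorted only to make the output order deterministic.
                for caller, language in sorted(self._callers.get(callee, ())):
                    if caller in seen:
                        continue  # guards against recursion cycles
                    seen.add(caller)
                    result.append((depth, caller, language))
                    next_frontier.append(caller)
            frontier = next_frontier
        return result
```

The language tag on each caller is what the layer would use to decide which language/toolchain service to delegate to when an AST is needed for that symbol.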

> On Oct 23, 2018, at 8:22 AM, Sam McCall <sammccall at google.com> wrote:
> 
> Hi Argyrios,
> 
> Thanks for the update! I think the design makes sense, and having a cross-language index will allow really nice mixed-language project support.
> 
> There is a lot of implementation overlap; the design we ended up with is actually pretty similar: auto-index (http://tinyurl.com/clangd-automatic-index) extracts the symbols/refs from TUs, and then we build a queryable index structure on top of it (Dex). It's all in-process, but conceptually not too different from the background-clang-worker variant.

Since you started looking at background indexing functionality, here’s some feedback based on our experiences, which will also provide some context for why we pursued index-while-building.

It should come as no surprise, but generating clang TUs for background indexing is hugely computationally expensive. What makes things worse is that users have different expectations for building versus background indexing: people expect a build to take a long time and all of their available CPU (in fact they will justifiably be annoyed if building doesn’t take over the whole CPU), but they hate it when something takes lots of CPU for a long time in the background. Since it’s happening in the background, you necessarily run it at lower thread priority and avoid taking all available CPU resources, which makes it take longer, sometimes even longer than building, and that exacerbates the issue.

With index-while-building we got multiple benefits:

* The major and most obvious one, of course, is that we are able to reuse the clang TUs created during the build to get the index data. Once the build finishes we have all the data, and there’s no need to create clang TUs again for the same files in order to index them separately.
* There is a certain fragility in creating a TU outside a build: files may be missing (header files, VFS files) depending on the state of the project/build, and the TU that gets created may be erroneous or problematic in general. However, we have strong guarantees that once you hit build and manage to build everything without errors, we have 100% accurate index data for the files that got built. This is something that is obvious to the user (did it build successfully or not), while what TUs get created in the background and why they may be failing is generally opaque to them, and they have less incentive to fix such issues since the build is working fine.
* This one is a bit more subtle but quite powerful. With index-while-building we have a robust fallback allowing us to try things for background indexing that could provide a huge speedup even if they reduce its accuracy by just a little bit. This is because even if the background indexer got something wrong in some edge case for a file, we know that once the user builds (which generally happens often) we get back to 100% accuracy for that file. Admittedly we haven’t taken much advantage of this, but we have some ideas we are interested in trying out in the future.
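
The fallback in the last point can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not how our store is implemented: `IndexUnit` and `UnitStore` are hypothetical names, and the rule that background data never replaces build data ignores file modification times, which a real implementation would have to consider.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class IndexUnit:
    """Hypothetical index data for one file."""
    source: str                # "build" or "background"
    symbols: FrozenSet[str]    # symbol USRs found in the file


class UnitStore:
    """Hypothetical store where build data takes precedence."""

    def __init__(self) -> None:
        self._units: Dict[str, IndexUnit] = {}

    def record(self, path: str, unit: IndexUnit) -> bool:
        """Store the unit; return False if it was rejected."""
        current = self._units.get(path)
        # Assumption: approximate background data must not replace data
        # that came from a successful build of the same file.
        if (current is not None and current.source == "build"
                and unit.source == "background"):
            return False
        self._units[path] = unit
        return True

    def symbols(self, path: str) -> FrozenSet[str]:
        unit = self._units.get(path)
        return unit.symbols if unit is not None else frozenset()
```

Under this scheme a background indexer can be as aggressive as it likes: whatever it gets wrong for a file is overwritten the next time that file is built.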

> We started with a dynamic index which just has opened TUs, and a static one built by a mapreduce-like process, and autoindex is these approaches meeting in the middle. We'd actually hoped to rely on index-while-build infrastructure for some of this, but it didn't arrive in time.
> 
> That said, the devil is in the details: actually sharing an implementation of the index layer would probably be hard, as we're all getting pulled in different directions.
> If it's possible to plug your index into the SymbolIndex abstraction in clangd, then lots of features will "just work", like fast code completion. Clangd relies on the index for that today rather than the caching approaches used by libclang.
> We've had multiple implementations of this (including a large-scale out-of-process one internal to Google), but so far they've all mostly shared indexing logic. So I'm curious whether this is actually implementable on different data sources like what you'll get from index-while-build.

We haven’t looked into Dex in detail, but it seems that it could play the role of what we currently use LMDB for: speeding up queries by efficiently figuring out where certain information resides in the record files generated by index-while-building.
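
For a feel of that role, here is a minimal sketch of a lookup layer over raw records. It uses in-memory dictionaries as a stand-in for the key-value database, and the record layout (`RawRecords`), function names, record-file names, and USRs are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical raw data: record-file name -> occurrences as (USR, line).
RawRecords = Dict[str, List[Tuple[str, int]]]


def build_lookup(records: RawRecords) -> Dict[str, Set[str]]:
    """Map each USR to the record files that mention it."""
    lookup: Dict[str, Set[str]] = defaultdict(set)
    for record_name, occurrences in records.items():
        for usr, _line in occurrences:
            lookup[usr].add(record_name)
    return dict(lookup)


def occurrences_of(usr: str, records: RawRecords,
                   lookup: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Answer a global query by reading only the relevant record files."""
    hits: List[Tuple[str, int]] = []
    for record_name in sorted(lookup.get(usr, ())):
        for occurrence_usr, line in records[record_name]:
            if occurrence_usr == usr:
                hits.append((record_name, line))
    return hits
```

The point of the lookup table is that a query for one USR touches only the record files that contain it, rather than scanning every record that the build produced.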

> 
> One question I have is a practical one - I'm sure changes are needed to clangd, are these likely to happen upstream or in a fork/merge cycle?

CC’ed AlexL and JanK, they can speak more about this. Beyond clangd, I’d like to also mention that we’ll be resuming our upstreaming effort for the index-while-building patches.

> 
> Looking forward to seeing more details!
> Cheers, Sam
> 
> On Tue, Oct 23, 2018 at 8:32 AM Argyrios Kyrtzidis via clangd-dev <clangd-dev at lists.llvm.org> wrote:
> Hey all,
> 
> We've recently announced that we'll be starting a new open-source project for an LSP language service supporting Swift and C-family languages, see more details in the announcement post (https://forums.swift.org/t/new-lsp-language-service-supporting-swift-and-c-family-languages-for-any-editor-and-platform). I wanted to also mention additional details that relate to Clangd.
> 
> Currently, for our C-family support in Xcode (code-completion, clang AST queries) we use libclang, but for the new LSP service we will switch to using Clangd. We will also open-source a C++ library for global index queries, which is built on top of LMDB (https://symas.com/lmdb). The functionality of this library is described by Nathan in his Index-While-Building design document (https://docs.google.com/document/d/1cH2sTpgSnJZCkZtJl1aY-rzy4uGPcrI-6RrUpdATO2Q), specifically in the 'Using the index store' section.
> 
> Let me elaborate a bit more on how we use this library. From Clang (and Swift) we get raw index data files, either directly from building or from invoking clang for background indexing. These data record files are designed to be efficient to write and update, ensuring that record files for headers are only written once, so that index-while-building has minimal overhead. But they are not designed to do efficient global queries (give me all symbol occurrences of this symbol USR). To accommodate this we use this database library which is a lightweight index layer on top of the raw index records. It reads the raw index data files and populates a key-value database that enables efficient global queries (it essentially determines what raw index record files contain the relevant information and retrieves the data).
> 
> In our design for having full cross-language support for Swift and Clang languages (e.g. call-hierarchy across languages), we prefer to have a language-independent indexing component that is layered on top of the compiler-specific support (Clang/Clangd and Swift/sourcekitd). That means that our LSP service will contain an indexing and global refactoring engine and it will delegate to Clangd for clang-specific document queries, like code-completion.
> 
> I understand that Clangd is intended to be a self-contained language service that includes functionality for global index queries along with document-specific queries, but we believe we could still collaborate on common infrastructure shared by both Clangd and our new cross-language LSP service. See AlexL's previous post about how we intend to use Clangd and what kind of improvements we want to make: https://lists.llvm.org/pipermail/cfe-dev/2018-April/057668.html
> 
> Once we have the repositories up, you'll be able to check out our overall design in more detail, and in the meantime I'd be happy to hear any feedback or questions you may have!