[clangd-dev] Investigating performance tracking infrastructure

Kirill Bobyrev via clangd-dev clangd-dev at lists.llvm.org
Tue Sep 11 07:00:54 PDT 2018


I see, thanks for the clarification! Looking forward to the changes!

-Kirill

On Thu, Sep 6, 2018 at 10:44 PM Alex L <arphaman at gmail.com> wrote:

>
>
> On Thu, 6 Sep 2018 at 13:07, Kirill Bobyrev <kbobyrev.lists at gmail.com>
> wrote:
>
>> Hi Alex,
>>
>> Thank you for the follow-up!
>>
>> On 5 Sep 2018, at 01:53, Alex L <arphaman at gmail.com> wrote:
>>
>> Hi,
>>
>> I wrote a performance test harness for Clangd, but got delayed with the
>> initial patch as I was out on vacation. The initial design and
>> implementation focuses on measuring code-completion (sema-based right now)
>> for a static project with a fixed set of sources. Before sending out this
>> patch I would like to get some feedback related to one specific example
>> that demonstrates how the code-completion latency can be measured and
>> tracked for a simple project.
>>
>> Let's say we'd like to measure the code-completion latency in 'main.cpp'
>> at line 5 column 1. This design allows you to write an LSP test that's
>> configured by CMake that  is then executed appropriately by lit using a new
>> test format. Here's an example of a test file that would be understood and
>> tested by lit and the new test harness:
>>
>>  {
>>    'compile_commands': '@CMAKE_CURRENT_BINARY_DIR@/compile_commands.json',
>>    'interactions': [
>>      { 'lsp': { 'method': 'textDocument/didOpen', 'params': {
>>          'textDocument': { 'uri': 'MAKE_URI(@CMAKE_CURRENT_BINARY_DIR@/main.cpp)',
>>                            'languageId': 'cpp', 'version': 1,
>>                            'text': 'LOAD_FILE(@CMAKE_CURRENT_BINARY_DIR@/main.cpp)' } } } },
>>      { 'lsp': { 'method': 'textDocument/completion', 'params': {
>>          'textDocument': { 'uri': 'MAKE_URI(@CMAKE_CURRENT_BINARY_DIR@/main.cpp)' },
>>          'position': { 'line': 5, 'character': 1 } } } }
>>    ],
>>    'measure': 'SEMA_COMPLETION'
>>  }
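>>
>> To make the harness side concrete, here is a rough sketch of the replay
>> loop I have in mind (just a sketch: the helper names are made up and the
>> exact flags and framing details may differ, though LSP itself is JSON-RPC
>> with Content-Length framing over stdio):
>>
>>  import json
>>  import os
>>  import subprocess
>>
>>  def send_lsp(proc, payload):
>>      # LSP messages are JSON-RPC bodies preceded by a Content-Length header.
>>      body = json.dumps(payload)
>>      proc.stdin.write(f"Content-Length: {len(body)}\r\n\r\n{body}".encode())
>>      proc.stdin.flush()
>>
>>  def replay(test):
>>      # Point clangd at the directory holding the configured compile_commands.json.
>>      # (A real run would also do the LSP 'initialize' handshake first.)
>>      clangd = subprocess.Popen(
>>          ["clangd", "-compile-commands-dir",
>>           os.path.dirname(test["compile_commands"])],
>>          stdin=subprocess.PIPE, stdout=subprocess.PIPE)
>>      for request_id, interaction in enumerate(test["interactions"], start=1):
>>          message = {"jsonrpc": "2.0", **interaction["lsp"]}
>>          if message["method"] == "textDocument/completion":
>>              message["id"] = request_id  # requests need an id; notifications don't
>>          send_lsp(clangd, message)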
>>
>> The format seems human-readable. I was wondering whether you expect the
>> test cases for the performance tracking suite to be written manually in the
>> general case, or whether you have a helper tool to extract the interactions
>> you’re interested in from specific sessions you would like to target.
>>
>
> Both should work.
> The initial test cases would probably be hand written, but this format is
> intended to be more like a reproducer for Clangd (and not only Clangd)
> where certain user interactions can be repeated.
> Right now I have a script that actually runs completion at a particular
> location and measures the latency, so that would be one way to generate the
> tests.
> We definitely need to write a tool that records the user's interactions and
> then creates a reproducer/test file (I don't have one yet). It could even be
> integrated into Clangd itself, so users could generate a performance-issue
> reproducer on their own machine. For example, let's say the user experiences
> slow completion after opening two files and running completion 3 times in
> one location. They could make a recording and create a reproducer file that
> would include the right interactions, plus a link to the VFS and all of the
> required files on the side. We could then archive it and the user would be
> able to file a bug report with those files. We could then pretty much use
> the reproducer as a test-case in Clangd itself (possibly with some minor
> modifications).
>
>
>> The 'compile_commands' property would let the test harness know where the
>> configured compilation database is.
>> The 'interactions' property contains a list of LSP/other interactions
>> that are sent to Clangd by the test harness. The MAKE_URI and LOAD_FILE
>> functions would be interpreted appropriately by the test harness.
>> The 'measure' property determines the key metric(s) that are being
>> measured by the test. The CI job would ensure that the corresponding
>> metrics are uploaded to LNT.
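>>
>> To be concrete about MAKE_URI and LOAD_FILE, this is roughly the kind of
>> expansion I have in mind (a sketch only; both names are placeholders, and a
>> real implementation would substitute after parsing the message so that the
>> file contents get escaped properly):
>>
>>  import pathlib
>>  import re
>>
>>  def expand_placeholders(text):
>>      # MAKE_URI(path) -> file:// URI; LOAD_FILE(path) -> the file's contents.
>>      text = re.sub(r"MAKE_URI\(([^)]*)\)",
>>                    lambda m: pathlib.Path(m.group(1)).resolve().as_uri(), text)
>>      text = re.sub(r"LOAD_FILE\(([^)]*)\)",
>>                    lambda m: pathlib.Path(m.group(1)).read_text(), text)
>>      return text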
>>
>> Do you plan to support different measurements? What would be the other
>> metrics you are considering?
>>
>
> Yes, of course. We should allow any sort of metric/measurement that can be
> captured by Clangd's tracer.
>
>
>>
>> Having that on LNT would be amazing, I’d be happy to see that!
>>
>> One question for the LNT, though: what would be the project for which you
>> set up the testing? Is that some specific small/large scale project or a
>> fixed version of LLVM?
>>
>
> Initially I'd like to start tracking performance metrics for a fixed
> version of LLVM, but we definitely need more projects. Outside of the LLVM
> umbrella we start running into licensing issues, so it would take more time
> to bring in other projects; LLVM would be a good starting point.
>
>
>>
>> Thanks for the feedback,
>> Kirill
>>
>>
>> Please let me know what you think,
>> Thanks
>> Alex
>>
>> On Tue, 21 Aug 2018 at 23:25, Kirill Bobyrev <kbobyrev.lists at gmail.com>
>> wrote:
>>
>>> I see, thank you for the clarification! Looking forward to the patch!
>>>
>>> -Kirill
>>>
>>> On 21 Aug 2018, at 22:48, Alex L <arphaman at gmail.com> wrote:
>>>
>>>
>>>
>>> On Tue, 21 Aug 2018 at 04:36, Kirill Bobyrev <kbobyrev.lists at gmail.com>
>>> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> I agree with the multiple-modes strategy; it would be great to enable
>>>> tracking both sema + index and index-only performance without any additional
>>>> cost, and if the framework is generic enough, that would be even better!
>>>> The other responses are inline.
>>>>
>>>> On Thu, Aug 16, 2018 at 2:04 AM Alex L <arphaman at gmail.com> wrote:
>>>>
>>>>> Thanks for your responses!
>>>>>
>>>>> I realize now that I should have been more specific when it comes to
>>>>> completion latency. We're currently interested in sema completion latency,
>>>>> but the infrastructure that I would like to set up will also support
>>>>> measuring latency when completion results are obtained from the index.
>>>>> Essentially, for a completion test-case we would like to have the
>>>>> option to run it in two / three modes:
>>>>> - just sema completion
>>>>> - index completion or sema + index completion
>>>>> Note that we don't have to test a completion test-case in all modes,
>>>>> so we could just have a sema based completion test.
>>>>>
>>>>> This way we'll be able to identify the regressions in a particular
>>>>> component (sema vs index) in a better way. Do you think this idea works for
>>>>> you?
>>>>>
>>>>> More responses inline:
>>>>>
>>>>> On Tue, 14 Aug 2018 at 00:31, Eric Liu <ioeric at google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 14, 2018, 08:40 Kirill Bobyrev <kbobyrev.lists at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> Such test-suite might be very useful and it'd be great to have it.
>>>>>>> As Eric mentioned, I am working on pulling benchmark library into LLVM.
>>>>>>> Although I fell behind over the past week due to the complications with
>>>>>>> libc++ (you can follow the thread here:
>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-August/125176.html).
>>>>>>>
>>>>>>
>>>>> Thanks! Do you have a general idea of how you would like to use the
>>>>> benchmarking library?
>>>>>
>>>> I've looked into benchmark usage in libc++ and the test-suite, but they
>>>> weren't very helpful because the usage there is quite specific. I've
>>>> started looking into pulling the library into LLVM (
>>>> https://reviews.llvm.org/D50894) but I have a few concerns there.
>>>>
>>>>> I'm mainly interested in a more complete test that we could run using
>>>>> some sort of harness and whose results can be fed into LNT.
>>>>>
>>>> Can you please elaborate on what you mean by feeding results into LNT?
>>>> Are you thinking about controlling the latency and failing the "benchmark
>>>> tests" as soon as the latency is beyond some limit, or are you interested in
>>>> building LNT targets which you can run alongside the unit tests?
>>>>
>>>
>>> By feeding results I mean basically uploading and storing them in the
>>> LNT database. We won't really need to run the test-suite using LNT, as the
>>> performance harness will take care of it.
>>>
>>> When it comes to the CI, we won't try to fail the tests because of the
>>> latency while running them. We will instead run all of the tests and will
>>> upload the performance results to the store. Then we will run a follow-up
>>> CI job that will compare the gathered results and will check for big
>>> regressions against a baseline.
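>>>
>>> Roughly, the comparison step could look like this (a sketch; the metric
>>> names and how results are stored in / fetched from LNT are made up here):
>>>
>>>  def find_regressions(baseline, current, threshold=0.10):
>>>      # Flag any metric that got more than `threshold` worse than the baseline.
>>>      regressions = []
>>>      for name, base_value in baseline.items():
>>>          new_value = current.get(name)
>>>          if new_value is not None and new_value > base_value * (1 + threshold):
>>>              regressions.append((name, base_value, new_value))
>>>      return regressions
>>>
>>>  # e.g. find_regressions({'sema_completion_ms': 210}, {'sema_completion_ms': 240})
>>>  # reports sema_completion_ms as a ~14% regression.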
>>>
>>> We should also have a way to run the tests locally that would check for
>>> regressions right after the tests are done, so that it would be possible to do
>>> local pre-commit testing.
>>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> Eric, Ilya and I have been discussing a possible "cheap" solution -
>>>>>>> a tool which the user can feed a compilation database and which could process
>>>>>>> some queries (maybe in YAML format, too). This would allow a realistic
>>>>>>> benchmark (since you could simply feed it the LLVM codebase or anything else
>>>>>>> of the size you're aiming for) and be relatively easy to implement. The
>>>>>>> downside of such an approach is that it would require some setup effort.
>>>>>>> As an alternative, it might be worth feeding it a YAML symbol index instead of
>>>>>>> the compilation commands, because currently the global symbol builder is
>>>>>>> not very efficient. I am looking into that issue, too; we have a few ideas
>>>>>>> about what the performance bottlenecks in global-symbol-builder are and how to
>>>>>>> fix them, and hopefully I will make the tool much faster soon.
>>>>>>>
>>>>>> Note that sema latency is something we also need to take into
>>>>>> consideration, as it's always part of the code completion flow, with or
>>>>>> without the index.
>>>>>>
>>>>>
>>>>>>> In the long term, however, I think the LLVM Community is also
>>>>>>> interested in benchmarking other tools which exist under the LLVM umbrella,
>>>>>>> so I think that opting for the benchmark approach would be more
>>>>>>> beneficial. Having an infrastructure based on LNT that we could run either
>>>>>>> on some buildbots or locally would be even better. The downside is that it
>>>>>>> might turn out to be really hard to maintain a realistic test-suite, e.g.
>>>>>>> storing a YAML dump of the static index somewhere would be hard because we
>>>>>>> wouldn't want 300+ MB files in the tree, but hosting it somewhere else and
>>>>>>> downloading it would also potentially introduce additional complexity. On the
>>>>>>> other hand, generating a realistic index programmatically might also be
>>>>>>> hard.
>>>>>>>
>>>>>>
>>>>> I don't have a strong opinion on how the index should be stored.
>>>>> However, I think it's helpful to break this problem down into different
>>>>> categories, and look at three kinds of indexing data sets:
>>>>> - index data set that's derived from a part of the LLVM umbrella
>>>>> (llvm/clang/test-suite/whatever).
>>>>>   => One possible solution: this index can be rebuilt on every run.
>>>>>
>>>> Yes, but that unfortunately takes too long at the moment. I started
>>>> looking into that and fixed a YAML serialization performance problem (
>>>> https://reviews.llvm.org/D50839), but there are a few other bottlenecks
>>>> left.
>>>>
>>>>> - index data set that's derived from a project outside of the LLVM
>>>>> umbrella.
>>>>>   => One possible solution: This index can be stored as an archive of
>>>>> YAML files in one of the LLVM repos.
>>>>>
>>>> That's one way of doing it, right, but I'm not sure any LLVM repo would
>>>> want to store a 300 MB YAML file and update it over time. However, I don't
>>>> know if there are already any cases like this or whether it might be
>>>> acceptable.
>>>>
>>>>> - auto generated index data?
>>>>>
>>>>  While this might be the most appealing option, the generation of a
>>>> realistic index might turn out to be hard. However, I think we should have
>>>> a couple of artificial indices in the benchmarks; it might be beneficial.
>>>>
>>>>> It would probably be valuable to have different kinds of index data
>>>>> sets.
>>>>>
>>>> Agreed, that would also mean more coverage.
>>>>
>>>> Relatively "cheap" solutions which I'm thinking about are:
>>>>
>>>> * Recording a user session and mirroring the input file to Clangd to
>>>> measure the performance. That would eliminate the complexity of creating a
>>>> realistic benchmark without investing too much effort into the benchmark
>>>> itself. However, it might turn out to be hard to track the performance
>>>> contributions of individual components. Also, I'm not sure if it's generic
>>>> enough.
>>>>
>>>
>>> Recording the user session can certainly be very appealing, but I'm not
>>> sure how well it will translate into a realistic benchmark. I suppose it
>>> depends on the particular performance issue. Nevertheless, it would be
>>> really good to have this capability to help us investigate performance
>>> issues.
>>>
>>>
>>>> * Creating a tool which would accept a YAML symbol dump, build an index
>>>> and take a set of requests (e.g. from another file) to measure total
>>>> completion latency. That solves most of my problems, but is tied to the
>>>> index-testing use case, which is not enough for comprehensive performance
>>>> tracking.
>>>>
>>>> Unfortunately, I don't have a good idea yet of how to build a comprehensive
>>>> performance tracking pipeline, e.g. how to continuously build the index for
>>>> (e.g.) LLVM, adjust buildbots, measure performance, ensure that it's
>>>> realistic, etc.
>>>>
>>>
>>>>>
>>>>>>
>>>>>>> Having said that, convenient infrastructure for benchmarking which
>>>>>>> would align with LNT and wouldn't require additional effort from the
>>>>>>> users would be amazing, and we are certainly interested in collaboration.
>>>>>>> What models of the benchmarks have you considered and what do you think
>>>>>>> about the options described above?
>>>>>>>
>>>>>>
>>>>> For the sema-based completion latency tracking I would like to start
>>>>> off with two simple things to get some basic infrastructure working:
>>>>> - C++ test-case: measuring sema code-completion latency (with
>>>>> preamble) in a file from a fixed revision of Clang.
>>>>> - ObjC test-case: similar to the above, for some ObjC code.
>>>>> One issue is that the system headers that will be used are not
>>>>> static, which leads to issues like the baseline being out of date when
>>>>> the SDK on the GreenDragon bots is updated.
>>>>>
>>>>> Ideally I would use some harness based on compile commands. Each test
>>>>> file would have a compilation command entry in the database.
>>>>> I was also thinking that the test command could be fed into Clangd
>>>>> using LSP itself. Similarly to how code-completion is requested in Clangd's
>>>>> regression tests, we could write a test that would send the LSP commands
>>>>> to Clangd. Or maybe the test harness could generate them from some sort
>>>>> of test description (e.g. test completion at these locations in that file).
>>>>>
>>>>> The latency could be measured by scanning the output of the run of
>>>>> Clangd with CLANGD_TRACE.
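>>>>>
>>>>> For example, something along these lines could pull the completion
>>>>> durations out of the trace file (a sketch: CLANGD_TRACE writes Chrome
>>>>> trace-event JSON, but the exact event name used below is a guess and
>>>>> would need to match whatever the tracer actually emits):
>>>>>
>>>>>  import json
>>>>>
>>>>>  def completion_latencies_ms(trace_path, event_name="CodeComplete"):
>>>>>      with open(trace_path) as f:
>>>>>          trace = json.load(f)
>>>>>      # The trace is either a bare list of events or {'traceEvents': [...]}.
>>>>>      events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
>>>>>      # 'dur' is in microseconds for complete ('X') events.
>>>>>      return [e["dur"] / 1000.0 for e in events
>>>>>              if e.get("name") == event_name and "dur" in e]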
>>>>>
>>>>> The test harness would then capture the result and upload it to LNT. A
>>>>> subsequent bot would check for big regressions (e.g. +10%) against the
>>>>> baseline (or previous result).
>>>>>
>>>> Sounds good to me!
>>>>
>>>
>>> I'm hoping to put up a patch for a prototype implementation sometime
>>> this week.
>>>
>>> Cheers,
>>> Alex
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Kirill Bobyrev
>>>>
>>>>>
>>>>> Cheers,
>>>>> Alex
>>>>>
>>>>>
>>>>>>
>>>>>>> Kind regards,
>>>>>>> Kirill Bobyrev
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 7:35 AM Eric Liu via clangd-dev <
>>>>>>> clangd-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> Kirill is working on pulling the Google Benchmark library into LLVM and
>>>>>>>> adding benchmarks to clangd. We are also mostly interested in code
>>>>>>>> completion latency and index performance at this point. We don't have a
>>>>>>>> very clear idea of how to create realistic benchmarks yet, e.g. what code to
>>>>>>>> use, what static index corpus to use. I wonder if you have ideas here.
>>>>>>>>
>>>>>>>> Another option that might be worth considering is adding a tool
>>>>>>>> that runs clangd code completion on some existing files in the llvm/clang
>>>>>>>> codebase. It can potentially measure both code completion quality and
>>>>>>>> latency.
>>>>>>>>
>>>>>>>> -Eric
>>>>>>>> On Tue, Aug 14, 2018, 00:53 Alex L via clangd-dev <
>>>>>>>> clangd-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm currently investigating and putting together a plan for
>>>>>>>>> open-source and internal performance tracking infrastructure for Clangd.
>>>>>>>>>
>>>>>>>>> Initially we're interested in one particular metric:
>>>>>>>>> - Code-completion latency
>>>>>>>>>
>>>>>>>>> I would like to put together infrastructure that's based on LNT
>>>>>>>>> and that would identify performance regressions that arise as new commits
>>>>>>>>> come in. From the performance issues I've observed in our libclang stack,
>>>>>>>>> the existing test-suite in LLVM does not really reproduce the
>>>>>>>>> performance issues that we see in practice well enough. In my opinion we
>>>>>>>>> should create some sort of editor performance test-suite that would be
>>>>>>>>> unrelated to the test-suite that's used for compile time and performance
>>>>>>>>> tracking. WDYT?
>>>>>>>>>
>>>>>>>>> I'm wondering if there are any other folks looking at this at the
>>>>>>>>> moment as well. If yes, I would like to figure out a way to collaborate on
>>>>>>>>> a solution that would satisfy all of our requirements. Please let me know
>>>>>>>>> if you have ideas in terms of how we should be running the tests / what
>>>>>>>>> the test-suite should be, or what your needs are.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alex
>>>>>>>>> _______________________________________________
>>>>>>>>> clangd-dev mailing list
>>>>>>>>> clangd-dev at lists.llvm.org
>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/clangd-dev
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> clangd-dev mailing list
>>>>>>>> clangd-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/clangd-dev
>>>>>>>
>>>>>>>
>>>
>>