<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/141154>141154</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            clangd needs a feature to build reduced context as input to large language model prompts
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          bartlettroscoe
      </td>
    </tr>
</table>

<pre>
    ## Description

A feature that is badly needed for applying AI models to C++ code is the ability to build a reduced context for large C++ code bases, where only a minimal context is extracted and used to generate a prompt for a large language model (LLM). The problem is that large C++ code bases can't fit in the prompt of even the best LLMs, and providing extra C++ code that does not supply relevant context just confuses the LLM and degrades performance. What is needed is a tool where you can point to selections of C++ "code of interest" (e.g., some functions, classes, or just a few lines of C++ code), and that then recursively looks up all of the classes, functions, variables, etc. used in that "code of interest" and produces a listing of C++ code containing just those upstream dependencies as context. In effect, this takes a very large C++ project and turns it into a much smaller C++ project (at least as far as what the LLM needs to know).
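
The recursive lookup described above amounts to computing the transitive closure of a "uses" graph over symbols, emitted in dependency order. The following is a minimal sketch in Python. It assumes a hypothetical in-memory map `SYMBOLS` from each symbol name to its source snippet and the names it references. In a real tool this information would have to come from clangd's index or the Clang AST, and the symbol names and snippets below are purely illustrative:

```python
# Hypothetical symbol graph (illustrative only): name -> (snippet, names it uses).
# A real tool would obtain this from clangd's index or the Clang AST.
SYMBOLS = {
    "Vector": ("class Vector { double norm() const; };", []),
    "Matrix": ("class Matrix { Vector row(int i) const; };", ["Vector"]),
    "solve": ("Vector solve(Matrix A, Vector b);", ["Matrix", "Vector"]),
    "unrelated": ("void unrelated();", []),
}


def reduced_context(roots, symbols):
    """Return snippets for `roots` and everything they transitively use,
    ordered so that each dependency appears before the code that uses it."""
    ordered = []
    visited = set()

    def visit(name):
        # Skip symbols already emitted and symbols outside the project
        # (e.g. standard library types), which have no entry in the map.
        if name in visited or name not in symbols:
            return
        visited.add(name)  # mark before recursing so cycles terminate
        for dep in symbols[name][1]:
            visit(dep)
        ordered.append(name)  # post-order: upstream dependencies first

    for root in roots:
        visit(root)
    return [symbols[name][0] for name in ordered]


if __name__ == "__main__":
    # "Code of interest": the function solve(); unrelated() is never pulled in.
    print("\n".join(reduced_context(["solve"], SYMBOLS)))
```

Run as a script, this prints the `Vector`, `Matrix`, and `solve` snippets in that order and leaves out `unrelated`. The post-order traversal puts upstream dependencies first, which is roughly the order a reader (or an LLM) needs them in. For a very large project, an explicit worklist instead of recursion would avoid Python's default recursion limit.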

The need for this type of context gathering is described in the paper:

* "YABLoCo: Yet Another Benchmark for Long Context Code Generation", submitted 5/7/2025, https://arxiv.org/abs/2505.04406v1

The basic outline of such a tool based on LLVM is described in the paper:

* "CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge", submitted 1/27/2025, https://arxiv.org/abs/2501.16155

The clangd tool already indexes large C++ projects and has access to all of the source code (and the AST if needed), so it would seem like the logical place to add such a recursive context lookup. Together with integration into the (VSCode) Continue.dev extension, this would provide a seamless way to gather the context needed for the LLM prompt, so that the LLM has a shot at understanding a selection of C++ code well enough to explain it, refactor it, or add unit tests for it.

Claude 4.0 suggests this is what you need to do:

* https://claude.ai/share/58749dfb-3fdf-4379-8512-f49068bfc335


</pre>