[clang-tools-extra] [clang-doc] Improve performance by adding a short circuit (PR #96809)

via cfe-commits cfe-commits at lists.llvm.org
Sun Jul 14 23:36:19 PDT 2024


PeterChou1 wrote:

> > Ok nevermind, disregard the above comment I was wrong about the mechanism of the bug. the source of this bug comes from the way clang-doc handles C code, particularly anonymous typedef in C. When clang-doc encounters an anonymous typedef in C it incorrectly serializes its a name. As a simple example running clang-doc on this file
> > test.c
> > ```
> > /**
> >  * Foo anon typedef
> >  */
> > typedef struct {} Foo;
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > with cause it to interpret Foo as @nonymous_record_XXXXXXX.
> > The reason why that is because when checking anonymous typedef name we're explicitly casting it to a CXXDecl. (see code [here](https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-doc/Serialize.cpp#L664)) Since the above is c code it won't be cast to CXXRecordDecl and hences clang-doc will treat it as an anonymous record.
> > This becomes a problem when we run the LLVM compilation database. As an example, a record that we were not properly serializing before was LLVMOrcCDependenceMapPair. When clang-doc runs the compilation database of LLVM it first comes across the file OrcV2CBindingsRemovableCode.c under the directory llvm/examples/OrcV2Examples/OrcV2CBindingsRemovableCode. This file causes recursiveASTVisitor to visit LLVMOrcCDependenceMapPair but because its c code its name is not found and it is incorrectly serialized into @nonymous_record_XXXXX. Normally this is not an issue for clang-doc, since we'll visit the record multiple times and the next time we visit a cpp file using LLVMOrcCDependenceMapPair , clang-doc will correctly fill in the name of the declaration. This also means that the order of the compilation also matters. If clang-doc had parsed a cpp file with LLVMOrcCDependenceMapPair than it will correctly serialized the record. This was why I had so much trouble reproducing the bug when I was using clang-doc --filter option to try and isolate the bug.
> > Since this patch changes the behaviour to only visiting the a typedef declaration at most once, we now incorrectly serialized what was LLVMOrcCDependenceMapPair to @nonymous_record_XXXXX since on my machine it encounters LLVMOrcCDependenceMapPair first when its parsing OrcV2CBindingsRemovableCode.c.
> > The workaround I added still works since when it comes to anonymous typedefs we skip out on memomization. It would be better if we just correctly handle the case of parsing anonymous typedef in C. But this approach is more conservative and safer since it reverts to old behaviour of clang-doc instead of introducing some other bug with parsing typedefs.
> 
> Thanks for the detailed analysis. With the current version of the patch, are there any difference between what we get on main and what we get w/ this patch?

When sorted by USR, there are no differences between the index between this patch and what's on main,

https://github.com/llvm/llvm-project/pull/96809


More information about the cfe-commits mailing list