[cfe-dev] RFC: Preprocessor option to assist with parsing a single file only
Argyrios Kyrtzidis via cfe-dev
cfe-dev at lists.llvm.org
Tue Jun 20 08:32:15 PDT 2017
> On Jun 19, 2017, at 3:31 AM, Manuel Klimek via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> I think this is a useful addition that has been requested multiple times in the past - have you by chance tried this on C++ code? I'd predict it doesn't work well there, but I'd be curious whether you have other results :)
We haven’t tried it on C++. It likely depends on the style of the codebase; it would be interesting to see how much info we can get using the clang repo.
Note that, even for ObjC, experiments with generalizing beyond unit test discovery showed that we’d need improvements in error recovery, e.g.:
- when you have ‘@interface A : B’ it should not drop ‘B’ completely from the super-class list if ‘B’ is unresolved
- there were cases where clang was too ‘liberal’ in skipping tokens after a parser error
- unresolved types changing to ‘int’ is not great
>
> On Fri, Jun 16, 2017 at 2:11 AM Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
> Put a patch for review here:
> https://reviews.llvm.org/D34263
>
>
>> On Jun 14, 2017, at 6:25 PM, Argyrios Kyrtzidis via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>> Hey all,
>>
>> In r305044 I introduced a preprocessor option (bool SingleFileParseMode) and clang-c/Index.h enumerator (CXTranslationUnit_SingleFileParse) to assist with ‘parsing a single file only’. I’m going to provide some details and context on why such parsing is useful and why a new option is necessary.
>>
>> Parsing a single file (essentially parsing it normally but without including any other headers) is useful as a way to determine the global symbols that exist in the source files, in an inaccurate but ‘lightning-super-fast’ mode. For example, if the source is like this:
>>
>> @implementation Foo
>> -(void)testSomething {}
>> -(NSString*)returnIt { return @"blah"; }
>> @end
>>
>> The parser can determine that there is an ObjC @implementation named ‘Foo’ with 2 methods, -testSomething, and -returnIt. Even if no SDK header gets included and ‘NSString’ becomes unresolved, the parser can still provide the associated global symbols.
>>
>> In general terms, think of this like approximating the inaccurate parsing that something like SublimeText is doing, where there’s no preprocessor or precise typechecking but it can still provide you with a list of symbols and some rudimentary jump-to-definition.
>>
>> We’ve used this for a while now in Xcode to do something like ‘fast-scanning’ specifically for ObjC unit tests (*). This allows us to show the available unit tests almost immediately once you open a project, without waiting for the full-accurate indexing to complete.
>> If the ‘fast-scan’ is missing something, e.g. due to preprocessor directives or macros, it will still show up once the accurate indexing catches up.
>>
>> To clarify, this has been working without any modifications to clang: we were just using libclang to parse the file containing the unit tests and did not pass any search paths, which had the practical effect of not including headers. So why add the option now?
>>
>> This is due to the limitation of the 'fast scan' not seeing symbols inside preprocessor directives. For example, with code like this:
>>
>> #if ENABLE_FOO_TESTS
>>
>> @implementation Foo
>> -(void)testSomething {}
>> @end
>>
>> #endif
>>
>> ‘ENABLE_FOO_TESTS’ is not defined so the preprocessor skips this block and we miss getting these tests via the ‘fast scan’. Here’s what I’d like to propose:
>>
>> If ‘SingleFileParseMode’ is true, the preprocessor will treat undefined identifiers in preprocessor directives specially. If a directive makes use of an undefined identifier, the preprocessor will ignore the directive and parse all of its blocks (the #if block, and the #else one as well).
>> If the directive is using literals like:
>>
>> #if 0
>> …
>> #endif
>>
>> #if 1
>> …
>> #endif
>>
>> or is making use of defined macros, then there’s no change of behavior.
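
The rule described above can be sketched as a small decision function. This is a hypothetical toy model, not clang's implementation: the names `Condition`, `Macro`, `BranchPlan` and `plan_branches` are made up for illustration, a condition is simplified to either a single integer literal or a single identifier, and a macro's value is simplified to an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a directive condition: one literal or one identifier.
 * Real #if conditions are full expressions; this is a simplification. */
typedef enum { COND_LITERAL, COND_IDENTIFIER } CondKind;

typedef struct {
    CondKind kind;
    long literal;           /* used when kind == COND_LITERAL */
    const char *identifier; /* used when kind == COND_IDENTIFIER */
} Condition;

/* Hypothetical macro table entry; the value is simplified to an integer. */
typedef struct {
    const char *name;
    long value;
} Macro;

/* Which blocks of an #if/#else pair get parsed. */
typedef struct {
    bool parse_if_block;
    bool parse_else_block;
} BranchPlan;

static const Macro *find_macro(const Macro *macros, size_t count,
                               const char *name) {
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(macros[i].name, name) == 0)
            return &macros[i];
    }
    return NULL;
}

BranchPlan plan_branches(Condition cond, const Macro *macros, size_t count,
                         bool single_file_parse_mode) {
    long value;
    if (cond.kind == COND_LITERAL) {
        /* Literals such as "#if 0" or "#if 1": no change of behavior. */
        value = cond.literal;
    } else {
        assert(cond.identifier != NULL);
        const Macro *macro = find_macro(macros, count, cond.identifier);
        if (macro != NULL) {
            /* Defined macro: no change of behavior. */
            value = macro->value;
        } else if (single_file_parse_mode) {
            /* Undefined identifier in single-file mode: parse every block. */
            return (BranchPlan){ true, true };
        } else {
            /* Standard behavior: an undefined identifier evaluates to 0. */
            value = 0;
        }
    }
    return (BranchPlan){ value != 0, value == 0 };
}
```

Under this model, a condition on the undefined identifier ENABLE_FOO_TESTS yields both blocks when the mode is on and only the #else block when it is off, while literal conditions and defined macros give the same plan in either mode.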
>>
>> With such a change, in this ‘fast-scan-inaccurate-mode’ we’ll be able to gather the symbols that exist in preprocessor directives like the ‘#if ENABLE_FOO_TESTS’ example.
>>
>> Let me know what you think!
>>
>>
>> (*) Detecting only ObjC unit tests has a restricted scope, one that unmodified clang was well equipped to help with. If we want to extend ‘fast/inaccurate’ parsing and try to gather such symbol info from all files, clang would need to be enhanced to improve its error recovery and not drop valuable information from its AST when there are compiler errors. But this is a discussion for another thread at some later point.
>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>