[PATCH] D125487: [Tooling/DependencyScanning] Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources

Sat May 21 12:06:56 PDT 2022

akyrtzi marked an inline comment as done.
akyrtzi added inline comments.

================
Comment at: clang/unittests/Lex/DependencyDirectivesScannerTest.cpp:175
       "#define MACRO   con  tent   ", Out));
-  EXPECT_STREQ("#define MACRO con  tent\n", Out.data());
+  EXPECT_STREQ("#define MACRO con tent\n", Out.data());

----------------
dexonsmith wrote:
> Can you clarify why this is changing in this patch? It’s not obvious to me why this would start optimizing internal whitespace of macros. 
> 
> (I also can’t remember if it’s safe/correct. Is there a way to stringify the contents of a macro? If so, this would change the result… if not, then this seems like an improvement, but maybe better for a separate patch?)
> 
> (If there’s a good reason to change it here, don’t want to hold it up, but it’s not explained in the commit message so it’s not obvious to me.)
> Can you clarify why this is changing in this patch? It’s not obvious to me why this would start optimizing internal whitespace of macros.

Minimization was doing `printToNewline()` after the macro identifier, which just prints verbatim, including any amount of whitespace. OTOH the new way is collecting the tokens of the directive, which ignores unnecessary whitespace.

> (I also can’t remember if it’s safe/correct. Is there a way to stringify the contents of a macro? If so, this would change the result… 

AFAIK this is correct since the macro body is getting tokenized when the preprocessor sees a `#define` and the unnecessary whitespace is ignored.

> if not, then this seems like an improvement, but maybe better for a separate patch?)

To clarify, `minimizeSourceToDependencyDirectives()` is only used for testing purposes, once we get the tokens of the directives then we use them directly during preprocessing, so there's no place for preserving the unnecessary whitespace. This change is just inherent to the difference in implementation for minimization vs tokenization.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125487/new/

https://reviews.llvm.org/D125487