[PATCH] D102669: [analyzer][ctu] Fix wrong 'multiple definitions' errors caused by space characters in lookup names when parsing the ctu index file

Mon Dec 6 05:57:30 PST 2021

steakhal added inline comments.

================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:154-172
+  // Find the length delimiter.
+  const size_t LengthDelimiter = LineRef.find(':');
+  if (StringRef::npos == LengthDelimiter)
+    return false;
+
+  // Parse the length of LookupName as USRLength.
+  size_t USRLength = 0;
----------------
OikawaKirie wrote:
> steakhal wrote:
> > OikawaKirie wrote:
> > > steakhal wrote:
> > > > OikawaKirie wrote:
> > > > > steakhal wrote:
> > > > > > 
> > > > > The source of lookup name of the function being imported is function `CrossTranslationUnitContext::getLookupName`. Keeping the length in the mapping can avoid parsing the lookup name during importing.
> > > > Okay; you can copy the original StringRef to have that. But by consuming it on one path makes the code much more readable.
> > > > 
> > > The `getAsInterger` call can also check whether the content before the first colon is an integer. Therefore, a sub-string operation is required here.
> > I don't doubt that your proposed way of doing this works and is efficient.
> > What I say is that I think there is room for improvement in the non-functional aspects, in the readability. However, it's not really a blocking issue, more of a personal taste.
> I know what you are considering, it is clearer and more readable by consuming the length, then the USR. However, to correctly separate the USR and file path, the length of `USR-Length` is also required, which makes it impossible to just *consume* the length at the beginning.
> 
> Another way of solving this problem is to re-create the string with the USR-Length and the USR after parsing, but I think it is not a good solution.
> 
> BTW, is it necessary to assert the `USR-Length` to be greater than zero? I think it is more reasonable to report *invalid format* rather than assert the value, as it can be provided by the user.
I think what causes the misunderstanding is the definition of //consume// in the context of `StringRef`.
```lang=C++
const StringRef Copy = Line;
Line.consumeInteger(...); // Line advances forward by the number of characters that were parsed as an integral value.
// Copy still refers to the unmodified, original characters.
// I can use it however I want.

// `Line` is a suffix of `Copy`, and the `.end()` should be the same, only `.begin()` should differ.
```

I hope that caused the miscommunication.

---
> BTW, is it necessary to assert the USR-Length to be greater than zero? I think it is more reasonable to report *invalid format* rather than assert the value, as it can be provided by the user.
Yeah, sure!

================
Comment at: clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp:12-14
+int importee(int X) {
+  return 1 / X;
+}
----------------
Also fixup the return type in the declaration within the main TU. Also add the `// expected-no-diagnostics` comment to the primary TU.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102669/new/

https://reviews.llvm.org/D102669