[PATCH] D142092: [include-mapping] Allow multiple headers for the same symbol. Choose the first header of available ones.

Viktoriia Bakalova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 26 02:46:19 PST 2023


VitaNuo marked 2 inline comments as done.
VitaNuo added inline comments.


================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:106
-SYMBOL(atomic_exchange_explicit, std::, <atomic>)
-SYMBOL(atomic_fetch_add, std::, <atomic>)
-SYMBOL(atomic_fetch_add_explicit, std::, <atomic>)
----------------
hokein wrote:
> Looks like the regex filters too many symbols -- e.g. atomic_fetch_add is not a variant symbol, we should keep it instead. The same to other following symbols as well.
They're all back now.


================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:100
 SYMBOL(atoll, std::, <cstdlib>)
+SYMBOL(atomic, std::, <atomic>)
+SYMBOL(atomic, std::, <memory>)
----------------
kadircet wrote:
> hokein wrote:
> > VitaNuo wrote:
> > > VitaNuo wrote:
> > > > kadircet wrote:
> > > > > hokein wrote:
> > > > > > Conceptually, this (and other `atomic_*` symbols) doesn't feel correct:
> > > > > > - `<atomic>` provides the generic template `template<class T> struct atomic;`
> > > > > > -  `<memory>` provides partial template specializations for `std::shared_ptr` and `std::weak_ptr`  
> > > > > > 
> > > > > > They are variant symbols (ideally, they should be treat as the `std::move()`). The issue here is that unlike `std::move` which has two individual entries in the index page, we only have one entry for `std::atomic` (extending the cppreference_parser.py script to perfectly distinguish these two cases in the [page](https://en.cppreference.com/w/cpp/atomic/atomic) seems non-trivial).  Some options:
> > > > > > 
> > > > > > 1) treat them as multiple-header symbols (like what this patch does now)
> > > > > > 2) special-case these symbols like `std::move()`
> > > > > > 3) always prefer the header providing generic templates
> > > > > > 
> > > > > > @kadircet, what do you think?
> > > > > right, i believe this patch is definitely adding keeping multiple headers for a symbol around, but mixing it with keeping headers for variants (e.g. overloads provided by different headers, or specializations as mentioned here).
> > > > > 
> > > > > we definitely need some more changes in parser to make sure we're only preserving alternatives for **the same symbol** and not for any other variants (overloads, specializations). IIRC there are currently 2 places these variants could come into play:
> > > > > - first is the symbol index page itself, symbols that have ` (foo)` next to them have variants and should still be ignored (e.g. std::remove variant of cstdio shouldn't be preserved in the scope of this patch)
> > > > > - second is inside the detail pages for symbols, e.g. [std::atomic](http://go/std::atomic), we can have headers for different declarations, they're clearly different variants so we shouldn't add such symbols into the mapping `_ParseSymbolPage` already does some detection of declarations in between headers.
> > > > > 
> > > > > in the scope of this patch, we should keep ignoring both.
> > > > I suggest to special-case the overloads for now, just not to solve all the problems in the same patch.
> > > The first group are already ignored. We need to take a lot at how to ignore the second one.
> > Ideally, we should tweak the `_ParseSymbolPage` to handle this case, but I don't see a easy way to do it (the `atomic` [case](https://en.cppreference.com/w/cpp/atomic/atomic) is quite tricky where the symbol name is identical, only the template argument is different, and the simple text-match heuristic in `_ParseSymbolPage` fails in this case).
> > 
> > so specialcasing these (there are not too many of them) looks fine to me.
> I don't follow why we can't perform this detection directly, haven't been taking a look at the details but the page looks like:
> ```
> Defined in header <atomic>
> template< class T > struct atomic;
> template< class U > struct atomic<U*>;
> Defined in header <memory>
> template< class U > struct atomic<std::shared_ptr<U>>;
> template< class U > struct atomic<std::weak_ptr<U>>;
> Defined in header <stdatomic.h>
> #define _Atomic(T) /* see below */
> ```
> 
> So `_ParseSymbolPage`, first sees <atomic> then a bunch of decls, then <memory> then a bunch more decls and so on, and in the end it returns all the headers it has found.
> 
> Why can't we change the logic here to return `nothing` when there are different decls && headers? i.e. return empty if we see a new header after seeing some declarations.
> does this result in undesired behaviour elsewhere?
Ok, I've fixed (or hacked :) this programmatically.
The parsing of constants in https://en.cppreference.com/w/cpp/locale/codecvt_mode is not complete now, everything is attributed to the "codecvt" header, the "locale" header is ignored. Hope it's OK.
If not, then I'm afraid I can only special-case the "atomic.*" symbols.


================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:369
+SYMBOL(div, std::, <cinttypes>)
+SYMBOL(div, std::, <cstdlib>)
 SYMBOL(div_t, std::, <cstdlib>)
----------------
VitaNuo wrote:
> hokein wrote:
> > this one as well, both headers provide different overloads.
> I suggest to special-case the overloads for now, just not to solve all the problems in the same patch.
Should be fixed now, together with "atomic.*" cases.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142092/new/

https://reviews.llvm.org/D142092



More information about the cfe-commits mailing list