[PATCH] D142092: [include-mapping] Allow multiple headers for the same symbol. Choose the first header of available ones.

Viktoriia Bakalova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 26 05:40:57 PST 2023


VitaNuo added a comment.

Thank you all for comments! The patch should be ready for the next review now.



================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:1053
 SYMBOL(remainder, std::, <cmath>)
+SYMBOL(remove, std::, <cstdio>)
 SYMBOL(remove_all_extents, std::, <type_traits>)
----------------
VitaNuo wrote:
> hokein wrote:
> > I think `std::remove` should not be in the scope of the patch, it has two variants:
> > - std::remove from `<cstdio>`
> > - and std::remove from `<algorithm>`.
> Yes, agreed. Let's take care of the overloads going forward.
The latest version keeps the `std::remove` overload from `<algorithm>`.


================
Comment at: clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp:58
+    int SymIndex = NextAvailSymIndex;
+    if (NSSymbolMap *NSSymbols = NamespaceSymbols->lookup(NS)) {
+      auto It = NSSymbols->find(Name);
----------------
hokein wrote:
> VitaNuo wrote:
> > hokein wrote:
> > > Given the fact that multiple-header symbols are grouped in the .inc file, we could simplify the code of checking a new symbol by looking at the last available SymIndex:
> > > 
> > > ```
> > > if (NextAvailSymIndex > 0 && SymbolNames[NextAvailSymIndex-1].first == NS && SymbolNames[NextAvailSymIndex-1].second == Name) {
> > >    // this is not a new symbol.
> > > }
> > > ```
> > I know this is easier in terms of code, but this heavily relies on the order in the generated files. I would prefer to keep the library and the generator as decoupled as possible, even if it means slightly more complex code here. Overall, it seems more future-proof in case of unrelated generator changes (bugs?) that might change the order.
> > but this heavily relies on the order in the generated files.
> 
> Yeah, this is true. I actually think we should make it as an invariant (multiple-header symbols are grouped) of the generated .inc files. This invariant is important and useful, it is much easier for human to read and justify. We can probably guarantee it in the generator side.
Ok, sure. Sorted the symbols in the generator now and also applied the snippet from above. I'm not 100% sure the code became much simpler but this version seems fine too :)


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:174
-      # std::remove<> has variant algorithm.
-      "std::remove": ("algorithm"),
-  }
----------------
kadircet wrote:
> VitaNuo wrote:
> > VitaNuo wrote:
> > > kadircet wrote:
> > > > VitaNuo wrote:
> > > > > kadircet wrote:
> > > > > > this is actually checking for something else (sorry for the confusing naming).
> > > > > > 
> > > > > > the `variant` here refers to library name mentioned in parentheses (this is same problem as `std::move`) on the std symbol index page https://en.cppreference.com/w/cpp/symbol_index (e.g. `remove<>() (algorithm)`). by getting rid of this we're introducing a regression, as previously `std::remove` wouldn't be recognized by the library, but now it'll be recognized and we'll keep suggesting `<cstdio>` for it.
> > > > > > 
> > > > > > so we should actually keep this around.
> > > > > Ok, I can keep this out of this patch, but we'll have to remove this logic evetually when we deal with overloads.
> > > > > 
> > > > > I have a slight suspicion that this code might be buggy, because it suggests that one _of_ the variants should be accepted. What is does in reality, though, is it keeps `algorithm` in the list of headers suitable for `std::remove` alongside `cstdio`, and then in the last step `std::remove` is ignored by the generator because of being defined in two headers.
> > > > > 
> > > > > With this patch, the result will be both `{cstdio, algorithm}`. Is this (more) satisfactory for now compared to skipping `algorithm` due to being an overload?
> > > > > 
> > > > > Ok, I can keep this out of this patch, but we'll have to remove this logic evetually when we deal with overloads.
> > > > 
> > > > Surely, I wasn't saying this should stay here forever, i am just saying that what's done in the scope of this patch doesn't really address the issues "worked around" by this piece.
> > > > 
> > > > > I have a slight suspicion that this code might be buggy, because it suggests that one _of_ the variants should be accepted. What is does in reality, though, is it keeps algorithm in the list of headers suitable for std::remove alongside cstdio, and then in the last step std::remove is ignored by the generator because of being defined in two headers.
> > > > 
> > > > right, it's because we have logic to prefer "non-variant" versions of symbols when available (i.e. in the absence of this logic, we'd prefer std::remove from cstdio). this logic enables us to preserve certain variants (in addition to non-variants). that way we treat std::remove as ambigious rather than always resolving to <cstdio>, hence it's marked as "missing", similar to `std::move`.
> > > > 
> > > > > With this patch, the result will be both {cstdio, algorithm}. Is this (more) satisfactory for now compared to skipping algorithm due to being an overload?
> > > > 
> > > > in the end this should probably look like {algorithm, cstdio}, but as mentioned elsewhere, this is not the same problem as "same symbol being provided by multiple header" but rather "different symbols having same name and different headers". so treatment of it shouldn't change by this patch.
> > > On second thought, I think we'd rather special-case the overloads for now (until we get to them). See the newest patch version.
> > > right, it's because we have logic to prefer "non-variant" versions of symbols when available (i.e. in the absence of this logic, we'd prefer std::remove from cstdio).
> > 
> > Where is this logic? AFAICS the generator in the current state doesn't generate anything for std::remove. 
> > Where is this logic? AFAICS the generator in the current state doesn't generate anything for std::remove.
> 
> It's the logic that you're deleting:
> ```
>     if variant and variant not in variants_for_symbol:
>         continue
> ```
> 
> we ignore any symbols that has a variant we shouldn't accept and always prefer the non-variant versions. to be more concrete when parsing c++ symbol index we'll see two alternatives for `std::remove`:
> 
> ```
> remove()
> remove<>() (algorithm)
> ```
> 
> first one is a non-variant, hence it's accepted by default. second one is `algorithm` variant, and is accepted per this deleted logic. in the end we **used** to didn't generate anything because we now have multiple headers providing `std::remove`.
> 
> i think it creates more confusion to special case only std::remove down below, and not other symbols whose non-variant versions we accept, e.g. sin also has non-variant versions, but we don't "special case" it to keep it out of the map.
> 
> i'd suggest treating std::remove similar to other symbols with an accepted variant, rather than creating a divergence. so my suggestion is changing the logic above to **only** accept variants mentioned in the map when there's a mapping for the symbol (and reject the non-variant version). net effect will be that we'll now include the mapping from `std::remove` to `<algorithm>`, which is currently inline with our behaviour against other symbols with different variants. we try to pick the most common variant, and that's almost always the "non-variant" version, apart from `std::remove`.
> 
> that way we should get rid of the special casing below completely. does that make sense?
Ok, I've tweaked the logic to keep `std::remove` from `<algorithm>` now.


================
Comment at: clang/tools/include-mapping/gen_std.py:112
   for symbol in symbols:
-    if len(symbol.headers) == 1:
-      # SYMBOL(unqualified_name, namespace, header)
-      print("SYMBOL(%s, %s, %s)" % (symbol.name, symbol.namespace,
-                                    symbol.headers[0]))
-    elif len(symbol.headers) == 0:
+    if re.match("^remove$|^swap$", symbol.name) and symbol.namespace == "std::":
+      continue
----------------
kadircet wrote:
> different headers providing `std::swap` don't provide different variants of it, these are just ADL extension points, similar to std::begin (which also shouldn't be missing from the mapping).
> 
> for `remove`, i've provided more context in the discussion above, about keeping a different variant of `std::remove` around.
Thank you. The latest version should take care both of the overloads that have accidentally made it to this patch and of the `std::remove` case. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142092/new/

https://reviews.llvm.org/D142092



More information about the cfe-commits mailing list