[PATCH] D143640: [Tooling/Inclusion] Support overload symbols in the stdlib.

Mon Feb 13 05:08:50 PST 2023

kadircet added a comment.

In D143640#4122603 <https://reviews.llvm.org/D143640#4122603>, @hokein wrote:

> Thanks for the comment, putting more thoughts.
>
> In D143640#4121998 <https://reviews.llvm.org/D143640#4121998>, @kadircet wrote:
>
>> All this complexity for handling only 4 symbols feels wrong. is this the whole set? are there any other symbols that fall under this class? can we disambiguate all of them solely based on number of parameters?
>
> No, these 4 symbols is not the whole set, there are other symbols (e.g. `isspace` from `<cctype>`, or `<locale>`), but these are not important, IMO.

I agree them being not important. But I think having such a complex infrastructure for just 4 symbols and making it more complex in the future when we want to handle these just sounds like a big negative.

> The number of parameters is the most critical information to disambiguate -- there are some exceptions, e.g. `std::get()` has a few 1-parameter overloads for array, vector, etc. We will add more information in the `SymbolName` structure for disambiguation in the future.

Makes sense. but maybe there's a way that involves not relying on parameter count at all and something completely different. that's another reason why I was asking for other symbols that are causing problems today.

>> Downsides:
>>
>> - We are relying heavily on number of parameters in certain cases now
>
> Yeah, I think this is expected if we want to do disambiguations.
>
>> - We only have that information available for symbols with variants
>
> My intention of this information is internal, and merely used for disambiguation. And we're not exposing them to public APIs.

I can see that purpose, but having those symbols available internally without any way for users to poke at them sounds like a shortcoming of the design that'll never address. Hence I was trying to avoid that.

>> - We're creating a discrepancy in Symbol::all() vs Symbol::named() views of the world
>
> Oh, this is not intended, we can remove this discrepancy, and make `Symbol::all()` and `Symbol::named()` aligned.

I don't think removing that discrepancy by also dropping these symbols from ::all is the right trade-off though. As they're still available as a `Symbol` through Recognizer. I think the more appropriate thing would require us to change `::named` to return multiple symbols, with enough details for disambiguation if needed.

>> - In addition to generator and hand curated list, now there's a 3rd place to augment symbol mapping
>
> Right, this bit is untweaked in this current patch (there is a FIXME), we can merge it with the hand-curated list.
>
>> This complexity both in terms of mental load when introducing new symbols, and also possible discrepancies that can be encountered when using the public APIs makes me question if we're doing the sensible thing that'll generalize for the future variant symbols, or just trying to patch up a solution for couple cases we know to be not working.
>> I'd suggest:
>> 1- doing a little bit more design work here and at least getting rid of the burden on the public APIs and a way to introduce these symbols through hand curated lists (e.g. we should be returning all alternatives in Symbol::named, with enough details for the caller to have a chance to disambiguate and have maybe new syntax in raw symbol map files to indicate this variant situation).
>
> OK, looks like we have some different ideas on the design here. I'm not sure returning all alternatives in `Symbol::named` and exposing all information for disambiguation is a good idea (I'm also not sure it is an important usecase). This seems complicating the public APIs too much.

I think we agree on them being not important so far. But I don't see why having a discrepancy between `Symbol`s received through `Recognizer` and `Symbol::named` is preferred instead of having a unified view between the two (which we already have right now, by excluding variants from both).
I was suggesting preserving this invariant because it's easier to reason about the library when we only have a single view rather than 2. Could you elaborate a little on why you think "exposing all information for disambiguation" is not a good idea? The public interfaces stays ~the same. We'll be having these extra information in `Symbol`, if we can find a meaningful set.

> My original design is
>
> - full disambiguation is *only* performed in the `Recognizer` API where the input is the AST node
> - other qualified-name-based APIs `Symbol::named` remains the same (or best effort returning the symbol with most common header)
>
> I share your concern about the current approach, it is not perfect, but I think this is moving towards a right direction, the tradeoff seems OK to me (rather than putting the special-case handling to *every* application, we isolate it in one place).

I agree that sharing this logic with applications is definitely a plus (and is the main goal of this library). I am mostly concerned of current approach because it's adding too much complexity for handling of 2 special symbols, that might not generalise to others in the future and we can special case these symbols on the applications with only couple lines of code + this special casing still needs to exist in applications that uses Symbol::named interfaces. Therefore this discrepancy might create bugs in the future, just because people are used to behaviour in one side and expect the same on the other.

>> 2- or handling this in include-cleaner only, similar to how we do it in clangd and other applications, until we encounter more problematic symbols. as handling this in the general case seems hard (especially in the absence of an AST), and we haven't seen enough signals for symbols other than `std::move` to cause trouble.
>>
>> i feel like 2nd alternative gets us a long way with minimal effort. WDYT?
>
> The 2nd alternative definitely sounds good to me. Before we settle down the design, I will prepare a patch to special-case these symbols in include-cleaner.

Thanks a lot!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143640/new/

https://reviews.llvm.org/D143640