[clang] [Clang] Improve EmitClangAttrSpellingListIndex (PR #114899)

Tue Nov 5 06:17:05 PST 2024

================
@@ -3841,19 +3842,110 @@ void EmitClangAttrSpellingListIndex(const RecordKeeper &Records,
     const Record &R = *I.second;
     std::vector<FlattenedSpelling> Spellings = GetFlattenedSpellings(R);
     OS << "  case AT_" << I.first << ": {\n";
-    for (unsigned I = 0; I < Spellings.size(); ++ I) {
-      OS << "    if (Name == \"" << Spellings[I].name() << "\" && "
-         << "getSyntax() == AttributeCommonInfo::AS_" << Spellings[I].variety()
-         << " && Scope == \"" << Spellings[I].nameSpace() << "\")\n"
-         << "        return " << I << ";\n";
+
+    // If there are none or one spelling to check, resort to the default
+    // behavior of returning index as 0.
+    if (Spellings.size() <= 1) {
+      OS << "    return 0;\n"
+         << "    break;\n"
+         << "  }\n";
+      continue;
     }
 
-    OS << "    break;\n";
-    OS << "  }\n";
+    bool HasSingleUniqueSpellingName = true;
+    StringMap<std::vector<const FlattenedSpelling *>> SpellingMap;
+
+    StringRef FirstName = Spellings.front().name();
----------------
erichkeane wrote:

This does a ton of work in a way that is pretty hard to follow. What you're really trying to find out is whether all of the names are the same, plus some information about the names. I think something like:

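```
std::vector<StringRef> Names;
// Spell out the StringRef return type: if name() returns a std::string, a
// deduced return type would copy it and leave every StringRef dangling.
llvm::transform(Spellings, std::back_inserter(Names),
                [](const FlattenedSpelling &FS) -> StringRef { return FS.name(); });
llvm::sort(Names);
// llvm::unique returns the new logical end; erase the leftover tail.
Names.erase(llvm::unique(Names), Names.end());
```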

That would give you a lot of the information you need later.
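For experimenting outside of TableGen, here is a minimal standalone sketch of that dedup step using only the standard library. It assumes a hypothetical `Spelling` struct in place of `FlattenedSpelling`, and `std::string`, `std::sort`, and `std::unique` in place of `StringRef`, `llvm::sort`, and `llvm::unique`; `uniqueNames` is likewise a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for FlattenedSpelling; only the name matters here.
struct Spelling {
  std::string Name;
};

// Collect the distinct spelling names in sorted order.
std::vector<std::string> uniqueNames(const std::vector<Spelling> &Spellings) {
  // Zero or one spelling is already handled by the early `return 0;`.
  assert(Spellings.size() > 1);
  std::vector<std::string> Names;
  std::transform(Spellings.begin(), Spellings.end(), std::back_inserter(Names),
                 [](const Spelling &S) { return S.Name; });
  std::sort(Names.begin(), Names.end());
  // std::unique compacts the distinct names to the front and returns the new
  // logical end; erase() drops the leftover tail.
  Names.erase(std::unique(Names.begin(), Names.end()), Names.end());
  return Names;
}
```

If three spellings all share the name `aligned`, `uniqueNames` returns a single entry, which is the `Names.size() == 1` case.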

First, you can check `Names.size() == 1`: that means all of the spellings share the same name, so you can skip the name check entirely.

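Next, `Names.end() == std::adjacent_find(Names.begin(), Names.end(), [](StringRef LHS, StringRef RHS) { return LHS.size() == RHS.size(); });`
If THAT is true, all you have to do is check the size, since every name has a unique length. Note that `adjacent_find` only compares neighbors, so this needs `Names` ordered by length (e.g. a second `llvm::sort` with a size comparator); in the lexicographic order above, `"ab"` and `"cd"` can be separated by `"b"`.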

That said, you probably want to do this per name length: inside the loop over each `FlattenedSpelling`, add to the `adjacent_find` condition that the size equals the current `FS.name().size()`. If nothing is found, a size check alone is enough for that spelling.
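A minimal standalone sketch of the whole-list unique-length test, assuming deduplicated `std::string` names and a hypothetical `allLengthsUnique` helper. It sorts a copy by length before calling `std::adjacent_find`, since that algorithm only looks at neighboring elements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if no two of the (deduplicated) names share a length.
// Names is taken by value because it is re-sorted here.
bool allLengthsUnique(std::vector<std::string> Names) {
  // The single-name case is handled first by the `Names.size() == 1` check.
  assert(Names.size() > 1);
  auto ByLength = [](const std::string &L, const std::string &R) {
    return L.size() < R.size();
  };
  auto SameLength = [](const std::string &L, const std::string &R) {
    return L.size() == R.size();
  };
  // Make equal-length names adjacent so adjacent_find can see them.
  std::sort(Names.begin(), Names.end(), ByLength);
  return std::adjacent_find(Names.begin(), Names.end(), SameLength) ==
         Names.end();
}
```

When this returns true, the generated code can distinguish every spelling by `Name.size()` alone.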

Alternatively, you can do a `copy_if` over `Names`, filtered on the length of the current `FlattenedSpelling`'s name, which tells you which characters actually need checking. So something like:

```
for (const FlattenedSpelling &FS : Spellings) {
  StringRef Name = FS.name();
  std::vector<StringRef> SameLenNames;
  llvm::copy_if(Names, std::back_inserter(SameLenNames),
                [&](StringRef N) { return N.size() == Name.size(); });

  // Insert the printed size check here.
  // Since Names.size() > 1 above, we actually have to check the sizes. Though,
  // IMO, you can be a little smarter about combining it into here too, and
  // only do the copy_if that is necessary (so we only loop over Spellings once).

  if (SameLenNames.size() > 1) {
    // If size > 1 we have to check individual characters; otherwise we can
    // print just the size check plus scope/syntax.
    for (StringRef SLN : SameLenNames) {
      if (SLN == Name)
        continue; // Don't compare the name against itself!
      auto CurItr = std::mismatch(Name.begin(), Name.end(), SLN.begin()).first;

      // NOW you know which character to check to make sure we're unique, so
      // something like:
      OS << " && Name[" << std::distance(Name.begin(), CurItr) << "] == '"
         << *CurItr << "'";
    }
  }
}
```
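To make the character-selection step concrete, here is a minimal standalone sketch that builds the condition text as a `std::string` instead of streaming it to `OS`. `firstDifference` and `buildNameCheck` are hypothetical helpers; it assumes the runtime name is already known to be one of `Names`, and that names contain only identifier characters, so the emitted character literal needs no escaping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index of the first character at which two distinct, equal-length names differ.
std::size_t firstDifference(const std::string &A, const std::string &B) {
  assert(A.size() == B.size() && A != B);
  return static_cast<std::size_t>(
      std::mismatch(A.begin(), A.end(), B.begin()).first - A.begin());
}

// Build the text of a condition that tells Name apart from every other
// distinct name of the same length.
std::string buildNameCheck(const std::string &Name,
                           const std::vector<std::string> &Names) {
  std::string Cond = "Name.size() == " + std::to_string(Name.size());
  std::vector<std::size_t> Indices; // character positions already checked
  for (const std::string &Other : Names) {
    if (Other.size() != Name.size() || Other == Name)
      continue; // different length, or the name itself
    std::size_t Idx = firstDifference(Name, Other);
    if (std::find(Indices.begin(), Indices.end(), Idx) == Indices.end())
      Indices.push_back(Idx);
  }
  for (std::size_t Idx : Indices)
    Cond += " && Name[" + std::to_string(Idx) + "] == '" + Name[Idx] + "'";
  return Cond;
}
```

One differing position per colliding name is enough, because each recorded position rules out the name it was computed against; deduplicating the positions keeps the emitted condition short.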
Though, like I said, you can probably fold the `Names.size() == 1` condition into that loop as well, and the `adjacent_find` test can probably be dropped in favor of the loop I just wrote above.
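One possible reading of the single-loop idea, sketched with a hypothetical `NameCheck` enum and `classifyNames` helper over deduplicated `std::string` names: a per-name count of equal-length names takes the place of the separate `adjacent_find` pass, and the single-name case is handled inside the same loop.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// What the generated code has to compare to identify a spelling by name.
enum class NameCheck {
  None,      // every spelling shares one name: skip the name check
  SizeOnly,  // no other distinct name has this length
  Characters // same-length names exist: compare specific characters
};

// Classify each distinct name in a single pass.
std::vector<NameCheck> classifyNames(const std::vector<std::string> &Names) {
  // Zero spellings are handled by the early `return 0;`.
  assert(!Names.empty());
  std::vector<NameCheck> Result;
  for (const std::string &Name : Names) {
    if (Names.size() == 1) {
      Result.push_back(NameCheck::None);
      continue;
    }
    auto SameLength = std::count_if(
        Names.begin(), Names.end(),
        [&](const std::string &N) { return N.size() == Name.size(); });
    // The count includes Name itself, so 1 means its length is unique.
    Result.push_back(SameLength == 1 ? NameCheck::SizeOnly
                                     : NameCheck::Characters);
  }
  return Result;
}
```

The count is quadratic in the number of distinct names, which should be harmless for the handful of spellings an attribute typically has.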


https://github.com/llvm/llvm-project/pull/114899