[llvm] [LLVM][TableGen] Add overloaded intrinsic name conflict checks (PR #109314)

Rahul Joshi via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 19 10:17:36 PDT 2024


================
@@ -94,6 +95,168 @@ void CodeGenIntrinsicTable::CheckDuplicateIntrinsics() const {
   PrintFatalNote(First.TheDef, "Previous definition here");
 }
 
+// Note: This is a modified version of `Intrinsic::lookupLLVMIntrinsicByName`
+// in IntrinsicInst.cpp file.
+template <typename T>
+int lookupLLVMIntrinsicByName(ArrayRef<const T *> NameTable, StringRef Name,
+                              function_ref<const char *(const T *)> ToString) {
+  using ToStringTy = function_ref<const char *(const T *)>;
+  assert(Name.starts_with("llvm.") && "Unexpected intrinsic prefix");
+
+  // Do successive binary searches of the dotted name components. For
+  // "llvm.gc.experimental.statepoint.p1i8.p1i32", we will find the range of
+  // intrinsics starting with "llvm.gc", then "llvm.gc.experimental", then
+  // "llvm.gc.experimental.statepoint", and then we will stop as the range is
+  // size 1. During the search, we can skip the prefix that we already know is
+  // identical. By using strncmp we consider names with differing suffixes to
+  // be part of the equal range.
+  size_t CmpEnd = 4; // Skip the "llvm" component.
+
+  struct Compare {
+    Compare(size_t CmpStart, size_t CmpEnd, ToStringTy ToString)
+        : CmpStart(CmpStart), CmpEnd(CmpEnd), ToString(ToString) {}
+    bool operator()(const T *LHS, const char *RHS) {
+      return strncmp(ToString(LHS) + CmpStart, RHS + CmpStart,
+                     CmpEnd - CmpStart) < 0;
+    }
+    bool operator()(const char *LHS, const CodeGenIntrinsic *RHS) {
+      return strncmp(LHS + CmpStart, ToString(RHS) + CmpStart,
+                     CmpEnd - CmpStart) < 0;
+    }
+    const size_t CmpStart, CmpEnd;
+    ToStringTy ToString;
+  };
+
+  auto Low = NameTable.begin();
+  auto High = NameTable.end();
+  auto LastLow = Low;
+  while (CmpEnd < Name.size() && High - Low > 0) {
+    size_t CmpStart = CmpEnd;
+    CmpEnd = Name.find('.', CmpStart + 1);
+    CmpEnd = CmpEnd == StringRef::npos ? Name.size() : CmpEnd;
+    LastLow = Low;
+    Compare Cmp(CmpStart, CmpEnd, ToString);
+    std::tie(Low, High) = std::equal_range(Low, High, Name.data(), Cmp);
+  }
+  if (High - Low > 0)
+    LastLow = Low;
+
+  if (LastLow == NameTable.end())
+    return -1;
+  StringRef NameFound = (*LastLow)->Name;
+  if (Name == NameFound ||
+      (Name.starts_with(NameFound) && Name[NameFound.size()] == '.'))
+    return LastLow - NameTable.begin();
+  return -1;
+}
+
+// Check for conflicts with overloaded intrinsics. If an intrinsic's name has
+// the name of an overloaded intrinsic as a prefix, then it may be ambiguious
+// and confusing as to which of the two a specific call refers to. Hence
+// disallow such cases.
+//
+// As an example, if `llvm.foo` is overloaded, it will appear as
+// `llvm.foo.<mangling_suffix>` in the IR. Now, if another intrinsic is called
+// `llvm.foo.bar`, it may be ambiguious as to whether `.bar` is a mangling
+// suffix for the overloaded `llvm.foo` or if its the `llvm.foo.bar` intrinsic.
+// Note though that `llvm.foobar` is OK. So the prefix check will check if
+// there name of the overloaded intrinsic is a prefix of another one and the
+// next letter after the prefix is a `.`.
+//
+// Also note that the `.bar` suffix in the example above may not be a valid
+// mangling suffix for `llvm.foo`, so we could check that and allow
+// `llvm.foo.bar` as there is no ambiguity. But LLVM's intrinsic name matching
+// does not support this (in llvm::Intrinsic::lookupLLVMIntrinsicByName).
+// However the ambiguity is still there, so we do not allow this case (i.e.,
+// the check is overly strict).
+void CodeGenIntrinsicTable::CheckOverloadConflicts() const {
+  // Collect all overloaded intrinsic names in a vector. `Intrinsics` are
+  // already mostly sorted by name (all target independent ones first, sorted by
+  // their name, followed by target specific ones, sorted by their name).
+  // However we would like to detect cases where an overloaded target
+  // independent one shares a prefix with a target depndent one. So we need to
+  // collect and resort all of them by name.
+  std::vector<const CodeGenIntrinsic *> OverloadedIntrinsics;
+  for (const CodeGenIntrinsic &Int : Intrinsics)
+    if (Int.isOverloaded)
+      OverloadedIntrinsics.push_back(&Int);
+
+  sort(OverloadedIntrinsics,
+       [](const CodeGenIntrinsic *Int1, const CodeGenIntrinsic *Int2) {
+         return Int1->Name < Int2->Name;
+       });
+
+  ArrayRef<const CodeGenIntrinsic *> AR = OverloadedIntrinsics;
+  auto ToString = [](const CodeGenIntrinsic *Int) -> const char * {
+    return Int->Name.c_str();
+  };
+
+  for (const CodeGenIntrinsic &Int : Intrinsics) {
+    int index =
+        lookupLLVMIntrinsicByName<CodeGenIntrinsic>(AR, Int.Name, ToString);
+    if (index == -1 || AR[index] == &Int)
+      continue;
+    const CodeGenIntrinsic &Overloaded = *AR[index];
+
+    // Allow only a single ".xxx" suffix after the matched name, if we know that
+    // the it likely won't match a mangling suffix.
+    StringRef Suffix =
+        StringRef(Int.Name).drop_front(Overloaded.Name.size() + 1);
+    bool IsSuffixOk = [&]() {
+      // Allow only a single "." separated token.
+      if (Suffix.find('.') != StringRef::npos)
+        return false;
+      // Do not allow if suffix can potentially match a mangling suffix.
----------------
jurahul wrote:

2nd point of feedback: is this ok? I don't think we can have the getMangledTypeStr() code and this code adjacent to each other (as one operated on LLVM Types and others on strings and we cannot link LLVM Core into TableGen). So we will keep these separate with a comment in both places mentioning each other and asking to keep the 2 pieces in sync.

https://github.com/llvm/llvm-project/pull/109314


More information about the llvm-commits mailing list