[llvm] [NFC][TableGen] Emit more readable builtin string table. (PR #105445)
Rahul Joshi via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 21 20:49:40 PDT 2024
================
@@ -637,15 +636,17 @@ void IntrinsicEmitter::EmitIntrinsicToBuiltinMap(
// Populate the string table with the names of all the builtins after
// removing this common prefix.
- StringToOffsetTable Table;
+ SequenceToOffsetTable<StringRef> Table;
----------------
jurahul wrote:
I was finally able to run the experiments and, as you suspected, `SequenceToOffsetTable` is slower. I ran two experiments:
1. I measured the time spent in `EmitIntrinsicToBuiltinMap` when `IsClang = true` (as that's where we handle the largest number of intrinsics) for both cases. For `SequenceToOffsetTable` the min, max, avg were 87, 130, 95.57, and for `StringToOffsetTable` they were 88, 102, 92.28, so `StringToOffsetTable` is slightly faster. These numbers are in units of 0.1 ms (the 88 here is 8.8 ms), so the average slowdown is 9.557 - 9.228 = ~0.33 ms. This is still not the e2e runtime of `llvm-tblgen -gen-intrinsic-impl` (which is the only option that exercises this code); out of its total execution time of ~0.2 s, that's a ~0.15% slowdown, and the e2e impact is smaller still. Since this command line is executed just once during the entire LLVM build, I'd say the compile-time impact is negligible.
2. I set up a microbenchmark in `EmitIntrinsicToBuiltinMap` (see code below), and in that I see that `SequenceToOffsetTable` is about 3.4x slower than `StringToOffsetTable`.
Benchmark code (added to EmitIntrinsicToBuiltinMap):
```C++
// Set up a specific benchmark.
RecordKeeper *R = const_cast<RecordKeeper *>(&Records);
constexpr int N = 100;
size_t Offset = 0;
if (IsClang) {
  R->startTimer("SequenceToOffsetTable");
  for (int I = 0; I < N; ++I) {
    SequenceToOffsetTable<StringRef> Table;
    for (const auto &[TargetPrefix, Entry] : BuiltinMap) {
      auto &[Map, CommonPrefix] = Entry;
      for (auto &[BuiltinName, EnumName] : Map) {
        StringRef Suffix = BuiltinName.substr(CommonPrefix->size());
        Table.add(Suffix);
      }
    }
    Table.layout();
    for (const auto &[TargetPrefix, Entry] : BuiltinMap) {
      auto &[Map, CommonPrefix] = Entry;
      for (auto &[BuiltinName, EnumName] : Map) {
        StringRef Suffix = BuiltinName.substr(CommonPrefix->size());
        Offset += Table.get(Suffix);
      }
    }
  }
  R->stopTimer();
}
if (IsClang) {
  R->startTimer("StringToOffsetTable");
  for (int I = 0; I < N; ++I) {
    StringToOffsetTable Table;
    for (const auto &[TargetPrefix, Entry] : BuiltinMap) {
      auto &[Map, CommonPrefix] = Entry;
      for (auto &[BuiltinName, EnumName] : Map) {
        StringRef Suffix = BuiltinName.substr(CommonPrefix->size());
        Table.GetOrAddStringOffset(Suffix);
      }
    }
    for (const auto &[TargetPrefix, Entry] : BuiltinMap) {
      auto &[Map, CommonPrefix] = Entry;
      for (auto &[BuiltinName, EnumName] : Map) {
        StringRef Suffix = BuiltinName.substr(CommonPrefix->size());
        Offset += Table.GetOrAddStringOffset(Suffix);
      }
    }
  }
  R->stopTimer();
}
errs() << Offset << "\n";
```
So in the targeted microbenchmark, `SequenceToOffsetTable` is 3.4x slower, but in e2e tests it's a 0.15% slowdown in that one particular `llvm-tblgen` command line that is executed just once, so much less on the total build time.
https://github.com/llvm/llvm-project/pull/105445