[clang] [lldb] [llvm] [mlir] [Support] Validate number of arguments passed to formatv() (PR #105745)

Tue Aug 27 16:19:50 PDT 2024

jurahul wrote:

I ran two sets of experiments, but the data is inconclusive as to whether this change causes a real compile-time regression.

1. Build MLIR with verbose output and capture all mlir-tblgen command lines to a file:

       ninja -C build check-mlir --verbose | tee build_log.txt
       grep "NATIVE/bin/mlir-tblgen " build_log.txt | cut -d ' ' -f 2- > mlir-tablegen-commands.txt

2. Build the baseline and new versions of LLVM/MLIR in two different directories, `upstream_clean` and `upstream_llvm`.

3. Use the attached script to run the captured commands with `--time-phases` and measure the total time.

4. Establish the baseline variance by running the script to compare the baseline against itself:

                          Baseline    Baseline    Delta (%)
       Total time         4.2302      4.2573      0.6406

   So the baseline variance is about 0.6%, with each command run 20 times. Note that for some individual targets the variance is much higher, up to 100%.

5. Establish the "new" variance by running the script to compare the new build against itself:

                          New         New         Delta (%)
       Total time         4.2829      4.2531      -0.6958

   Again, the variance is small, about 0.7%.

6. Run the baseline against the new build:

                          Baseline    New         Delta (%)
       Total time         4.1745      4.2864      2.6806

   This suggests a regression of about 2.7%. However, the individual data is quite noisy: for individual samples, the variance can be as high as 100%.

7. Add a FormatVariadic benchmark that exercises `formatv()` with 1-5 substitutions (which covers the common usage in LLVM), and run it for both baseline and new:

       ./build/benchmarks/FormatVariadic --benchmark_repetitions=20

Baseline:

    BM_FormatVariadic_mean         1063 ns         1063 ns           20

New:

    BM_FormatVariadic_mean         1097 ns         1097 ns           20

This is a ~3.2% regression in `formatv()` alone.
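The percentages quoted above can be reproduced as plain relative changes, `(second - first) / first * 100`; the numbers match to four decimal places, so the script's delta column appears to be computed this way. A minimal sketch of that arithmetic, where `percentDelta` and `exceedsNoiseFloor` are hypothetical helpers (not part of the attached script) and the noise-floor rule is a rough heuristic, not a statistical test:

```cpp
#include <cmath>

// Relative change of `After` versus `Before`, in percent. With the
// (first, second) column order used above, a positive value means the
// second build was slower.
inline double percentDelta(double Before, double After) {
  return (After - Before) / Before * 100.0;
}

// Hypothetical heuristic: treat a delta as meaningful only if its magnitude
// exceeds the larger of the two self-comparison deltas, used here as a rough
// noise floor.
inline bool exceedsNoiseFloor(double DeltaPct, double SelfDeltaA,
                              double SelfDeltaB) {
  double Floor = std::fmax(std::fabs(SelfDeltaA), std::fabs(SelfDeltaB));
  return std::fabs(DeltaPct) > Floor;
}

// Reproducing the reported values:
//   percentDelta(4.2302, 4.2573)  ~  0.6406  (baseline vs. baseline)
//   percentDelta(4.2829, 4.2531)  ~ -0.6958  (new vs. new)
//   percentDelta(4.1745, 4.2864)  ~  2.6806  (baseline vs. new)
//   percentDelta(1063, 1097)      ~  3.2     (FormatVariadic mean, ns)
```

With these helpers, `exceedsNoiseFloor(percentDelta(4.1745, 4.2864), 0.6406, -0.6958)` is true: the aggregate delta is several times larger than the self-comparison deltas, even though individual targets are much noisier.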

The benchmark I added was:

```C++
#include "benchmark/benchmark.h"
#include "llvm/Support/FormatVariadic.h"

using namespace llvm;

// Benchmark formatv() with 1-5 replacement fields.
static void BM_FormatVariadic(benchmark::State &state) {
  for (auto _ : state) {
    // Exercise formatv() with several valid replacement options.
    formatv("{0}", 1).str();
    formatv("{0}{1}", 1, 1).str();
    formatv("{0}{1}{2}", 1, 1, 1).str();
    formatv("{0}{1}{2}{3}", 1, 1, 1, 1).str();
    formatv("{0}{1}{2}{3}{4}", 1, 1, 1, 1, 1).str();
  }
}

BENCHMARK(BM_FormatVariadic);

BENCHMARK_MAIN();
```

The compile-time data collected from the mlir-tblgen runs is quite noisy for individual targets. The aggregated results seem stable, but I wonder whether that means the measurement is not capturing small compile-time deltas reliably. As an example:

```
lir/Dialect/MemRef/IR/MemRefOps.cpp.inc  0.0106     0.0119     12.2642%
mlir/include/mlir/IR/BuiltinOps.cpp.inc  0.0048     0.0042     -12.5000%
```

So within the same run, one target shows +12% and another shows -12%.
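Expressing those two per-target changes in absolute terms shows how small they are. A minimal sketch using the reported values (units as printed by the script, presumably seconds); `absoluteDelta` and `shareOfTotalPct` are hypothetical helpers for illustration:

```cpp
#include <cmath>

// Absolute change between two timings, in the units the script reports.
inline double absoluteDelta(double Before, double After) {
  return After - Before;
}

// Size of a single target's change relative to the whole-run total, in
// percent. A change that is large relative to one tiny target can still be a
// negligible fraction of the aggregate.
inline double shareOfTotalPct(double Delta, double Total) {
  return std::fabs(Delta) / Total * 100.0;
}

//   absoluteDelta(0.0106, 0.0119)   ~  0.0013   (reported as +12.26%)
//   absoluteDelta(0.0048, 0.0042)   ~ -0.0006   (reported as -12.5%)
//   shareOfTotalPct(0.0013, 4.2302) ~  0.031    (percent of the baseline total)
```

If the units are seconds, both swings are under 2 ms and each is a few hundredths of a percent of the ~4.23 total. Measurement jitter of that size could plausibly produce ±12% per-target swings without moving the aggregate much.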

The other line of thinking is that this validation is an aid to developers, so enabling it only in Debug builds may be good enough to catch issues. I am attaching the script and the captured mlir-tblgen commands it uses below:

[mlir-tablegen-commands.txt](https://github.com/user-attachments/files/16770614/mlir-tablegen-commands.txt)
[ct_formatv.txt](https://github.com/user-attachments/files/16770618/ct_formatv.txt)
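If the validation were enabled only in Debug builds, the usual C++ pattern is to guard it with `NDEBUG`, the macro that disables the standard `assert()` (as in a typical LLVM Release build without assertions). The sketch below is illustrative only: `requiredArgCount` and `checkFormatArgs` are hypothetical names, this is not the code in this PR, and the parser recognizes only `{N...}` fields with the index immediately after the brace (plus `{{` escapes), not the full `formatv()` replacement-field grammar. It also only flags format strings that need more arguments than were passed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns one more than the largest replacement index in
// Fmt, i.e. the minimum number of arguments the format string needs.
inline std::size_t requiredArgCount(std::string_view Fmt) {
  std::size_t Required = 0;
  std::size_t I = 0;
  while (I < Fmt.size()) {
    if (Fmt[I] != '{') {
      ++I;
      continue;
    }
    if (I + 1 < Fmt.size() && Fmt[I + 1] == '{') { // Escaped "{{".
      I += 2;
      continue;
    }
    // Parse the decimal index that directly follows '{'.
    std::size_t J = I + 1;
    std::size_t Index = 0;
    bool HasIndex = false;
    while (J < Fmt.size() &&
           std::isdigit(static_cast<unsigned char>(Fmt[J]))) {
      Index = Index * 10 + static_cast<std::size_t>(Fmt[J] - '0');
      HasIndex = true;
      ++J;
    }
    if (HasIndex && Index + 1 > Required)
      Required = Index + 1;
    I = J;
  }
  return Required;
}

// Hypothetical debug-only check: when NDEBUG is defined, the body compiles to
// nothing, so builds without assertions pay no runtime cost.
template <typename... Args>
void checkFormatArgs(std::string_view Fmt, const Args &...) {
#ifndef NDEBUG
  assert(requiredArgCount(Fmt) <= sizeof...(Args) &&
         "format string refers to more arguments than were passed");
#else
  (void)Fmt;
#endif
}
```

With this gating, `checkFormatArgs("{0}{1}", A)` would trip the assertion in a Debug build and compile away in a Release build, so the overhead measured above would only be paid where assertions are enabled.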

https://github.com/llvm/llvm-project/pull/105745