[PATCH] D125226: [pseudo] Add benchmarks for pseudoparser.

Mon May 9 07:16:16 PDT 2022

sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.

Nice! This will be very useful. I think we should have some fixed checked-in examples, but we can add them later.

BTW, the fuzzer identifies slow inputs, I have attached one (though it looks like `x::x::x::x::...` might be enough. F23013421: slow-unit-f2ba0885392b8fd32237e004e9c3be20175a1f86 <https://reviews.llvm.org/F23013421>

================
Comment at: clang-tools-extra/pseudo/benchmarks/Benchmark.cpp:7
+//
+//===----------------------------------------------------------------------===//
+
----------------
Can you add a brief description and usage note here? (realistic, not BENCHMARK_OPTIONS)
In particular something useful for benchmarking the parser overall, which will be more important long-term than specific operations like grammar compilation.

(It'd be nice if we could make overall parsing the *default* benchmark action, but I'm not sure if that's easy)

================
Comment at: clang-tools-extra/pseudo/benchmarks/Benchmark.cpp:40
+}
+std::unique_ptr<Grammar> buildGrammar(llvm::StringRef GrammarText) {
+  std::vector<std::string> Diags;
----------------
I'd avoid helper functions for the content of the benchmark loop, because it makes it less obvious exactly what you're benchmarking.

in particular in this case, I think Diags should be written outside the loop (it shouldn't matter in practice if there are no diags, but it's subtle)

================
Comment at: clang-tools-extra/pseudo/benchmarks/Benchmark.cpp:49
+static void runParseGrammar(benchmark::State &State) {
+  std::string GrammarText = readFile(GrammarFile);
+  for (auto _ : State)
----------------
rather than have each benchmark read the inputs and compile the grammar, it seems a bit less repetitive to do this in main() or a single setup function called from there, and share the const inputs.

The only real downside I see is that some of the benchmarks don't require source, but source text could default to an empty string.

================
Comment at: clang-tools-extra/pseudo/benchmarks/Benchmark.cpp:55
+
+static void runLRBuild(benchmark::State &State) {
+  std::string GrammarText = readFile(GrammarFile);
----------------
runBuildLSR?

================
Comment at: clang-tools-extra/pseudo/benchmarks/Benchmark.cpp:64
+
+static void runGLRParse(benchmark::State &State) {
+  std::string GrammarText = readFile(GrammarFile);
----------------
if the benchmark is "runGLRParse" then everything apart from the glrParse call should be outside the test loop I think. (and such a benchmark is useful)

If the benchmark is intended to be end-to-end (also useful) then it should have a more generic name, and we should expect to include disambiguation, syntax-tree forming etc in the future.

I think we should probably have both!

================
Comment at: clang-tools-extra/pseudo/benchmarks/Benchmark.cpp:76
+    TokenStream Preprocessed =
+        DirectiveTree::parse(RawStream).stripDirectives(RawStream);
+    TokenStream ParseableStream =
----------------
you need to call chooseConditionalBranches before stripping, otherwise no branches will be taken which is not realistic for real code

================
Comment at: clang-tools-extra/pseudo/benchmarks/Benchmark.cpp:98
+
+  clang::pseudo::GrammarFile = argv[1];
+  clang::pseudo::SourceFile = argv[2];
----------------
instead of manipulating argv by hand, maybe first call benchmark::initialize and then call llvm::cl::ParseCommandLineOptions?

See llvm/utils/unittest/UnitTestMain/TestMain.cpp

(Hmm, maybe I should have looked into this for the fuzzer)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125226/new/

https://reviews.llvm.org/D125226