[PATCH] D97869: [OpenCL][Draft] Add OpenCL builtin test generator

Anastasia Stulova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 24 13:33:00 PDT 2021


Anastasia added a comment.

I have done some measurements using the test produced from this Tablegen emitter (59K lines).

I have used the test in two files:

1. `SemaOpenCL/all-std-buitins.cl` that has the following RUN line appended 6 times (for every supported OpenCL version v1.0, v1.1, v1.2, v2.0, v3.0, C++)

  //RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify -cl-std=CL2.0 -finclude-default-header -fdeclare-opencl-builtins



2. `SemaOpenCL/all-std-buitins-slow-header.cl` that has the following RUN line appended 6 times (for every supported OpenCL version v1.0, v1.1, v1.2, v2.0, v3.0, C++)

  //RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify -cl-std=CL2.0 -finclude-default-header
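
As a rough sketch of how the six RUN lines per file could be produced, the snippet below expands one template per language version. Only the `CL2.0` line is quoted in this comment; the other `-cl-std` spellings in `CL_STD_VALUES` and the helper name `run_lines` are assumptions made for illustration.

```python
# -cl-std values assumed for the six language versions; only CL2.0
# appears verbatim in the RUN lines quoted above.
CL_STD_VALUES = ["CL1.0", "CL1.1", "CL1.2", "CL2.0", "CL3.0", "clc++"]

BASE = ("//RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify "
        "-cl-std={std} -finclude-default-header")


def run_lines(use_tablegen_builtins):
    """Return one RUN line per language version (hypothetical helper)."""
    suffix = " -fdeclare-opencl-builtins" if use_tablegen_builtins else ""
    return [BASE.format(std=std) + suffix for std in CL_STD_VALUES]


if __name__ == "__main__":
    print("\n".join(run_lines(use_tablegen_builtins=True)))
```

Passing `False` gives the lines for the `opencl-c.h` variant, which differ only in the missing `-fdeclare-opencl-builtins` flag.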

So I am getting the following testing time breakdown then:

  201.61s: Clang :: SemaOpenCL/all-std-buitins-slow-header.cl
  199.70s: Clang :: SemaOpenCL/all-std-buitins.cl
  85.14s: Clang :: Headers/arm-neon-header.c
  68.06s: Clang :: OpenMP/nesting_of_regions.cpp
  65.23s: Clang :: Driver/crash-report.c
  60.26s: Clang :: Analysis/PR24184.cpp
  57.80s: Clang :: CodeGen/X86/rot-intrinsics.c
  57.58s: Clang :: CodeGen/X86/x86_64-xsave.c
  56.34s: Clang :: Headers/opencl-c-header.cl
  55.68s: Clang :: CodeGen/X86/x86_32-xsave.c
  44.83s: Clang :: Driver/crash-report-with-asserts.c
  40.38s: Clang :: Lexer/SourceLocationsOverflow.c
  37.44s: Clang :: Headers/x86intrin-2.c
  36.53s: Clang :: OpenMP/target_teams_distribute_parallel_for_simd_codegen_registration.cpp
  34.09s: Clang :: CodeGen/X86/avx512f-builtins-constrained.c
  33.41s: Clang :: CodeGen/X86/sse-builtins-constrained.c
  32.82s: Clang :: Analysis/iterator-modeling.cpp
  31.37s: Clang :: OpenMP/target_teams_distribute_simd_codegen_registration.cpp
  31.10s: Clang :: OpenMP/target_parallel_for_simd_codegen_registration.cpp
  30.78s: Clang :: Analysis/use-after-move.cpp

I am quite confused, though, about why the difference between Tablegen and `opencl-c.h` is so insignificant. FYI, for a single clang invocation the difference in parsing time of this test is also very small: 20.794s with Tablegen vs 21.401s with `opencl-c.h`. This is really interesting because with small files the difference is huge: 0.043s vs 3.990s on a test with an empty kernel.
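
To put the two single-invocation comparisons side by side, here is a small calculation over the four numbers quoted above. The dictionary and function names are made up for illustration; nothing beyond the quoted timings is assumed.

```python
# Single-invocation parse times quoted above, in seconds.
large = {"tablegen": 20.794, "header": 21.401}  # generated 59K-line test
small = {"tablegen": 0.043, "header": 3.990}    # test with an empty kernel


def compare(times):
    """Return (absolute difference, ratio) of opencl-c.h over Tablegen."""
    delta = times["header"] - times["tablegen"]
    ratio = times["header"] / times["tablegen"]
    return delta, ratio


if __name__ == "__main__":
    for name, times in (("59K-line test", large), ("empty kernel", small)):
        delta, ratio = compare(times)
        print(f"{name}: +{delta:.3f}s, {ratio:.2f}x")
    # 59K-line test: +0.607s, 1.03x
    # empty kernel: +3.947s, 92.79x
```

So the header costs about 3.9s extra on the empty kernel but only about 0.6s extra on the large test, which is the part I cannot explain.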

---------------------------------------

I also timed `check-clang` invocation on my 8 core machine:

1. with both tests  - 697.70s
2. with all-std-buitins.cl only  - 684.43s
3. without any new tests  - 673.00s
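
For reference, the overhead of the first two runs relative to the run without the new tests can be worked out as follows. This is plain arithmetic on the three totals listed above; the names are illustrative.

```python
BASELINE = 673.00  # check-clang without any new tests, in seconds

runs = {
    "both tests": 697.70,
    "all-std-buitins.cl only": 684.43,
}


def overhead(total, baseline=BASELINE):
    """Return (extra seconds, extra percent) over the baseline run."""
    extra = total - baseline
    return extra, 100.0 * extra / baseline


if __name__ == "__main__":
    for label, total in runs.items():
        extra, pct = overhead(total)
        print(f"{label}: +{extra:.2f}s ({pct:.1f}%)")
    # both tests: +24.70s (3.7%)
    # all-std-buitins.cl only: +11.43s (1.7%)
```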

The change in total testing time appears to be insignificant. I guess this is due to parallel execution?
Btw, one thing I have thought of: since OpenCL v1.0 and v1.1 don't differ a lot in their builtin functions, and those builtins are not modified much either, perhaps we only need to test v1.2? That would reduce the number of clang invocations to 4 in each test. Then the measurements are as follows:

  134.13s: Clang :: SemaOpenCL/all-std-buitins-slow-header.cl
  131.52s: Clang :: SemaOpenCL/all-std-buitins.cl
  85.81s: Clang :: Headers/arm-neon-header.c
  69.14s: Clang :: OpenMP/nesting_of_regions.cpp
  60.08s: Clang :: Driver/crash-report.c
  59.67s: Clang :: Analysis/PR24184.cpp
  57.27s: Clang :: CodeGen/X86/rot-intrinsics.c
  56.93s: Clang :: CodeGen/X86/x86_32-xsave.c
  56.59s: Clang :: CodeGen/X86/x86_64-xsave.c
  55.68s: Clang :: Headers/opencl-c-header.cl
  40.71s: Clang :: Driver/crash-report-with-asserts.c
  39.44s: Clang :: Lexer/SourceLocationsOverflow.c
  38.02s: Clang :: OpenMP/target_teams_distribute_parallel_for_simd_codegen_registration.cpp
  37.07s: Clang :: Headers/x86intrin-2.c
  32.61s: Clang :: CodeGen/X86/avx512f-builtins-constrained.c
  32.58s: Clang :: CodeGen/X86/sse-builtins-constrained.c
  32.19s: Clang :: Analysis/use-after-move.cpp
  31.96s: Clang :: Analysis/iterator-modeling.cpp
  31.02s: Clang :: OpenMP/target_teams_distribute_simd_codegen_registration.cpp
  30.59s: Clang :: OpenMP/target_parallel_for_simd_codegen_registration.cpp

with a total testing time of 688.61s.
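
As a sanity check on the two listings, the per-test times can be divided by the number of clang invocations in each configuration. The sketch below uses only the figures reported above; the structure and names are illustrative.

```python
# Per-test times in seconds, keyed by the number of RUN lines in the test.
timings = {
    "all-std-buitins-slow-header.cl": {6: 201.61, 4: 134.13},
    "all-std-buitins.cl": {6: 199.70, 4: 131.52},
}


def per_invocation(seconds, invocations):
    """Average time of one clang invocation within a test."""
    return seconds / invocations


if __name__ == "__main__":
    for test, by_count in timings.items():
        for count, seconds in by_count.items():
            avg = per_invocation(seconds, count)
            print(f"{test} ({count} RUN lines): {avg:.2f}s per invocation")
    # all-std-buitins-slow-header.cl (6 RUN lines): 33.60s per invocation
    # all-std-buitins-slow-header.cl (4 RUN lines): 33.53s per invocation
    # all-std-buitins.cl (6 RUN lines): 33.28s per invocation
    # all-std-buitins.cl (4 RUN lines): 32.88s per invocation
```

The per-invocation cost stays at roughly 33s in both configurations, so the test time scales about linearly with the number of RUN lines. These figures were measured during a full `check-clang` run, so they need not match the standalone 20.794s and 21.401s timings.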

**Conclusion:**

- If we test the whole functionality, the test will be at least 2x slower than the slowest clang test so far, but it hardly affects the full testing time of `check-clang` on modern HW due to the parallel execution. Relatedly, partitioning the test files could help with the latency, again thanks to the parallel execution.
- Testing of opencl-c.h only doubles the testing time.
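
The partitioning point can be illustrated with a simple lower bound on parallel wall time: no schedule can finish sooner than the longest single job, nor sooner than the total work divided by the number of workers. This is only a sketch under the assumption that a test splits into equal parts with no extra per-file cost, which will not hold exactly in practice; the function names are hypothetical.

```python
def wall_time_lower_bound(durations, workers):
    """Lower bound on parallel wall time for independent jobs."""
    return max(max(durations), sum(durations) / workers)


def split_evenly(duration, parts):
    """Idealised partitioning: equal parts, no per-part overhead (assumption)."""
    return [duration / parts] * parts


if __name__ == "__main__":
    workers = 8
    whole = [201.61]  # slowest new test as a single file
    pieces = split_evenly(201.61, 6)  # e.g. one file per RUN line
    print(f"single file: >= {wall_time_lower_bound(whole, workers):.2f}s")
    print(f"six files:   >= {wall_time_lower_bound(pieces, workers):.2f}s")
```

A single 201.61s file stays hidden as long as the rest of the suite keeps the other cores busy for longer than that, which matches the small change in total time measured above.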


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97869/new/

https://reviews.llvm.org/D97869


