[PATCH] D97869: [OpenCL][Draft] Add OpenCL builtin test generator
Anastasia Stulova via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Mar 24 13:33:00 PDT 2021
Anastasia added a comment.
I have done some measurements using the test produced from this Tablegen emitter (59K lines).
I have used the test in two files:
1. `SemaOpenCL/all-std-buitins.cl` that has the following RUN line appended 6 times (for every supported OpenCL version v1.0, v1.1, v1.2, v2.0, v3.0, C++)
//RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify -cl-std=CL2.0 -finclude-default-header -fdeclare-opencl-builtins
2. `SemaOpenCL/all-std-buitins-slow-header.cl` that has the following RUN line appended 6 times (for every supported OpenCL version v1.0, v1.1, v1.2, v2.0, v3.0, C++)
//RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify -cl-std=CL2.0 -finclude-default-header
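The six RUN lines per file could be generated rather than written by hand. Below is a minimal Python sketch of that idea. The `-cl-std` spellings other than `CL2.0` are assumptions (only `CL2.0` is shown in the RUN lines above), and `run_lines` is a hypothetical helper name, not something from the patch.

```python
# Hypothetical helper: builds one lit RUN line per OpenCL version.
# Assumption: these -cl-std spellings; only CL2.0 is quoted in the comment.
CL_STDS = ["CL1.0", "CL1.1", "CL1.2", "CL2.0", "CL3.0", "clc++"]


def run_lines(use_tablegen):
    """Return the RUN lines for one test file, one per entry in CL_STDS."""
    lines = []
    for std in CL_STDS:
        line = ("//RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify "
                f"-cl-std={std} -finclude-default-header")
        if use_tablegen:
            # The Tablegen-based declarations are enabled by this extra flag.
            line += " -fdeclare-opencl-builtins"
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(run_lines(use_tablegen=True)))
```

Calling `run_lines(use_tablegen=False)` would give the lines for the `opencl-c.h` variant of the test.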
This gives the following breakdown of testing time for the slowest tests:
201.61s: Clang :: SemaOpenCL/all-std-buitins-slow-header.cl
199.70s: Clang :: SemaOpenCL/all-std-buitins.cl
85.14s: Clang :: Headers/arm-neon-header.c
68.06s: Clang :: OpenMP/nesting_of_regions.cpp
65.23s: Clang :: Driver/crash-report.c
60.26s: Clang :: Analysis/PR24184.cpp
57.80s: Clang :: CodeGen/X86/rot-intrinsics.c
57.58s: Clang :: CodeGen/X86/x86_64-xsave.c
56.34s: Clang :: Headers/opencl-c-header.cl
55.68s: Clang :: CodeGen/X86/x86_32-xsave.c
44.83s: Clang :: Driver/crash-report-with-asserts.c
40.38s: Clang :: Lexer/SourceLocationsOverflow.c
37.44s: Clang :: Headers/x86intrin-2.c
36.53s: Clang :: OpenMP/target_teams_distribute_parallel_for_simd_codegen_registration.cpp
34.09s: Clang :: CodeGen/X86/avx512f-builtins-constrained.c
33.41s: Clang :: CodeGen/X86/sse-builtins-constrained.c
32.82s: Clang :: Analysis/iterator-modeling.cpp
31.37s: Clang :: OpenMP/target_teams_distribute_simd_codegen_registration.cpp
31.10s: Clang :: OpenMP/target_parallel_for_simd_codegen_registration.cpp
30.78s: Clang :: Analysis/use-after-move.cpp
I am very confused, though, about why the difference between Tablegen and `opencl-c.h` is so insignificant. FYI, for a single clang invocation the difference in parsing time of this test is also very small: 20.794s with Tablegen vs 21.401s with `opencl-c.h`. This is really interesting because with small files the difference is huge: 0.043s vs 3.990s on a test with an empty kernel.
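To make the comparison concrete, here is a small Python sketch that computes the absolute and relative cost of `opencl-c.h` over Tablegen from the four timings quoted above. The function name `header_overhead` is made up for illustration.

```python
def header_overhead(tablegen_s, header_s):
    """Return (extra seconds, slowdown factor) of opencl-c.h vs Tablegen."""
    return header_s - tablegen_s, header_s / tablegen_s


# Timings quoted above, in seconds: (Tablegen, opencl-c.h).
MEASUREMENTS = {
    "generated 59K-line test": (20.794, 21.401),
    "empty kernel": (0.043, 3.990),
}

if __name__ == "__main__":
    for name, (tablegen_s, header_s) in MEASUREMENTS.items():
        extra, factor = header_overhead(tablegen_s, header_s)
        print(f"{name}: +{extra:.3f}s, {factor:.2f}x")
```

The absolute gap is about 0.6s on the big test against about 3.9s on the empty kernel, so the header cost does not simply carry over as a fixed amount.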
---------------------------------------
I also timed a `check-clang` invocation on my 8-core machine:
1. with both tests - 697.70s
2. with all-std-buitins.cl only - 684.43s
3. without any new tests - 673.00s
The change in total testing time appears to be insignificant. I guess this is due to parallel execution?
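As a quick check of how small the change is, this Python sketch expresses the two totals relative to the run without any new tests. It assumes single, unrepeated runs (as reported above), so run-to-run noise is not accounted for; `overhead` is a hypothetical helper name.

```python
BASELINE_S = 673.00  # check-clang without any new tests, in seconds


def overhead(total_s, baseline_s=BASELINE_S):
    """Return (extra seconds, extra percent) relative to the baseline run."""
    extra = total_s - baseline_s
    return extra, 100.0 * extra / baseline_s


if __name__ == "__main__":
    for label, total_s in [("both tests", 697.70),
                           ("all-std-buitins.cl only", 684.43)]:
        extra, percent = overhead(total_s)
        print(f"{label}: +{extra:.2f}s ({percent:.2f}%)")
```

Both configurations stay under 4% of the baseline total.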
Btw, one thing I have thought of: since OpenCL v1.0 and v1.1 don't differ a lot in their builtin functions, and those are not modified much either, perhaps we only need to test v1.2? That would reduce the number of clang invocations to 4 in each test. The measurements are then as follows:
134.13s: Clang :: SemaOpenCL/all-std-buitins-slow-header.cl
131.52s: Clang :: SemaOpenCL/all-std-buitins.cl
85.81s: Clang :: Headers/arm-neon-header.c
69.14s: Clang :: OpenMP/nesting_of_regions.cpp
60.08s: Clang :: Driver/crash-report.c
59.67s: Clang :: Analysis/PR24184.cpp
57.27s: Clang :: CodeGen/X86/rot-intrinsics.c
56.93s: Clang :: CodeGen/X86/x86_32-xsave.c
56.59s: Clang :: CodeGen/X86/x86_64-xsave.c
55.68s: Clang :: Headers/opencl-c-header.cl
40.71s: Clang :: Driver/crash-report-with-asserts.c
39.44s: Clang :: Lexer/SourceLocationsOverflow.c
38.02s: Clang :: OpenMP/target_teams_distribute_parallel_for_simd_codegen_registration.cpp
37.07s: Clang :: Headers/x86intrin-2.c
32.61s: Clang :: CodeGen/X86/avx512f-builtins-constrained.c
32.58s: Clang :: CodeGen/X86/sse-builtins-constrained.c
32.19s: Clang :: Analysis/use-after-move.cpp
31.96s: Clang :: Analysis/iterator-modeling.cpp
31.02s: Clang :: OpenMP/target_teams_distribute_simd_codegen_registration.cpp
30.59s: Clang :: OpenMP/target_parallel_for_simd_codegen_registration.cpp
with a total testing time of 688.61s.
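A rough way to see that the test time scales with the number of clang invocations is to divide each measured time by its invocation count. This is a Python sketch using only the numbers quoted above; `per_invocation` is a hypothetical helper name.

```python
def per_invocation(test_s, invocations):
    """Average wall-clock seconds per clang invocation within one test."""
    return test_s / invocations


# (test file, seconds, number of RUN lines), taken from the two listings.
RUNS = [
    ("all-std-buitins.cl", 199.70, 6),
    ("all-std-buitins.cl", 131.52, 4),
    ("all-std-buitins-slow-header.cl", 201.61, 6),
    ("all-std-buitins-slow-header.cl", 134.13, 4),
]

if __name__ == "__main__":
    for name, test_s, invocations in RUNS:
        avg = per_invocation(test_s, invocations)
        print(f"{name} x{invocations}: {avg:.2f}s per invocation")
```

All four cases come out at about 33s per invocation, so dropping two versions removes roughly a third of each test's time. This is higher than the roughly 21s for a standalone invocation, possibly because the other tests run in parallel.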
**Conclusion:**
- If we test the whole functionality, the test will be at least 2x slower than the slowest clang test so far, but it hardly affects the full testing time of `check-clang` on modern HW due to parallel execution. Relatedly, partitioning the test files could help with latency, again because of parallel execution.
- Testing `opencl-c.h` as well only doubles the testing time of the new tests.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D97869/new/
https://reviews.llvm.org/D97869