[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 22 09:32:49 PDT 2022


cor3ntin added a comment.

@aaron.ballman

I created a 1.3GB file containing 500000 comments of random size and content,
using the following script:

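  import random
  import string
  
  # Characters used to fill each comment: newline, uppercase letters, digits.
  alphabet = '\n' + string.ascii_uppercase + string.digits
  
  # Emit 500000 block comments, each holding 400 to 4999 random characters.
  for i in range(500000):
      body = ''.join(random.choices(alphabet, k=random.randrange(400, 5000)))
      print("/*{}*/".format(body))
  print("int main() {}")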

The results are as follows:

  Benchmark 1: clang++-14 -cc1 foo.cpp
    Time (mean ± σ):     197.5 ms ±   3.4 ms    [User: 120.8 ms, System: 76.4 ms]
    Range (min … max):   190.3 ms … 203.8 ms    14 runs
   
  Benchmark 2: ./bin/clang++ -cc1 foo.cpp
    Time (mean ± σ):     216.3 ms ±   3.0 ms    [User: 139.4 ms, System: 76.7 ms]
    Range (min … max):   211.6 ms … 220.9 ms    13 runs
   
  Benchmark 3: ./bin/clang++ -cc1 -Winvalid-utf8 foo.cpp
    Time (mean ± σ):     212.6 ms ±   1.7 ms    [User: 144.9 ms, System: 67.6 ms]
    Range (min … max):   210.3 ms … 215.2 ms    14 runs
   
  Summary
    'clang++-14 -cc1 foo.cpp' ran
      1.08 ± 0.02 times faster than './bin/clang++ -cc1 -Winvalid-utf8 foo.cpp'
      1.10 ± 0.02 times faster than './bin/clang++ -cc1 foo.cpp'
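
For reference, the ratios in the summary can be reproduced from the reported means and standard deviations. The sketch below is illustrative only: the `slowdown` helper is a hypothetical name, and the first-order error propagation it uses is an assumption about how the uncertainty on each ratio is derived, not something stated in the output above.

```python
import math

def slowdown(mean_ms, sd_ms, base_mean_ms, base_sd_ms):
    """Return (ratio, uncertainty) of a run relative to a baseline.

    Assumes independent measurements and first-order error propagation.
    """
    ratio = mean_ms / base_mean_ms
    rel_err = math.sqrt((sd_ms / mean_ms) ** 2 + (base_sd_ms / base_mean_ms) ** 2)
    return ratio, ratio * rel_err

# Mean and standard deviation (ms) copied from the benchmark output above.
baseline = (197.5, 3.4)  # clang++-14 -cc1 foo.cpp
runs = {
    "./bin/clang++ -cc1 foo.cpp": (216.3, 3.0),
    "./bin/clang++ -cc1 -Winvalid-utf8 foo.cpp": (212.6, 1.7),
}

for name, (mean, sd) in runs.items():
    ratio, err = slowdown(mean, sd, *baseline)
    percent = (ratio - 1) * 100
    print(f"{name}: {ratio:.2f} ± {err:.2f} ({percent:.1f}% slower than clang++-14)")
```

With these inputs the patched build is roughly 8 to 10% slower than clang++-14 on this file. The difference between the two patched runs (216.3 ms and 212.6 ms) is about the size of their combined standard deviations.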

You will notice I no longer check preemptively whether the diagnostic is enabled, as this turned out to have a negative effect on performance.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128059/new/

https://reviews.llvm.org/D128059


