[PATCH] D102116: [LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat May 8 15:17:44 PDT 2021


lebedev.ri created this revision.
lebedev.ri added reviewers: craig.topper, fhahn, ychen, spatel, jdoerfert, zhuhan0.
lebedev.ri added a project: LLVM.
Herald added a subscriber: hiraditya.
lebedev.ri requested review of this revision.

I think I've added exhaustive test coverage, and I have verified that alive2 is happy with all the tests,
so in principle I'm fine with landing this without review, but just in case...

This adds support for the "count active bits" pattern, i.e.:

  int countActiveBits(unsigned val) {
      int cnt = 0;
      for( ; (val >> cnt) != 0; ++cnt)
          ;
      return cnt;
  }
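The loop above counts the significant bits of `val`. The `ctlz` intrinsic gives the same value in closed form: bit width minus leading zeros. Below is a minimal C sketch of that equivalence. It assumes a 32-bit `uint32_t` and GCC/Clang's `__builtin_clz`, and the function names are illustrative only. The `cnt < 32` guard is an addition. Without it, the loop as written would evaluate `val >> 32` whenever the top bit of `val` is set, and that is undefined behaviour in C.

  ```c
  #include <assert.h>
  #include <stdint.h>

  /* Reference loop (illustrative name). The cnt < 32 guard is added so that
     val >> 32, which is undefined in C, is never evaluated. */
  static int count_active_bits_loop(uint32_t val) {
      int cnt = 0;
      while (cnt < 32 && (val >> cnt) != 0)
          ++cnt;
      return cnt;
  }

  /* Closed form: bit width minus the number of leading zeros.
     __builtin_clz (GCC/Clang) is undefined for 0, so 0 is handled explicitly. */
  static int count_active_bits_ctlz(uint32_t val) {
      if (val == 0)
          return 0;
      return 32 - __builtin_clz(val);
  }

  /* Sanity check that both forms agree for a given input. */
  static void check_equivalence(uint32_t val) {
      assert(count_active_bits_loop(val) == count_active_bits_ctlz(val));
  }
  ```

For example, `count_active_bits_ctlz(255)` is 8 and `count_active_bits_ctlz(0x80000000u)` is 32.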

but in a somewhat more general form, since that is what I need:

  int countActiveBits(unsigned val, int start, int off) {
      int cnt;
      for (cnt = start; val >> (cnt + off); cnt++)
          ;
      return cnt;
  }
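The general form also has a closed form. `(val >> s) != 0` holds exactly when `s` is below the active-bit count of `val`. The loop therefore stops at the first `cnt >= start` with `cnt + off >= activeBits(val)`, which gives `cnt = max(start, activeBits(val) - off)`. Below is a sketch of that reasoning in C, using hypothetical helper names. It assumes `start + off >= 0`, so no shift amount is negative, and it adds a `cnt + off < 32` guard to avoid the undefined `val >> 32`. It illustrates the arithmetic only and is not necessarily the exact IR sequence the patch emits.

  ```c
  #include <assert.h>
  #include <stdint.h>

  /* Active-bit count: 32 minus leading zeros, 0 for val == 0.
     __builtin_clz is a GCC/Clang builtin, undefined for a zero argument. */
  static int active_bits(uint32_t val) {
      return val == 0 ? 0 : 32 - __builtin_clz(val);
  }

  /* Reference loop for the generalized idiom (hypothetical name).
     Assumes start + off >= 0; the cnt + off < 32 guard avoids val >> 32. */
  static int cab_loop(uint32_t val, int start, int off) {
      assert(start + off >= 0);
      int cnt;
      for (cnt = start; cnt + off < 32 && (val >> (cnt + off)) != 0; cnt++)
          ;
      return cnt;
  }

  /* Closed form: the loop exits at max(start, active_bits(val) - off). */
  static int cab_closed(uint32_t val, int start, int off) {
      assert(start + off >= 0);
      int limit = active_bits(val) - off;
      return limit > start ? limit : start;
  }
  ```

For instance, with `val = 0xFF`, `start = 0` and `off = 3`, both versions return 5. With `start = 10`, the loop exits immediately and both return 10.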

I've followed in the footsteps of the 'left-shift until bittest' idiom (D91038 <https://reviews.llvm.org/D91038>),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform,
regardless of all other factors.

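This can have a dramatic effect on certain benchmarks. For example, for rawspeed's rsbench on an Olympus XZ-1 raw (p1319978.orf), the mean wall time drops from 145 ms to 99.8 ms (-31%):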

  raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
  RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
  2021-05-09T01:06:05+03:00
  Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
  Run on (32 X 3600.24 MHz CPU s)
  CPU Caches:
    L1 Data 32 KiB (x16)
    L1 Instruction 32 KiB (x16)
    L2 Unified 512 KiB (x16)
    L3 Unified 32768 KiB (x2)
  Load Average: 5.26, 6.29, 3.49
  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  p1319978.orf/threads:32/process_time/real_time_mean          145 ms          145 ms          128   0.145319         0.999981   10.1568M       69.8949M        69.8936M      6.88159       6.88146   0.145322
  p1319978.orf/threads:32/process_time/real_time_median        145 ms          145 ms          128   0.145317         0.999986   10.1568M       69.8941M        69.8931M      6.88151       6.88141   0.145319
  p1319978.orf/threads:32/process_time/real_time_stddev      0.766 ms        0.766 ms          128   766.586u         15.1302u          0       354.167k        354.098k    0.0348699     0.0348631   766.469u
  RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
  2021-05-09T01:06:24+03:00
  Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
  Run on (32 X 3599.95 MHz CPU s)
  CPU Caches:
    L1 Data 32 KiB (x16)
    L1 Instruction 32 KiB (x16)
    L2 Unified 512 KiB (x16)
    L3 Unified 32768 KiB (x2)
  Load Average: 4.05, 5.95, 3.43
  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  p1319978.orf/threads:32/process_time/real_time_mean         99.8 ms         99.8 ms          128  0.0997758         0.999972   10.1568M       101.797M        101.794M      10.0225       10.0222  0.0997786
  p1319978.orf/threads:32/process_time/real_time_median       99.7 ms         99.7 ms          128  0.0997165         0.999985   10.1568M       101.857M        101.854M      10.0284       10.0281  0.0997195
  p1319978.orf/threads:32/process_time/real_time_stddev      0.224 ms        0.224 ms          128   224.166u          34.345u          0        226.81k        227.231k    0.0223309     0.0223723   224.586u
  Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
  Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
  ----------------------------------------------------------------------------------------------------------------------------------------------------
  p1319978.orf/threads:32/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
  p1319978.orf/threads:32/process_time/real_time_mean                  -0.3134         -0.3134           145           100           145           100
  p1319978.orf/threads:32/process_time/real_time_median                -0.3138         -0.3138           145           100           145           100
  p1319978.orf/threads:32/process_time/real_time_stddev                -0.7073         -0.7078             1             0             1             0
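The `Time`/`CPU` deltas in the comparison above are relative changes, which Google Benchmark's compare.py computes as (new - old) / old. Plugging in the mean `WallTime,s` values, 0.145322 s for the old build and 0.0997786 s for the new one, reproduces the reported -0.3134. That corresponds to roughly a 1.46x speedup. The small C sketch below checks this arithmetic. The helper names are illustrative, and the formula restates compare.py's convention as an assumption.

  ```c
  #include <assert.h>

  /* Relative change, assumed to match compare.py: (new - old) / old.
     A negative value means the new build is faster. */
  static double relative_change(double old_time, double new_time) {
      assert(old_time > 0.0);
      return (new_time - old_time) / old_time;
  }

  /* Speedup factor: how many times faster the new build runs. */
  static double speedup(double old_time, double new_time) {
      assert(new_time > 0.0);
      return old_time / new_time;
  }
  ```

The median values (0.145319 s vs 0.0997195 s) give about -0.3138 the same way, matching the median row.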


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D102116

Files:
  llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D102116.343868.patch
Type: text/x-patch
Size: 13473 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210508/ec15e204/attachment-0001.bin>

