[PATCH] D64512: [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)

Tue Jul 16 10:05:23 PDT 2019

lebedev.ri added a comment.

To follow up on the inline comment: **right now** (llvm master vs. llvm master with this patchset; rawspeed develop with no patches on top), this fold happens once:

  $ /repositories/llvm-test-suite/utils/compare.py -m instcombine.MasksDroped /builddirs/llvm-project/build-llvm-test-suite-{old,new}/results.json
  Tests: 198
  Metric: instcombine.MasksDroped

  Program                                        results results0 diff 
  test-suite :: RawSpeed/RawSpeed.test           NaN      1.00    nan%
  test-suite...AllocatorAdaptorBenchmark.test    NaN     NaN      nan%
  test-suite...lateDecompressorBenchmark.test    NaN     NaN      nan%
  test-suite...sRawInterpolatorBenchmark.test    NaN     NaN      nan%
  test-suite...eed/io/BitStreamBenchmark.test    NaN     NaN      nan%
  test-suite...a/CameraMetaDataBenchmark.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-ArwDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-Cr2Decoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-DcrDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-DcsDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-DngDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-ErfDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-IiqDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-KdcDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-MefDecoder.test    NaN     NaN      nan%
  Geomean difference                                              nan%
         results  results0  diff
  count  0.0      1.0       0.0 
  mean  NaN       1.0      NaN  
  std   NaN      NaN       NaN  
  min   NaN       1.0      NaN  
  25%   NaN       1.0      NaN  
  50%   NaN       1.0      NaN  
  75%   NaN       1.0      NaN  
  max   NaN       1.0      NaN  

  /builddirs/llvm-project/build-llvm-test-suite-new$ grep -r "instcombine.MasksDroped"
  RawSpeed/build/CMakeFiles/rawspeed.dir/src/librawspeed/decompressors/SamsungV0Decompressor.stats:       "instcombine.MasksDroped": 1,
  results.json:        "instcombine.MasksDroped": 1.0, 

I expect this number to improve once all the pieces are in place.
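
For context, masking before a left shift is redundant when the mask only clears high bits that the shift would push out of the value anyway. The sketch below is a minimal Python illustration, not the InstCombine code. It assumes a low-bit mask of the form `(1 << MaskShAmt) - 1` on an 8-bit value. This comment does not say which mask shapes this particular patch matches. The helper names `shl`, `low_mask` and `mask_is_redundant` are hypothetical. The sketch exhaustively checks that the mask is redundant exactly when `mask_bits + shift_amt >= BITWIDTH`.

```python
# Hypothetical illustration (not the patch's code): for a small bit width,
# exhaustively check when a low-bits mask before a left shift is redundant.

BITWIDTH = 8
ALL_ONES = (1 << BITWIDTH) - 1


def shl(x: int, amt: int) -> int:
    """Left shift truncated to BITWIDTH bits, mimicking `shl` on an iN value."""
    return (x << amt) & ALL_ONES


def low_mask(nbits: int) -> int:
    """(1 << nbits) - 1: keeps only the low `nbits` bits."""
    return (1 << nbits) - 1


def mask_is_redundant(mask_bits: int, shift_amt: int) -> bool:
    """True iff (x & low_mask(mask_bits)) << shift_amt == x << shift_amt for every x."""
    m = low_mask(mask_bits)
    return all(shl(x & m, shift_amt) == shl(x, shift_amt) for x in range(1 << BITWIDTH))


# Every bit the mask clears sits at position >= mask_bits. After the shift it
# lands at position >= mask_bits + shift_amt. If that is >= BITWIDTH, the shift
# discards it anyway, so the mask is redundant. Otherwise it is not.
# Shift amounts >= BITWIDTH are excluded (in LLVM IR they yield poison).
for mask_bits in range(BITWIDTH + 1):
    for shift_amt in range(BITWIDTH):
        assert mask_is_redundant(mask_bits, shift_amt) == (mask_bits + shift_amt >= BITWIDTH)

print(f"condition holds for all {BITWIDTH}-bit cases")
```

Exhaustive checking only scales to tiny bit widths. It makes the condition easy to eyeball, but it does not prove the fold for arbitrary widths.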

For comparison, the previous fold (`dropRedundantMaskingOfLeftShiftInput()`, D63993 <https://reviews.llvm.org/D63993>) fires more often (26 times in RawSpeed, both with and without this patchset):

  $ /repositories/llvm-test-suite/utils/compare.py -m instcombine.ShiftsCombined /builddirs/llvm-project/build-llvm-test-suite-{old,new}/results.json
  Tests: 198
  Metric: instcombine.ShiftsCombined

  Program                                        results results0 diff 
  test-suite :: RawSpeed/RawSpeed.test           26.00   26.00    0.0%
  test-suite...AllocatorAdaptorBenchmark.test    NaN     NaN      nan%
  test-suite...lateDecompressorBenchmark.test    NaN     NaN      nan%
  test-suite...sRawInterpolatorBenchmark.test    NaN     NaN      nan%
  test-suite...eed/io/BitStreamBenchmark.test    NaN     NaN      nan%
  test-suite...a/CameraMetaDataBenchmark.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-ArwDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-Cr2Decoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-DcrDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-DcsDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-DngDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-ErfDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-IiqDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-KdcDecoder.test    NaN     NaN      nan%
  test-suite...fDecoderFuzzer-MefDecoder.test    NaN     NaN      nan%
  Geomean difference                                              nan%
         results  results0  diff
  count  1.0      1.0       1.0 
  mean   26.0     26.0      0.0 
  std   NaN      NaN       NaN  
  min    26.0     26.0      0.0 
  25%    26.0     26.0      0.0 
  50%    26.0     26.0      0.0 
  75%    26.0     26.0      0.0 
  max    26.0     26.0      0.0 

The performance implications of **this** patchset are as I expected:

  raw.pixls.us-unique/Samsung/NX30$ //usr/src/googlebenchmark/tools/compare.py -a benchmarks /builddirs/llvm-project/build-llvm-test-suite-{old,new}/RawSpeed/build/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_repetitions=128 --benchmark_min_time=0.00000001 2015-03-07-163604_sam_7204.srw 
  RUNNING: /builddirs/llvm-project/build-llvm-test-suite-old/RawSpeed/build/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_repetitions=128 --benchmark_min_time=0.00000001 2015-03-07-163604_sam_7204.srw --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp3aOcOJ
  2019-07-16 19:55:51
  Running /builddirs/llvm-project/build-llvm-test-suite-old/RawSpeed/build/src/utilities/rsbench/rsbench
  Run on (8 X 4000 MHz CPU s)
  CPU Caches:
    L1 Data 16K (x8)
    L1 Instruction 64K (x4)
    L2 Unified 2048K (x4)
    L3 Unified 8192K (x1)
  Load Average: 0.88, 0.76, 1.53
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Benchmark                                                                       Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_mean          136 ms          136 ms          128   0.135973          0.99989   20.5978M       151.485M        151.468M      7.35441        7.3536   0.135988
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_median        136 ms          136 ms          128   0.135954                1   20.5978M       151.507M        151.507M      7.35546       7.35547   0.135953
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_stddev      0.212 ms        0.193 ms          128   193.542u         237.857u          0       215.294k        236.262k    0.0104522     0.0114703   212.466u
  RUNNING: /builddirs/llvm-project/build-llvm-test-suite-new/RawSpeed/build/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_repetitions=128 --benchmark_min_time=0.00000001 2015-03-07-163604_sam_7204.srw --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWIQdvn
  2019-07-16 19:56:10
  Running /builddirs/llvm-project/build-llvm-test-suite-new/RawSpeed/build/src/utilities/rsbench/rsbench
  Run on (8 X 4000 MHz CPU s)
  CPU Caches:
    L1 Data 16K (x8)
    L1 Instruction 64K (x4)
    L2 Unified 2048K (x4)
    L3 Unified 8192K (x1)
  Load Average: 0.92, 0.78, 1.52
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Benchmark                                                                       Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_mean          131 ms          131 ms          128   0.131231         0.999992   20.5978M       156.959M        156.958M      7.62015        7.6201   0.131232
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_median        131 ms          131 ms          128   0.131175                1   20.5978M       157.026M        157.024M      7.62343        7.6233   0.131177
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_stddev      0.218 ms        0.218 ms          128   217.942u         33.6202u          0       259.861k        259.966k    0.0126159      0.012621   218.033u
  Comparing /builddirs/llvm-project/build-llvm-test-suite-old/RawSpeed/build/src/utilities/rsbench/rsbench to /builddirs/llvm-project/build-llvm-test-suite-new/RawSpeed/build/src/utilities/rsbench/rsbench
  Benchmark                                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_mean                  -0.0350         -0.0349           136           131           136           131
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_median                -0.0351         -0.0352           136           131           136           131
  2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_stddev                +0.0260         +0.1253             0             0             0             0

A ~3.5% improvement is notable: mean and median, wall and CPU time, with the U-test p-value reported as 0.0000.
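
The relative changes in the comparison table can be reproduced from the per-binary aggregates above. The sketch below is a minimal Python example. It assumes Google Benchmark's compare.py reports `(new - old) / |old|` per aggregate; with the `WallTime,s` and `CPUTime,s` values copied from the two rsbench runs, it matches the Time/CPU columns to four decimals. `relative_change` and `AGGREGATES` are names introduced here for illustration.

```python
# Sketch: reproduce the "Time"/"CPU" relative changes from the comparison table,
# assuming compare.py computes (new - old) / |old| for each aggregate.


def relative_change(old: float, new: float) -> float:
    """Relative change of `new` versus `old`; negative means `new` is faster."""
    return (new - old) / abs(old)


# (old, new) pairs in seconds, copied from the WallTime,s / CPUTime,s columns.
AGGREGATES = {
    "mean": {"wall": (0.135988, 0.131232), "cpu": (0.135973, 0.131231)},
    "median": {"wall": (0.135953, 0.131177), "cpu": (0.135954, 0.131175)},
}

for stat, pairs in AGGREGATES.items():
    wall = relative_change(*pairs["wall"])
    cpu = relative_change(*pairs["cpu"])
    print(f"{stat:<6} wall {wall:+.4f}  cpu {cpu:+.4f}")
```

All four values come out at about -0.035, i.e. roughly a 3.5% reduction in run time.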

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D64512