[llvm-dev] Discuss about the LLVM SW mitigation to Jump Conditional Code Erratum

Zhang, Annita via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 16 00:41:23 PST 2019


Below is the performance and code size ratio of SPEC CPU2017.

Table 3 shows the observed performance impact of the microcode update on the SPECrate2017_int_base and SPECrate2017_fp_base benchmark suites when compiled with the LLVM compiler. All data are ratios relative to the baseline. The column labeled hw shows a 2.6% and 1.3% performance effect in the INTRATE geomean and FPRATE geomean respectively. Effects of up to 5.1% were observed on individual components.
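
As a rough illustration of how these percentages relate to the ratios in Table 3, the sketch below recomputes the INTRATE geomean of the hw column from the published per-benchmark ratios. The helper names (geomean, loss_percent) are ours and are not part of any SPEC tooling. Because the table entries are rounded to three digits, the recomputed geomean can differ from the published 0.974 in the last digit.

```python
import math

# hw column of Table 3 (SPECrate2017_int_base), ratios vs. baseline as published
HW_INT = {
    "500.perlbench_r": 0.963,
    "502.gcc_r": 0.985,
    "505.mcf_r": 0.965,
    "520.omnetpp_r": 0.995,
    "523.xalancbmk_r": 0.984,
    "525.x264_r": 0.965,
    "531.deepsjeng_r": 0.981,
    "541.leela_r": 0.985,
    "557.xz_r": 0.949,
}


def geomean(values):
    """Geometric mean of a collection of positive ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def loss_percent(ratio):
    """Performance effect in percent for a ratio vs. baseline (1.000 = no change)."""
    return (1.0 - ratio) * 100.0


hw_int_geomean = geomean(HW_INT.values())
worst_name, worst_ratio = min(HW_INT.items(), key=lambda kv: kv[1])

print(f"geomean ratio: {hw_int_geomean:.3f}")
print(f"geomean effect: {loss_percent(hw_int_geomean):.1f}%")
print(f"largest individual effect: {worst_name} {loss_percent(worst_ratio):.1f}%")
```

The same two helpers can be applied to any other column of Table 3 or Table 4.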



Software-based tools to mitigate these effects are outlined below. In our tests, recompiling the benchmarks with the SW mitigation recovered the geomean performance to 99% or more of the originally observed performance, and reduced the maximum performance loss on any individual SPEC benchmark to 2.2%.



Comparing hw_sw_prefix (prefix padding) with hw_sw_nop (nop padding), prefix padding provides better performance (0.3%~0.5% in geomean). In individual cases, we observed a 1.4% performance improvement for prefix padding over nop padding. Comparing sw_prefix with sw_nop on a system w/o the microcode update (MCU), we observed 0.7% better performance with sw_prefix in the INTRATE geomean.
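
To make the per-benchmark comparison concrete, here is a small sketch that lists the difference between the hw_sw_prefix and hw_sw_nop columns of Table 3. It reads "improvement" as the difference between the two ratios in percentage points, which is our assumption about how the figure was derived. The names RATIOS and delta_points are illustrative only.

```python
# (hw_sw_prefix, hw_sw_nop) ratios vs. baseline, copied from Table 3
RATIOS = {
    "500.perlbench_r": (0.994, 0.980),
    "502.gcc_r": (0.998, 0.992),
    "505.mcf_r": (0.993, 0.997),
    "520.omnetpp_r": (0.994, 0.995),
    "523.xalancbmk_r": (0.988, 0.984),
    "525.x264_r": (0.986, 0.982),
    "531.deepsjeng_r": (0.978, 0.979),
    "541.leela_r": (0.997, 0.996),
    "557.xz_r": (1.009, 1.005),
    "508.namd_r": (0.999, 0.995),
    "510.parest_r": (0.997, 0.998),
    "511.povray_r": (0.992, 0.984),
    "519.lbm_r": (0.999, 0.999),
    "526.blender_r": (1.002, 0.995),
    "538.imagick_r": (1.015, 1.015),
    "544.nab_r": (0.995, 0.981),
}


def delta_points(prefix_ratio, nop_ratio):
    """Prefix-minus-nop difference in percentage points, rounded to 0.1."""
    return round((prefix_ratio - nop_ratio) * 100.0, 1)


deltas = {name: delta_points(p, n) for name, (p, n) in RATIOS.items()}
best = max(deltas.values())
best_names = sorted(name for name, d in deltas.items() if d == best)

# Print benchmarks from the largest prefix advantage to the smallest
for name in sorted(deltas, key=deltas.get, reverse=True):
    print(f"{name:16s} {deltas[name]:+.1f}")
```

A negative value means nop padding was faster than prefix padding for that benchmark in this run.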


In our experiments, we observed that nop padding introduced extra nop instructions into frequently executed code. The additional nop instructions caused capacity pressure in the DSB and reduced performance. We introduced prefix padding to resolve this performance issue.
Since the performance delta between prefix padding and nop padding is incremental, starting from nop padding may be easier to implement as a first step, with prefix padding options to explore afterwards for additional performance optimization.
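
For readers unfamiliar with what the padding has to achieve, the following is a minimal sketch of the placement check, assuming the condition as we understand it from the Intel document linked at the end of this mail: an affected branch should neither cross a 32-byte boundary nor end exactly on one. This is not the code from the LLVM patch; the function names are hypothetical, and macro-fused pairs would have to be treated as a single unit, which the sketch does not model.

```python
BOUNDARY = 32  # bytes; same value as -x86-align-branch-boundary=32 in this mail


def needs_alignment(offset, size, boundary=BOUNDARY):
    """True if an instruction occupying [offset, offset + size) crosses a
    boundary or ends exactly on one."""
    return offset // boundary != (offset + size) // boundary


def padding_bytes(offset, size, boundary=BOUNDARY):
    """Bytes to insert before the instruction so that it starts on the next
    boundary. Returns 0 if the current placement is already acceptable."""
    if size <= 0 or size >= boundary:
        raise ValueError("size must be positive and smaller than the boundary")
    if not needs_alignment(offset, size, boundary):
        return 0
    return boundary - offset % boundary


# A 6-byte jcc at offset 28 would cross the boundary at 32: pad 4 bytes.
print(padding_bytes(28, 6))
# A 2-byte jmp at offset 30 would end exactly at 32: pad 2 bytes.
print(padding_bytes(30, 2))
# A 2-byte jmp at offset 10 stays inside one 32-byte window: no padding.
print(padding_bytes(10, 2))
```

The two schemes compared in this mail differ only in how those bytes are emitted: as nop instructions, or as extra instruction prefixes that do not add instructions to the stream.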

Comparing hw_sw_prefix (prefix padding applied to a subset of branches) with hw_sw_prefix_align_all (prefix padding applied to all types of branches), the performance is almost the same in this test.



Table 3 - SPEC CPU2017 SW/Microcode Update vs. baseline performance ratio:

SPEC performance  sw_prefix   sw_nop      sw_prefix_align_all    hw          hw_sw_prefix  hw_sw_nop     hw_sw_prefix_align_all
500.perlbench_r   1.005       0.992       0.999                  0.963       0.994         0.980         0.989
502.gcc_r         0.998       0.982       0.988                  0.985       0.998         0.992         0.985
505.mcf_r         0.995       0.985       0.992                  0.965       0.993         0.997         0.999
520.omnetpp_r     1.001       0.995       0.996                  0.995       0.994         0.995         0.996
523.xalancbmk_r   0.994       0.991       0.993                  0.984       0.988         0.984         0.990
525.x264_r        0.995       0.989       0.993                  0.965       0.986         0.982         0.993
531.deepsjeng_r   0.978       0.971       0.986                  0.981       0.978         0.979         0.986
541.leela_r       0.983       0.982       0.980                  0.985       0.997         0.996         0.993
557.xz_r          1.004       1.007       1.002                  0.949       1.009         1.005         1.006
SIR geomean       0.995       0.988       0.992                  0.974       0.993         0.990         0.993

508.namd_r        0.996       0.996       0.998                  0.999       0.999         0.995         1.002
510.parest_r      0.997       0.997       0.996                  0.992       0.997         0.998         0.996
511.povray_r      1.006       1.006       0.998                  0.976       0.992         0.984         0.994
519.lbm_r         0.999       0.999       0.995                  0.992       0.999         0.999         0.992
526.blender_r     0.998       0.998       1.000                  0.974       1.002         0.995         1.005
538.imagick_r     1.032       1.032       1.025                  0.997       1.015         1.015         1.025
544.nab_r         0.997       0.997       1.005                  0.977       0.995         0.981         0.987
SFR geomean       1.003       1.003       1.002                  0.987       1.000         0.995         1.000



We also measured the increase in code size caused by the padding inserted to align branches (Table 4). The geomean code size increase is 2-3% for both prefix padding and nop padding, with individual outliers up to 4%.

With sw_prefix_align_all, the geomean code size increase is 3-4%, with individual outliers up to 6%. This data indicates that aligning all types of branches adds more code size overhead while delivering less performance gain. However, results may vary case by case.



Table 4 - SPEC CPU2017 SW mitigation vs. baseline Code Size ratio:

SPEC code size  sw_prefix       sw_nop          sw_prefix_align_all
500.perlbench_r 1.037           1.037           1.043
502.gcc_r       1.036           1.036           1.045
505.mcf_r       1.022           1.022           1.026
520.omnetpp_r   1.035           1.035           1.060
523.xalancbmk_r 1.031           1.031           1.050
525.x264_r      1.020           1.020           1.025
531.deepsjeng_r 1.016           1.016           1.018
541.leela_r     1.027           1.027           1.032
557.xz_r        1.029           1.029           1.034
SIR geomean     1.028           1.028           1.037

508.namd_r      1.014           1.014           1.015
510.parest_r    1.025           1.025           1.032
511.povray_r    1.024           1.023           1.031
519.lbm_r       1.009           1.009           1.013
526.blender_r   1.032           1.032           1.047
538.imagick_r   1.026           1.026           1.031
544.nab_r       1.029           1.029           1.033
SFR geomean     1.023           1.023           1.029


Test date: 2019/12/9

System Configuration:
Platform: Intel Internal Reference Validation Platform
OS: Red Hat* 8.0 x86_64
Memory: 192 GB
CPUCount: 2
CoreCount: 40
Intel HyperThreading: yes
CPU Model: Intel(r) Xeon(r) Gold 6148 CPU @ 2.40GHz
Microcode w/o microcode update: 0x200005e
Microcode with microcode update: 0x2000065



Compiler options:
Baseline & hw: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto
***sw_prefix: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -x86-branches-within-32B-boundaries
***sw_nop: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=0 -x86-align-branch=fused+jcc+jmp
***sw_prefix_align_all: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=5 -x86-align-branch=fused+jcc+jmp+indirect+call+ret



Notes:

1.     Source: Intel Corporation; SPEC CPU2017 results should be considered estimates as they are measured on non-production platforms and are being provided for research purposes.

2.     Baseline means the system w/o microcode update and w/o SW mitigation.

3.     sw_prefix means SW mitigation of prefix padding is applied to a system w/o microcode update.

4.     sw_nop means SW mitigation of nop padding is applied to a system w/o microcode update.

5.     sw_prefix_align_all means SW mitigation of prefix padding is applied to all impacted branches including call, ret and indirect jump, to a system w/o microcode update.

6.     hw means the microcode update is applied w/o SW mitigation.

7.     hw_sw_prefix means both microcode update and SW mitigation of prefix padding are applied.

8.     hw_sw_nop means both microcode update and SW mitigation of nop padding are applied.

9.     hw_sw_prefix_align_all means microcode update is applied, and SW mitigation of prefix padding is applied to all impacted branches including call, ret and indirect jump.

10.  LLVM measurements are limited to C/C++ benchmarks. All Fortran benchmarks are excluded.

11.  The test was built with an engineering LLVM compiler plus the SW mitigation patch. The performance data may vary from build to build.


For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks.  For specific information and notices/disclaimers regarding the Jump Conditional Code Erratum, visit https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf.

