[clang] [clang][CodeGen] Omit pre-opt link when post-opt link is requested (PR #85672)

Jacob Lambert via cfe-commits cfe-commits at lists.llvm.org
Wed May 1 10:25:39 PDT 2024


lamb-j wrote:

I've been working on testing this patch with an array of OpenCL benchmarks over the past month. We did some high-level regression testing with the following benchmarks:

BlackMagic
Linpack_Dgemm
babelstream-Double
babelstream-Float
chimex
clfft
clmem
clpeak-Double-precision compute
clpeak-Global memory bandwidth
clpeak-Integer compute
clpeak-Single-precision compute
clpeak-Transfer bandwidth
computeApps
dgemm_linux
fahbench
flopscl
ge-workspace
ge_rdppenality
indigo-benchmark
lattice
luxmark
luxmark4
mixbench-ocl-ro
ocltst
shoc
silentarmy
viennacl

Across these apps, we didn't see any significant regressions.

I also did some in-depth testing with FAHbench and Chimex:

**FAHBench**

Current:

    Final score:  216.8422, 218.2792, 218.3647
    Scaled score: 216.8422 (23558 atoms)

    App Runtime: 1m42.181s, 1m42.185s, 1m42.167s

    Compilation time: 3226 ms

With this PR:

    Final score:  222.3547, 219.8134, 223.3722
    Scaled score: 222.3547 (23558 atoms)

    App Runtime: 1m40.852s, 1m40.850s, 1m40.849s

    Compilation time: 1822 ms

Between the two builds, the total runtime difference is ~1.3 seconds, and the difference in compilation time is ~1.4 seconds (3226 ms vs. 1822 ms). So the numbers do seem to support that we're only removing overhead with this PR, not introducing regressions. I also looked into the intermediate files. If we dump the two final .so files, they're nearly identical, with only a few lines differing.
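For anyone who wants to re-check that arithmetic, here is a minimal Python sketch that recomputes both deltas from the FAHBench numbers quoted above. The helper `to_seconds` and the variable names are hypothetical, introduced only for this illustration, and averaging the three runs is an assumption about how to summarize them.

```python
import re
from statistics import mean


def to_seconds(stamp: str) -> float:
    """Convert a stamp such as '1m42.181s' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# FAHBench numbers copied from the measurements above.
current_runs = ["1m42.181s", "1m42.185s", "1m42.167s"]
pr_runs = ["1m40.852s", "1m40.850s", "1m40.849s"]
current_compile_ms = 3226
pr_compile_ms = 1822

# Assumption: summarize each build by the mean of its three runs.
runtime_delta_s = mean([to_seconds(s) for s in current_runs]) - mean(
    [to_seconds(s) for s in pr_runs]
)
compile_delta_s = (current_compile_ms - pr_compile_ms) / 1000

print(f"runtime delta: {runtime_delta_s:.3f} s")
print(f"compile delta: {compile_delta_s:.3f} s")
```

With these inputs the runtime delta works out to about 1.33 s and the compilation delta to about 1.40 s, so the saved runtime is of the same size as the saved compilation time.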

**Chimex**

Current:

    Correlation matrices computation time: 2.3876s on GPU 
    [Theoretical max: @13.9 TFLOPS, 1659.3 kHz; 83% efficiency]
    [Algorithm max:   @13.9 TFLOPS, 1634.6 kHz; 84% efficiency]
    
    Compilation Time: 742 ms

With this PR:

    Correlation matrices computation time: 1.9782s on GPU
    [Theoretical max: @13.9 TFLOPS, 1659.3 kHz; 100% efficiency]
    [Algorithm max:   @13.9 TFLOPS, 1634.6 kHz; 101% efficiency]
   
     Compilation Time: 551 ms

https://github.com/llvm/llvm-project/pull/85672

