<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/129779>129779</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [flang] surprising performance loss with nested type operator overloading
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            flang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ivan-pi
      </td>
    </tr>
</table>

<pre>
    I've written a performance benchmark that sums an array of numbers in several different ways, to measure the overhead of operator overloading for simple value types:

[abstraction_penalty.F90.txt](https://github.com/user-attachments/files/19077916/abstraction_penalty.F90.txt)

When I run the program, I see the output:

```
$ flang-new -O2 abstraction_penalty.F90 
$ ./a.out
[info] compiler: Homebrew flang version 19.1.4 (https://github.com/Homebrew/homebrew-core/issues)
[info] compiler options: flang-new -O2 abstraction_penalty.F90
[info] using naive sum
[info] number of iterations: 25000

        test    absolute   additions  ratio with
      number  time (sec)  per second       test0

           0      0.0532   9.400E+02       1.000
           1      0.0498   1.003E+03       0.937
           2      0.0493   1.015E+03       0.926
           3      0.0526   9.511E+02       0.988
           4      0.0595   8.410E+02       1.118
           5      0.0515   9.700E+02       0.969
           6      0.0486   1.029E+03       0.913
           7      0.0485   1.031E+03       0.912
           8      0.0490   1.020E+03       0.922
           9      0.0472   1.059E+03       0.888
          10      0.0483   1.036E+03       0.907
          11      0.0485   1.031E+03       0.912
          12      0.0479   1.044E+03       0.901
          13      0.0481   1.039E+03       0.905
          14      6.7735   7.382E+00     127.336
          15      6.7167   7.444E+00     126.267
          16      0.0467   1.071E+03       0.878
          17      0.0452   1.105E+03       0.850
          18      0.0451   1.108E+03       0.849
          19      0.0452   1.105E+03       0.850
          20      0.0476   1.050E+03       0.895
          21      0.0469   1.066E+03       0.882
          22      0.0467   1.071E+03       0.877
          23      0.0461   1.086E+03       0.866
          24      0.0454   1.101E+03       0.853
          25      0.0452   1.105E+03       0.851
          26      0.0456   1.097E+03       0.857
          27      0.0454   1.102E+03       0.853
          28      6.6540   7.514E+00     125.089
          29      6.5274   7.660E+00     122.709
------------------------------------------------
        mean      0.0928   5.386E+02        1.75
```

The slow cases (14, 15, 28, 29) call the procedure `test_ddd`, which calls `dsum` for `type(ddd)`. That type is really just a double value, but defined in an obscure way:

```fortran
    integer, parameter :: dp = c_double

    ! Double wrapper
    type :: dd
        real(dp) :: val
    end type

    ! Double wrapper child with TBP
    type, extends(dd) :: ddi
    contains
        procedure :: get => get_ddi_val
    end type

    ! Double wrapper wrapper
    type :: ddd
        type(dd) :: val
    end type
```

The sum procedure looks as follows:
```fortran
    pure function ddd_sum(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        real(dp), pointer :: t(:)
#if USE_INTRINSIC_SUM
        res%val%val = sum(a%val%val)
#else
        integer :: i
        res = ddd(dd(0.0_dp))
        do i = 1, size(a)
            res = res + a(i)
        end do
#endif
    end function
``` 
where `+` is the overloaded `operator(+)`, defined as:

```fortran
    pure function ddd_add(a,b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c%val%val = a%val%val + b%val%val
    end function
```
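
For readers without the attachment, here is a minimal, self-contained sketch of how the pieces above fit together. It is not the attached benchmark: the module name `ddd_sketch`, the program name `sketch`, and the `interface operator(+)` block are assumptions made for illustration, and the attached file may wire the operator up differently. It covers only the naive-sum path with no timing.

```fortran
! Minimal sketch, NOT the attached benchmark. The names ddd_sketch and
! sketch, and the interface block below, are assumed for illustration.
module ddd_sketch
    use, intrinsic :: iso_c_binding, only: c_double
    implicit none
    integer, parameter :: dp = c_double

    ! Double wrapper
    type :: dd
        real(dp) :: val
    end type

    ! Double wrapper wrapper
    type :: ddd
        type(dd) :: val
    end type

    ! Bind the user-defined addition to the + operator (assumed wiring).
    interface operator(+)
        module procedure ddd_add
    end interface

contains

    pure function ddd_add(a, b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c%val%val = a%val%val + b%val%val
    end function

    ! Naive sum: accumulate through the overloaded operator.
    pure function ddd_sum(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        integer :: i
        res = ddd(dd(0.0_dp))
        do i = 1, size(a)
            res = res + a(i)
        end do
    end function

end module

program sketch
    use ddd_sketch
    implicit none
    type(ddd) :: a(4), s
    integer :: i

    ! Fill the array with 1, 2, 3, 4 wrapped in the nested type.
    do i = 1, 4
        a(i) = ddd(dd(real(i, dp)))
    end do

    s = ddd_sum(a)
    print *, s%val%val   ! sum of 1 + 2 + 3 + 4
end program
```

The sketch only shows the shape of the code path that the slow cases exercise; it makes no claim about reproducing the slowdown at this array size.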

If the intrinsic sum is used instead (`-DUSE_INTRINSIC_SUM`), there is no observable penalty. There are other switches too: `-DUSE_INTRINSIC_REDUCE`, which performs well, and `-DUSE_STRUCTURE_CONSTRUCTOR`, which makes the performance even worse (300x slower than the baseline).
</pre>