<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - AArch64 vs ARMv7 code quality"

   href="https://llvm.org/bugs/show_bug.cgi?id=28345">28345</a>

          </td>

        </tr>


        <tr>

          <th>Summary</th>

          <td>AArch64 vs ARMv7 code quality

          </td>

        </tr>


        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>


        <tr>

          <th>Version</th>

          <td>3.8

          </td>

        </tr>


        <tr>

          <th>Hardware</th>

          <td>Other

          </td>

        </tr>


        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>


        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>


        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>


        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>


        <tr>

          <th>Component</th>

          <td>Backend: AArch64

          </td>

        </tr>


        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>


        <tr>

          <th>Reporter</th>

          <td>tulipawn@gmail.com

          </td>

        </tr>


        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>


        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

<pre>I've rerun the benchmark from issue #27103 on a 2GHz Cortex-A53 (64-bit Linux).
Even though it achieves the expected 1.5x improvement (not counting 64-bit
effects), it still loses to a 1.7GHz Cortex-A5 running ARMv7 code when it
comes to the performance of the naive implementation.

Provided that this last part actually tests the quality of the backend, the
results look rather interesting (table created with the cargo-benchcmp script;
a negative difference means the Cortex-A53 is faster):


name                           cortex-a5 ns/iter  cortex-a53 ns/iter  diff ns/iter    diff %
mat_mul_f32::m004              1,763              1,098                       -665   -37.72%
mat_mul_f32::m005              2,548              1,587                       -961   -37.72%
mat_mul_f32::m006              2,889              1,718                     -1,171   -40.53%
mat_mul_f32::m007              3,154              1,923                     -1,231   -39.03%
mat_mul_f32::m008              3,627              2,138                     -1,489   -41.05%
mat_mul_f32::m009              8,142              4,260                     -3,882   -47.68%
mat_mul_f32::m012              10,370             5,484                     -4,886   -47.12%
mat_mul_f32::m016              17,117             8,621                     -8,496   -49.63%
mat_mul_f32::m032              110,929            48,623                   -62,306   -56.17%
mat_mul_f32::m064              830,408            328,603                 -501,805   -60.43%
mat_mul_f32::m127              6,416,219          2,387,223             -4,028,996   -62.79%
mat_mul_f32::m256              52,750,069         20,490,803           -32,259,266   -61.15%
mat_mul_f32::m512              421,350,950        164,162,031         -257,188,919   -61.04%
mat_mul_f32::mix128x10000x128  531,878,944        216,476,447         -315,402,497   -59.30%
mat_mul_f32::mix16x4           27,059             17,923                    -9,136   -33.76%
mat_mul_f32::mix32x2           21,873             15,748                    -6,125   -28.00%
mat_mul_f32::mix97             3,815,641          1,449,014             -2,366,627   -62.02%

mat_mul_f64::m004              2,002              1,202                       -800   -39.96%
mat_mul_f64::m007              4,593              2,102                     -2,491   -54.23%
mat_mul_f64::m008              4,590              2,547                     -2,043   -44.51%
mat_mul_f64::m012              14,212             7,781                     -6,431   -45.25%
mat_mul_f64::m016              23,181             12,981                   -10,200   -44.00%
mat_mul_f64::m032              160,551            88,629                   -71,922   -44.80%
mat_mul_f64::m064              1,273,413          665,406                 -608,007   -47.75%
mat_mul_f64::m127              10,648,815         5,531,354             -5,117,461   -48.06%
mat_mul_f64::m256              88,419,854         45,800,153           -42,619,701   -48.20%
mat_mul_f64::m512              702,121,682        365,977,216         -336,144,466   -47.88%
mat_mul_f64::mix128x10000x128  876,955,471        493,044,547         -383,910,924   -43.78%
mat_mul_f64::mix16x4           38,284             20,585                   -17,699   -46.23%
mat_mul_f64::mix32x2           33,038             13,516                   -19,522   -59.09%
mat_mul_f64::mix97             6,344,368          3,202,931             -3,141,437   -49.52%

ref_mat_mul_f32::m004          473                530                           57    12.05%
ref_mat_mul_f32::m005          784                941                          157    20.03%
ref_mat_mul_f32::m006          1,219              1,537                        318    26.09%
ref_mat_mul_f32::m007          1,803              2,689                        886    49.14%
ref_mat_mul_f32::m008          2,553              3,783                      1,230    48.18%
ref_mat_mul_f32::m009          3,830              5,307                      1,477    38.56%
ref_mat_mul_f32::m012          7,829              11,755                     3,926    50.15%
ref_mat_mul_f32::m016          17,466             26,824                     9,358    53.58%
ref_mat_mul_f32::m032          128,387            202,902                   74,515    58.04%
ref_mat_mul_f32::m064          1,018,211          1,584,415                566,204    55.61%
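
For anyone who wants to double-check the diff columns: they appear to be plain
"new minus old" in ns/iter, and that difference as a percentage of the old
(cortex-a5) value. The sketch below recomputes two rows under that assumption.
The function name bench_diff is made up for illustration and is not part of
cargo-benchcmp.

```rust
// Hypothetical helper (not cargo-benchcmp's API): recompute the
// "diff ns/iter" and "diff %" columns from two ns/iter measurements,
// assuming diff = new - old and diff % = diff / old * 100.
fn bench_diff(old_ns: u64, new_ns: u64) -> (i64, f64) {
    let diff = new_ns as i64 - old_ns as i64;
    let pct = diff as f64 / old_ns as f64 * 100.0;
    (diff, pct)
}

fn main() {
    // mat_mul_f32::m004: cortex-a5 = 1,763 ns/iter, cortex-a53 = 1,098 ns/iter
    let (diff, pct) = bench_diff(1763, 1098);
    println!("{} {:.2}%", diff, pct); // prints "-665 -37.72%"

    // ref_mat_mul_f32::m004: cortex-a5 = 473 ns/iter, cortex-a53 = 530 ns/iter
    let (diff, pct) = bench_diff(473, 530);
    println!("{} {:.2}%", diff, pct); // prints "57 12.05%"
}
```

Both results agree with the corresponding rows in the table, so a positive
value in the ref_mat_mul rows means the Cortex-A53 needed more time per
iteration.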


The ref_mat_mul results probably show the optimizer could get smarter :)</pre>

        </div>

      </p>


    </body>

</html>