<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Unrolling loops leads to sub-optimal code generation in ARM Cortex A72"

   href="https://bugs.llvm.org/show_bug.cgi?id=46999">46999</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Unrolling loops leads to sub-optimal code generation in ARM Cortex A72

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>10.0

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>Other

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Driver

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>bhavana.kilambi@blackfigtech.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org, neeilans@live.com, richard-llvm@metafoo.co.uk

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=23816" name="attach_23816" title="string copy source code">attachment 23816</a> <a href="attachment.cgi?id=23816&action=edit" title="string copy source code">[details]</a></span>

string copy source code

Summary: 

LLVM 10.0.0.1 unrolls two loops in a string copy function when the

-mcpu=cortex-a72 (ARM) option is specified during compilation, and the unrolled

code can perform unaligned memory accesses. When this option is not specified,

the compiler does not unroll the loops and the generated code contains no

unaligned accesses.

Build options and other details for reproduction:

LLVM: clang version 10.0.0.1 

Arch : ARM cortex-a72

Optimization options used: -fno-builtin --target=arm64 -mcpu=cortex-a72

-ffixed-x18 -std=c11 -nostdlibinc -nostdinc++ -ftls-model=local-exec

-fno-builtin -fno-strict-aliasing -mno-implicit-float -O2 -w 

(The build options listed above are specific to our workload and must be used

during compilation.)

Source code and code gen : 

The source is a basic strncpy function and is attached to this bug.
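
For readers without access to the attachment, a function of this shape would
be consistent with the assembly in this report. This is only a sketch and not
the attached source: the name my_strncpy is hypothetical, and the 32-bit
unsigned count is inferred from the use of w2 in the listings.

```c
/* Hypothetical stand-in for the attached function, not the actual source. */
char *my_strncpy(char *dst, const char *src, unsigned n)
{
    char *out = dst;

    /* Copy loop: stop at the source terminator or when n bytes are written. */
    while (n != 0) {
        if (*src == '\0')
            break;
        *out++ = *src++;
        n--;
    }

    /* Padding loop: fill the remaining n bytes with zeros, one byte at a time.
       This is the loop whose stores are discussed in the rest of the report. */
    while (n != 0) {
        *out++ = '\0';
        n--;
    }

    return dst;
}
```

Calling my_strncpy(buf, "ab", 6) would write 'a', 'b' and then four zero
bytes into buf, leaving any bytes after the sixth untouched.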

The following assembly is generated when compiling with the -mcpu=cortex-a72

option:

        .p2align        4

        .type   strncpy,@function

strncpy: 

        cbz     w2, .LBB0_10

        mov     x8, x0

.LBB0_2:                         

        ldrb    w9, [x1]

        cbz     w9, .LBB0_4

        add     x1, x1, #1    

        subs    w2, w2, #1 

        strb    w9, [x8], #1

        b.ne    .LBB0_2

        b       .LBB0_10

.LBB0_4:                   

        sub     w9, w2, #1

        tst     w2, #0x3

        b.eq    .LBB0_8

        mov     w10, wzr

        and     w11, w2, #0x3

.LBB0_6:                   

        strb    wzr, [x8], #1

        add     w10, w10, #1

        cmp     w11, w10

        b.ne    .LBB0_6

        sub     w2, w2, w10

.LBB0_8:                                

        cmp     w9, #3                  

        b.lo    .LBB0_10

.LBB0_9:                             

        subs    w2, w2, #4     

        str     wzr, [x8], #4

        b.ne    .LBB0_9

.LBB0_10:                           

        ret

.Lfunc_end0:

        .size   strncpy, .Lfunc_end0-strncpy 

From the assembly sequence above, note the 4-byte store

“str wzr, [x8], #4”, which could target an unaligned memory location.
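
The arithmetic behind this can be sketched as follows. The helper names are
hypothetical and the code only models the address computation read off the
listing above: the byte loop at .LBB0_6 writes n modulo 4 bytes first, and
the word loop at .LBB0_9 then stores 4 bytes at a time from wherever x8 ended
up, with no check of that address.

```c
/* Nonzero if the word-store loop (.LBB0_9) is reached for this count.
   The listing skips it when n - 1 is below 3, that is, for n of 1 to 3. */
int uses_word_stores(unsigned n)
{
    return n >= 4;
}

/* Offset, modulo 4, of the address used by the first 4-byte store.
   addr: value of x8 when the zero fill starts (hypothetical input)
   n:    number of bytes still to be zeroed
   A result of 0 means the store is 4-byte aligned. */
unsigned word_store_misalignment(unsigned long addr, unsigned n)
{
    unsigned prologue = n % 4;   /* byte stores issued by .LBB0_6 */
    return (unsigned)((addr + prologue) % 4);
}
```

For example, with x8 at an address ending in 0x1 and 8 bytes left to fill, no
byte stores are issued and both word stores land at an offset of 1 from a
4-byte boundary.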

Without the -mcpu=cortex-a72 option, the compiler generates the following

assembly sequence:

.p2align        2

        .type   strncpy,@function

strncpy:     

        cbz     w2, .LBB0_5

        mov     x8, x0

.LBB0_2:                                

        ldrb    w9, [x1]

        cbz     w9, .LBB0_4

        add     x1, x1, #1              

        subs    w2, w2, #1           

        strb    w9, [x8], #1

        b.ne    .LBB0_2

        b       .LBB0_5

.LBB0_4:                              

        add     w9, w9, #1      

        cmp     w2, w9

        strb    wzr, [x8], #1

        b.ne    .LBB0_4

.LBB0_5:                         

        ret

Observations:

After some debugging of the unroll pass in LLVM, I noticed that the

-mcpu=cortex-a72 option uses the model file for the Arm Cortex-A57, which in

turn overrides some of the default (generic ARM) values related to the loop

buffer.

       When the cortex-a72 option is not used, the getUnrollingPreferences()

function in BasicTTIImpl.h sees the default value of “LoopMicroOpBufferSize”,

which is 0. It then returns control to "LoopUnrollPass.cpp" without changing

anything, so "UP.Runtime" keeps its default value of false and the loop is not

unrolled.

       With the -mcpu=cortex-a72 option, the value of "LoopMicroOpBufferSize"

is overridden to 16. As a result, the function eventually sets "UP.Runtime" to

true and the loop gets unrolled.

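The decision described above can be modelled roughly as follows. This is a
simplified sketch in C, not the LLVM source: struct unroll_prefs and
model_unroll_prefs are hypothetical names, and only the LoopMicroOpBufferSize
check and the UP.Runtime flag from the description are represented.

```c
/* Hypothetical, reduced stand-in for the unrolling preferences. */
struct unroll_prefs {
    int runtime;   /* models UP.Runtime; 0 means runtime unrolling is off */
};

/* Models the effect described for getUnrollingPreferences(): a buffer size
   of 0 leaves the defaults alone, any other value enables runtime unrolling. */
struct unroll_prefs model_unroll_prefs(unsigned loop_micro_op_buffer_size)
{
    struct unroll_prefs up = { 0 };   /* default set by the unroll pass */

    if (loop_micro_op_buffer_size == 0)
        return up;                    /* generic model: nothing overridden */

    up.runtime = 1;                   /* cortex-a72 case: buffer size is 16 */
    return up;
}
```

With this model, a buffer size of 0 yields runtime == 0 and a buffer size of
16 yields runtime == 1, matching the two cases above.
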
Possible Solution(s):

Disabling loop unrolling when building with -mcpu=cortex-a72 prevents the

unrolling, and the assembly then resembles that of the case without this option.
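
If unrolling only needs to be suppressed for one loop rather than globally,
one possibility is Clang's loop pragma. The sketch below is an assumption and
not the attached source: zero_fill is a hypothetical helper with the same
shape as a byte-wise padding loop, and the resulting assembly has not been
checked against clang 10 here.

```c
/* Hypothetical helper: writes n zero bytes starting at p, one at a time. */
void zero_fill(char *p, unsigned n)
{
    /* Ask Clang not to unroll this particular loop. Other compilers may
       ignore the pragma, possibly with a warning. */
#pragma clang loop unroll(disable)
    while (n != 0) {
        *p++ = '\0';
        n--;
    }
}
```

This keeps the change local to the source of the affected function, but it is
still an explicit annotation rather than a change to the cortex-a72 defaults.
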

However, we would like to know whether the default “LoopMicroOpBufferSize”

could be changed for cortex-a72 specifically, or whether there is any other

workaround that can be made in the LLVM source instead of explicitly using

flags or other options.</pre>

        </div>

      </p>

    </body>

</html>