<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Exponential time in Machine Instruction Scheduler with StackTagging and large allocas"
   href="https://bugs.llvm.org/show_bug.cgi?id=47867">47867</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Exponential time in Machine Instruction Scheduler with StackTagging and large allocas
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Common Code Generator Code
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>eugeni.stepanov@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>// compile with -target aarch64-linux-android30 -march=armv8+memtag -c
-fsanitize=memtag -O2

#include <string.h>

void g(void *);
void f() {
  char buf[N*1024];
  memset(buf, 42, sizeof(buf));
  g(buf);
}

N   time in MIS, s
1   0.0010
2   0.0028
3   0.0080
4   0.0180
8   0.1511
16   1.2634
24   4.3588
32  10.3364
40  20.2147
48  35.1360

All the time is spent in BaseMemOpClusterMutation::clusterNeighboringMemOps and
its callees:

  for (unsigned Idx = 0, End = MemOpRecords.size(); Idx < (End - 1); ++Idx) {
    // Decision to cluster mem ops is taken based on target dependent logic     
    auto MemOpa = MemOpRecords[Idx];

    // Seek for the next load/store to do the cluster.                          
    unsigned NextIdx = Idx + 1;
    for (; NextIdx < End; ++NextIdx)
      // Skip if MemOpb has been clustered already or has dependency with       
      // MemOpa.                                                                
      if (!SUnit2ClusterInfo.count(MemOpRecords[NextIdx].SU->NodeNum) &&
          !DAG->IsReachable(MemOpRecords[NextIdx].SU, MemOpa.SU) &&
          !DAG->IsReachable(MemOpa.SU, MemOpRecords[NextIdx].SU))
        break;
    if (NextIdx == End)
      continue;

It's not clear to me why the algorithm is exponential (or at least seems to
be), but it is triggered by AArch64StackTagging inserting an absurd amount of
STG instructions.

I'll work around in stack tagging by not applying the -stack-tagging-merge-init
optimization to large allocas.</pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are on the CC list for the bug.</li>
      </ul>
    </body>
</html>