<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/93060>93060</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [clang][AArch64] sparse dot product optimization
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          JulianMarcusSchroeder
      </td>
    </tr>
</table>

<pre>
    I'm looking into a sparse dot product function. When compiled with `-march=armv8.4a+sve` it runs twice as fast as with `-mcpu=neoverse-v1` or `-mcpu=native`. When I add `-mllvm -opt-bisect-limit=40`, the performance with neoverse/native is on par with armv8.4a+sve. When I plot performance across a range of `-opt-bisect-limit` values, the results swing widely from one limit to the next.
I've tried this using clang-15, clang-18 and clang top-of-tree.
Is this a bug?
If so, how should I proceed with debugging it?

Code:
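```
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint32_t */

/* Dot product of two sparse vectors stored as (index, value) pairs.
   Both index arrays are expected to be sorted in ascending order. */
float v_sparse_dot(uint32_t *lhs_idx, uint32_t *rhs_idx, float *lhs_val,
                   float *rhs_val, size_t lhs_len, size_t rhs_len) {
  size_t lhs_pos = 0, rhs_pos = 0;
  float xy = 0;

  /* Merge-style walk: advance whichever side has the smaller index. */
  while (lhs_pos < lhs_len && rhs_pos < rhs_len) {
    if (lhs_idx[lhs_pos] == rhs_idx[rhs_pos]) {
      /* Indices match: accumulate the product and advance both sides. */
      xy += lhs_val[lhs_pos] * rhs_val[rhs_pos];
      lhs_pos++;
      rhs_pos++;
    } else if (lhs_idx[lhs_pos] < rhs_idx[rhs_pos]) {
      lhs_pos++;
    } else {
      rhs_pos++;
    }
  }
  return xy;
}
```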
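
For anyone trying to reproduce the difference, a minimal harness sketch might look like the following. It repeats the kernel so the block is self-contained. The helper names `fill_sparse` and `repro_sparse_dot`, the index strides 2 and 3, and the all-ones values are assumptions made for illustration and are not taken from the original benchmark.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same merge-style kernel as in the report. */
float v_sparse_dot(uint32_t *lhs_idx, uint32_t *rhs_idx, float *lhs_val,
                   float *rhs_val, size_t lhs_len, size_t rhs_len) {
  size_t lhs_pos = 0, rhs_pos = 0;
  float xy = 0;

  while (lhs_pos < lhs_len && rhs_pos < rhs_len) {
    if (lhs_idx[lhs_pos] == rhs_idx[rhs_pos]) {
      xy += lhs_val[lhs_pos] * rhs_val[rhs_pos];
      lhs_pos++;
      rhs_pos++;
    } else if (lhs_idx[lhs_pos] < rhs_idx[rhs_pos]) {
      lhs_pos++;
    } else {
      rhs_pos++;
    }
  }
  return xy;
}

/* Hypothetical helper: fills idx with the ascending indices 0, stride,
   2*stride, ... and sets every value to 1.0f. len must be small enough
   that i * stride does not wrap around in 32 bits. */
static void fill_sparse(uint32_t *idx, float *val, size_t len,
                        uint32_t stride) {
  for (size_t i = 0; i < len; i++) {
    idx[i] = (uint32_t)i * stride;
    val[i] = 1.0f;
  }
}

/* Hypothetical driver: builds two sparse vectors of len entries each
   (index strides 2 and 3, an assumed access pattern), runs the kernel
   reps times, and returns the last result. Returns 0 if allocation fails
   or reps is 0. */
float repro_sparse_dot(size_t len, unsigned reps) {
  uint32_t *lhs_idx = malloc(len * sizeof *lhs_idx);
  uint32_t *rhs_idx = malloc(len * sizeof *rhs_idx);
  float *lhs_val = malloc(len * sizeof *lhs_val);
  float *rhs_val = malloc(len * sizeof *rhs_val);
  float result = 0.0f;

  if (lhs_idx && rhs_idx && lhs_val && rhs_val) {
    fill_sparse(lhs_idx, lhs_val, len, 2);
    fill_sparse(rhs_idx, rhs_val, len, 3);

    /* The volatile sink keeps the compiler from dropping repeated calls. */
    volatile float sink = 0.0f;
    for (unsigned r = 0; r < reps; r++) {
      sink = v_sparse_dot(lhs_idx, rhs_idx, lhs_val, rhs_val, len, len);
    }
    result = sink;
  }

  free(lhs_idx);
  free(rhs_idx);
  free(lhs_val);
  free(rhs_val);
  return result;
}
```

Calling `repro_sparse_dot` from a small `main` and timing it, once in a binary built with `-march=armv8.4a+sve` and once in a binary built with `-mcpu=neoverse-v1`, should give a like-for-like comparison, provided the inputs are large enough for the loop to dominate the run time.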

</pre>