<html>
    <head>
      <base href="http://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Vectorizer produces incorrect results for regex code"
   href="http://llvm.org/bugs/show_bug.cgi?id=15830">15830</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Vectorizer produces incorrect results for regex code
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dimitry@andric.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=10410" name="attach_10410" title="Testcase for regcomp() vectorization problem">attachment 10410</a> <a href="attachment.cgi?id=10410&action=edit" title="Testcase for regcomp() vectorization problem">[details]</a></span>
Testcase for regcomp() vectorization problem

Recently we upgraded clang in FreeBSD to trunk r178860.  Afterwards, some
people reported problems with configure scripts, caused by sed failing to
replace certain strings, such as @CC@.  All of the affected setups had the
following in common:

- amd64 (x86_64) architecture
- core2 or higher CPUs, and using -march=native
- using -O3 or equivalent optimization flags

After some searching, this turned out to be a miscompilation of regcomp.c, the
libc source file containing part of the regular expression parsing logic.  One
of its functions, computejumps(), computes a Boyer-Moore jump table:

static void
computejumps(struct parse *p, struct re_guts *g)
{
        int ch;
        int mindex;

        /* Avoid making errors worse */
        if (p->error != 0)
                return;

        g->charjump = (int*) malloc((NC + 1) * sizeof(int));
        if (g->charjump == NULL)        /* Not a fatal error */
                return;
        /* Adjust for signed chars, if necessary */
        g->charjump = &g->charjump[-(CHAR_MIN)];

        /* If the character does not exist in the pattern, the jump
         * is equal to the number of characters in the pattern.
         */
        for (ch = CHAR_MIN; ch < (CHAR_MAX + 1); ch++)
                g->charjump[ch] = g->mlen;

        /* If the character does exist, compute the jump that would
         * take us to the last character in the pattern equal to it
         * (notice that we match right to left, so that last character
         * is the first one that would be matched).
         */
        for (mindex = 0; mindex < g->mlen; mindex++)
                g->charjump[(int)g->must[mindex]] = g->mlen - mindex - 1;
}

When this function is inlined into the main regcomp() function at -O3 and the
vectorizer transforms the last for loop, the generated code is incorrect, and
the resulting table sometimes has one faulty entry.
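
For illustration only (this is not the attached testcase), here is a minimal
sketch of how such a jump table can be cross-checked against a straightforward
reference computation.  The names NCHARS, jump_table(), expected_jump() and
count_mismatches() are hypothetical, and unlike regcomp.c the sketch indexes
the table by unsigned char instead of offsetting it by CHAR_MIN.

```c
/* Hypothetical sketch; names and layout are assumptions, not regcomp.c. */
enum { NCHARS = 256 };

/* Fill jump[] with the jump distance for every character value. */
static void
jump_table(const char *pat, int len, int jump[NCHARS])
{
        int i;

        /* A character absent from the pattern jumps the full length. */
        for (i = 0; i < NCHARS; i++)
                jump[i] = len;

        /* Later occurrences overwrite earlier ones, so each entry ends
         * up as the distance from the last occurrence to the end.
         */
        for (i = 0; i < len; i++)
                jump[(unsigned char)pat[i]] = len - i - 1;
}

/* Reference value: scan from the right for the last occurrence of ch. */
static int
expected_jump(const char *pat, int len, int ch)
{
        int i;

        for (i = len - 1; i >= 0; i--)
                if ((unsigned char)pat[i] == ch)
                        return len - i - 1;
        return len;
}

/* Count the table entries that disagree with the reference value. */
static int
count_mismatches(const char *pat, int len)
{
        int jump[NCHARS];
        int ch, bad = 0;

        jump_table(pat, len, jump);
        for (ch = 0; ch < NCHARS; ch++)
                if (jump[ch] != expected_jump(pat, len, ch))
                        bad++;
        return bad;
}
```

For the pattern "ABCAB", the table should hold 1 for 'A', 0 for 'B', 2 for 'C'
and 5 for every other character, so count_mismatches("ABCAB", 5) should return
0 in a correctly compiled build.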

I have attached a sample testcase that shows the problem at runtime.  If the
sample is compiled with -O2, or with -O3 -fno-vectorize, it runs without
printing anything.  If it is compiled with -O3, it prints:

charjump[67] is 5 instead of 4</pre>
        </div>
      </p>
    </body>
</html>