[LLVMbugs] [Bug 15830] New: Vectorizer produces incorrect results for regex code

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Tue Apr 23 06:46:07 PDT 2013


            Bug ID: 15830
           Summary: Vectorizer produces incorrect results for regex code
           Product: new-bugs
           Version: trunk
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: dimitry at andric.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 10410
  --> http://llvm.org/bugs/attachment.cgi?id=10410&action=edit
Testcase for regcomp() vectorization problem

Recently we upgraded clang in FreeBSD to trunk r178860.  Afterwards, some
people reported problems with configure scripts, caused by sed failing to
replace certain strings, e.g. @CC@ and such.  These people all had the
following settings in common:

- amd64 (x86_64) architecture
- core2 or higher CPUs, and using -march=native
- using -O3 or equivalent optimization flags

After some searching, this turned out to be a miscompilation in regcomp.c, the
libc source containing a part of the regular expression parsing logic.  One of
the functions, computejumps(), computes a Boyer-Moore jump table:

static void
computejumps(struct parse *p, struct re_guts *g)
{
        int ch;
        int mindex;

        /* Avoid making errors worse */
        if (p->error != 0)
                return;

        g->charjump = (int*) malloc((NC + 1) * sizeof(int));
        if (g->charjump == NULL)        /* Not a fatal error */
                return;
        /* Adjust for signed chars, if necessary */
        g->charjump = &g->charjump[-(CHAR_MIN)];

        /* If the character does not exist in the pattern, the jump
         * is equal to the number of characters in the pattern.
         */
        for (ch = CHAR_MIN; ch < (CHAR_MAX + 1); ch++)
                g->charjump[ch] = g->mlen;

        /* If the character does exist, compute the jump that would
         * take us to the last character in the pattern equal to it
         * (notice that we match right to left, so that last character
         * is the first one that would be matched).
         */
        for (mindex = 0; mindex < g->mlen; mindex++)
                g->charjump[(int)g->must[mindex]] = g->mlen - mindex - 1;
}

When this function is inlined into the main regcomp() function at -O3, and the
vectorizer transforms the last for loop, the generated code is incorrect: the
resulting table sometimes has one faulty entry.

I have attached a sample testcase that shows the problem at runtime.  If the
sample is compiled with -O2, or with -O3 -fno-vectorize, it runs without
printing anything.  If it is compiled with -O3, it prints:

charjump[67] is 5 instead of 4
