[all-commits] [llvm/llvm-project] 877c84: [Support] unsafe pointer arithmetic in llvm_regcomp()

Brad Smith via All-commits all-commits at lists.llvm.org
Thu Feb 3 16:59:47 PST 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 877c84acd466364858d37c9e2e8d9dfa3891d51a
      https://github.com/llvm/llvm-project/commit/877c84acd466364858d37c9e2e8d9dfa3891d51a
  Author: Miod Vallat <miod at online.fr>
  Date:   2022-02-03 (Thu, 03 Feb 2022)

  Changed paths:
    M llvm/lib/Support/regcomp.c

  Log Message:
  -----------
  [Support] unsafe pointer arithmetic in llvm_regcomp()

regcomp.c uses the "start + count < end" idiom to check that there are
"count" bytes available in an array of char "start" and "end" both point
to.

This is fine, unless "start + count" goes beyond the last element of the
array. In this case, pedantic interpretation of the C standard makes
the comparison of such a pointer against "end" undefined, and optimizers
from hell will happily remove as much code as possible because of this.

An example of this occurs in regcomp.c's bothcases(), which defines
bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it
invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...

Because bothcases() and p_bracket() are static functions in regcomp.c,
there is a real risk of miscompilation if aggressive inlining happens.

The following diff rewrites the "start + count < end" constructs into
"end - start > count". Assuming "end" and "start" are always pointing in
the array (such as "bracket[3]" above), "end - start" is well-defined
and can be compared without trouble.

As a bonus, MORE2() implies MORE() therefore SEETWO() can be simplified
a bit.

Bug report: https://github.com/llvm/llvm-project/issues/47993

Reviewed By: MaskRay, vitalybuka

Differential Revision: https://reviews.llvm.org/D97129




More information about the All-commits mailing list