[PATCH] D97129: [Support] unsafe pointer arithmetic in llvm_regcomp()

Brad Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 20 15:46:12 PST 2021


brad created this revision.
brad added a project: LLVM.
Herald added subscribers: dexonsmith, hiraditya, krytarowski.
brad requested review of this revision.

llvm/lib/Support/regcomp.c is borrowed from OpenBSD, to which the following issue has been reported and fixed. (report and patch in https://marc.info/?l=openbsd-tech&m=160923823113340&w=2 )

regcomp.c uses the "start + count < end" idiom to check that more than "count" bytes remain in a char array that both "start" and "end" point into.

This is fine unless "start + count" goes beyond one past the last element of the array. In that case, a pedantic interpretation of the C standard makes computing such a pointer, and therefore comparing it against "end", undefined, and optimizers from hell will happily remove as much code as possible because of this.

An example of this occurs in regcomp.c's bothcases(), which defines bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...

Because bothcases() and p_bracket() are static functions in regcomp.c, there is a real risk of miscompilation if aggressive inlining happens. The following diff rewrites the "start + count < end" constructs into "end - start > count". Assuming "end" and "start" always point into the same array (such as "bracket[3]" above) or one past its end, "end - start" is well-defined and can be compared without trouble.
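
To make the rewrite concrete, here is a minimal sketch of the "end - start > count" form, applied to a buffer shaped like the one in bothcases(). The names more_than() and bracket_has_six() are hypothetical and do not exist in regcomp.c; the sketch assumes both pointers refer to the same array and that next <= end.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of regcomp.c): returns nonzero when more
 * than `count` chars remain in [next, end).
 * Assumption: next and end point into the same array (or one past its
 * end) and next <= end, so the subtraction is well-defined.
 */
int more_than(const char *next, const char *end, ptrdiff_t count)
{
	assert(next <= end);
	return end - next > count;
}

/* Models the bothcases() situation: a 3-char buffer with end = bracket + 2. */
int bracket_has_six(void)
{
	char bracket[3] = { 'a', ']', '\0' };
	const char *next = bracket;
	const char *end = bracket + 2;

	/*
	 * The old form, next + 5 < end, would compute a pointer beyond
	 * bracket + 3. The subtraction below never leaves the array.
	 */
	return more_than(next, end, 5);
}
```

With this shape, bracket_has_six() evaluates 2 > 5 on plain integers, so the answer does not depend on how the compiler treats an out-of-range pointer.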

As a bonus, MORE2() implies MORE(), so SEETWO() can be simplified a bit.
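
The implication can be checked mechanically. The sketch below copies the two patched macros and walks every position of a buffer; struct parse here is a cut-down stand-in with only the two fields the macros touch, and more2_implies_more() is a hypothetical name used for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for regcomp.c's parser state; only the two fields used here. */
struct parse {
	const char *next;
	const char *end;
};

/* The macros as they read after the patch; they expect a variable named p. */
#define	MORE()	(p->end - p->next > 0)
#define	MORE2()	(p->end - p->next > 1)

/* Returns 1 if MORE2() implies MORE() at every position in buf[0..len]. */
int more2_implies_more(const char *buf, size_t len)
{
	struct parse state;
	struct parse *p = &state;
	size_t i;

	assert(buf != NULL);
	state.end = buf + len;
	for (i = 0; i <= len; i++) {
		/* buf + len is one past the end, which is still a valid pointer. */
		state.next = buf + i;
		if (MORE2() && !MORE())
			return 0;
	}
	return 1;
}
```

Since "end - next > 1" can only hold when "end - next > 0" also holds, the function returns 1 for any buffer, which is why the MORE() test in SEETWO() is redundant.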

From bug report: https://bugs.llvm.org/show_bug.cgi?id=48649


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D97129

Files:
  llvm/lib/Support/regcomp.c


Index: llvm/lib/Support/regcomp.c
===================================================================
--- llvm/lib/Support/regcomp.c
+++ llvm/lib/Support/regcomp.c
@@ -249,10 +249,10 @@
  */
 #define	PEEK()	(*p->next)
 #define	PEEK2()	(*(p->next+1))
-#define	MORE()	(p->next < p->end)
-#define	MORE2()	(p->next+1 < p->end)
+#define	MORE()	(p->end - p->next > 0)
+#define	MORE2()	(p->end - p->next > 1)
 #define	SEE(c)	(MORE() && PEEK() == (c))
-#define	SEETWO(a, b)	(MORE() && MORE2() && PEEK() == (a) && PEEK2() == (b))
+#define	SEETWO(a, b)	(MORE2() && PEEK() == (a) && PEEK2() == (b))
 #define	EAT(c)	((SEE(c)) ? (NEXT(), 1) : 0)
 #define	EATTWO(a, b)	((SEETWO(a, b)) ? (NEXT2(), 1) : 0)
 #define	NEXT()	(p->next++)
@@ -800,15 +800,17 @@
 	int invert = 0;
 
 	/* Dept of Truly Sickening Special-Case Kludges */
-	if (p->next + 5 < p->end && strncmp(p->next, "[:<:]]", 6) == 0) {
-		EMIT(OBOW, 0);
-		NEXTn(6);
-		return;
-	}
-	if (p->next + 5 < p->end && strncmp(p->next, "[:>:]]", 6) == 0) {
-		EMIT(OEOW, 0);
-		NEXTn(6);
-		return;
+	if (p->end - p->next > 5) {
+		if (strncmp(p->next, "[:<:]]", 6) == 0) {
+			EMIT(OBOW, 0);
+			NEXTn(6);
+			return;
+		}
+		if (strncmp(p->next, "[:>:]]", 6) == 0) {
+			EMIT(OEOW, 0);
+			NEXTn(6);
+			return;
+		}
 	}
 
 	if ((cs = allocset(p)) == NULL) {

