<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - unsafe pointer arithmetic in llvm_regcomp()"
href="https://bugs.llvm.org/show_bug.cgi?id=48649">48649</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>unsafe pointer arithmetic in llvm_regcomp()
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>All
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Support Libraries
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>miod@trust-in-soft.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>llvm/lib/Support/regcomp.c is borrowed from OpenBSD, where the following
issue has been reported and fixed (report and patch at
<a href="https://marc.info/?l=openbsd-tech&m=160923823113340&w=2">https://marc.info/?l=openbsd-tech&m=160923823113340&w=2</a>).
regcomp.c uses the "start + count < end" idiom to check that more than "count"
bytes remain in an array of char that "start" and "end" both point into.
This is fine unless "start + count" lands more than one element past the end
of the array. In that case, a pedantic reading of the C standard makes the
pointer addition itself undefined, let alone the comparison against "end", and
optimizers from hell will happily remove as much code as possible because of
this.
An example of this occurs in regcomp.c's bothcases(), which defines bracket[3]
and sets "next" to "bracket" and "end" to "bracket + 2". It then invokes
p_bracket(), which starts with "if (p->next + 5 < p->end)": here "p->next + 5"
is "bracket + 5", well past the end of the 3-element array.
Because bothcases() and p_bracket() are static functions in regcomp.c, there is
a real risk of miscompilation if aggressive inlining happens, since the
compiler can then see the array bounds at the point of the check. The patch
referenced above rewrites the "start + count < end" constructs into
"end - start > count". As long as "end" and "start" both point into the same
array (such as "bracket[3]" above) or one past its end, "end - start" is
well-defined and can be compared without trouble.
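
To make the difference concrete, here is a minimal sketch of the two forms of
the check. It is not the code from regcomp.c or from the patch: "struct
cursor", more_than_unsafe(), more_than_safe() and bracket_has_more_than() are
hypothetical names, and the contents of bracket[] are illustrative only.

```c
/* Hypothetical stand-in for the parser state; only the two pointers matter. */
struct cursor {
    const char *next;   /* current position */
    const char *end;    /* one past the last byte that may be read */
};

/*
 * Unsafe form: if "next + count" lands more than one element past the
 * array, evaluating it is already undefined, before the comparison runs.
 * Shown for contrast only; nothing in this sketch calls it.
 */
int more_than_unsafe(struct cursor c, int count)
{
    return c.next + count < c.end;
}

/*
 * Safe form: "end - next" is well-defined whenever both pointers point
 * into the same array (or one past its end), whatever "count" is.
 */
int more_than_safe(struct cursor c, int count)
{
    return c.end - c.next > count;
}

/* Mirrors the setup described for bothcases(): bracket[3], end = bracket + 2. */
int bracket_has_more_than(int count)
{
    char bracket[3] = { 'x', ']', '\0' };   /* illustrative contents */
    struct cursor c;

    c.next = bracket;
    c.end = bracket + 2;
    return more_than_safe(c, count);
}
```

With this setup "end - next" is 2, so bracket_has_more_than(5) is false without
ever forming "bracket + 5", while bracket_has_more_than(1) is true.
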
As a bonus, MORE2() implies MORE(), so SEETWO() can be simplified a bit.</pre>
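<pre>The SEETWO() simplification can be sketched as follows. The macro names MORE(),
MORE2() and SEETWO() come from the report; their bodies here, the PEEK() and
PEEK2() helpers, "struct parse_sketch" and see_two() are assumptions made for
illustration and should not be read as the exact text of the patch.

```c
/* Hypothetical stand-in for the parser state used by the macros below. */
struct parse_sketch {
    const char *next;   /* current position */
    const char *end;    /* one past the last byte that may be read */
};

/* Subtraction-based checks; a "struct parse_sketch *p" must be in scope. */
#define MORE()   (p->end - p->next > 0)   /* at least one byte left */
#define MORE2()  (p->end - p->next > 1)   /* at least two bytes left */
#define PEEK()   (*p->next)
#define PEEK2()  (*(p->next + 1))

/* MORE2() already guarantees MORE(), so no separate MORE() test is needed. */
#define SEETWO(a, b)  (MORE2() && PEEK() == (a) && PEEK2() == (b))

/* Returns 1 if the first two bytes of a "len"-byte buffer are a then b. */
int see_two(const char *buf, int len, char a, char b)
{
    struct parse_sketch state;
    struct parse_sketch *p = &state;

    state.next = buf;
    state.end = buf + len;
    return SEETWO(a, b);
}
```

If "end - next" is greater than 1 it is also greater than 0, which is why the
MORE() test adds nothing once MORE2() has been checked.</pre>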
</div>
</p>
</body>
</html>