<div dir="ltr">The thing to do with problems like this is to run with clang's sanitizers, which diagnose undefined behavior, memory errors, and other such issues that often show up only under optimization. A cursory look at all the casting in this example makes me think there is undefined behavior, but there very well could be another type of failure.<div><br></div><div><a href="https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html">https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html</a><br></div><div><a href="https://clang.llvm.org/docs/AddressSanitizer.html">https://clang.llvm.org/docs/AddressSanitizer.html</a><br></div><div><a href="https://clang.llvm.org/docs/MemorySanitizer.html">https://clang.llvm.org/docs/MemorySanitizer.html</a><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Dec 21, 2021 at 9:30 AM Adrian Moreno via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello,<br>
<br>
I need some help understanding what might be wrong with a piece of code from the <br>
openvswitch project. By the subject line I'm not suggesting there's a problem in <br>
clang; gcc shows the same behavior, so it's likely our code is broken. I am <br>
kindly asking for help understanding and troubleshooting the problem.<br>
<br>
Summary: It seems that a certain interaction between two main openvswitch data <br>
structures breaks when the code is optimized ("-O2 -flto=auto").<br>
The two data structures are:<br>
<br>
hmap: <a href="https://github.com/openvswitch/ovs/blob/master/include/openvswitch/hmap.h" rel="noreferrer" target="_blank">https://github.com/openvswitch/ovs/blob/master/include/openvswitch/hmap.h</a><br>
list: <a href="https://github.com/openvswitch/ovs/blob/master/include/openvswitch/list.h" rel="noreferrer" target="_blank">https://github.com/openvswitch/ovs/blob/master/include/openvswitch/list.h</a><br>
<br>
I've reproduced the problem outside of the openvswitch daemon using a short C <br>
program (attached).<br>
<br>
Code snippet:<br>
<br>
struct bond {<br>
struct hmap members;<br>
};<br>
<br>
struct member {<br>
struct hmap_node hmap_node;<br>
int order;<br>
struct ovs_list elem;<br>
};<br>
<br>
int main() {<br>
int ret = 0;<br>
struct member *member, *member1, *member2;<br>
struct bond *bond;<br>
struct ovs_list start = {0};<br>
<br>
bond = malloc(sizeof *bond);<br>
memset(bond, 0, sizeof (struct bond));<br>
hmap_init(&bond->members);<br>
<br>
member1 = malloc(sizeof *member1);<br>
member2 = malloc(sizeof *member2);<br>
memset(member1, 0, sizeof (struct member));<br>
memset(member2, 0, sizeof (struct member));<br>
<br>
member1->order = 3;<br>
member2->order = 2;<br>
<br>
hmap_insert(&bond->members, &member1->hmap_node, (uint32_t)(uintptr_t)member1);<br>
hmap_insert(&bond->members, &member2->hmap_node, (uint32_t)(uintptr_t)member2);<br>
<br>
ovs_list_init(&start);<br>
HMAP_FOR_EACH (member, hmap_node, &bond->members) {<br>
/*<br>
* Insert member in start (sorted)<br>
* */<br>
struct member *pos;<br>
LIST_FOR_EACH (pos, elem, &start) {<br>
if (member->order > pos->order) {<br>
break;<br>
}<br>
}<br>
// TESTED: If I add this printf, the problem disappears<br>
//printf("Inserting member: %p\n", member);<br>
ovs_list_insert(&pos->elem, &member->elem);<br>
}<br>
<br>
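/* I've inserted two members into the 'start' list.<br>
* first and last have to be either member1 or member2<br>
* */<br>
/* Front and back members of the sorted 'start' list. */<br>
struct member *first = CONTAINER_OF(ovs_list_front(&start), struct member, elem);<br>
struct member *last = CONTAINER_OF(ovs_list_back(&start), struct member, elem);<br>
if ((first != member1 && first != member2) || (last != member1 && last != <br>
member2)) {<br>
printf("list is broken!\n");<br>
ret = 1;<br>
}<br>
<br>
return ret;<br>
}<br>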
<br>
<br>
What I know for now:<br>
* -fno-strict-aliasing does not fix it<br>
* Only happens with "-O2 -flto=auto"<br>
* If I declare 'struct ovs_list *start' and change the code to use the pointer directly <br>
instead of '&start', the problem disappears. It seems that the LIST_FOR_EACH macros <br>
prefer an lvalue over an "&" expression, but I don't understand why.<br>
* I'm not able to reproduce without using hmap _and_ ovs_list.<br>
* If I add a compiler barrier or a call to an external function (e.g. printf) <br>
after the loop, the problem disappears.<br>
* If I add -fsanitize=undefined the problem disappears!<br>
<br>
I'd really appreciate any hint or idea to try to understand this problem.<br>
<br>
Thanks in advance.<br>
<br>
-- <br>
Adrián Moreno<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div>
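<div>One candidate for the undefined behavior suspected above, offered as a hypothesis rather than a diagnosis: LIST_FOR_EACH in list.h appears to compute the loop variable from each list node with CONTAINER_OF-style pointer arithmetic. If the loop finishes without hitting the break, 'pos' becomes CONTAINER_OF applied to the address of 'start'. But 'start' is a stand-alone struct ovs_list on the stack, not the 'elem' field of some struct member, so that pointer lies outside any object. Computing it is undefined behavior, which the optimizer (particularly with LTO) is allowed to exploit. That would fit the observations that an external call, a compiler barrier, or a pointer variable in place of the address of 'start' makes the symptom go away, though it does not prove it. The sketch below is a minimal, self-contained illustration that does not use the real OVS headers: struct list_node, list_init, list_insert, insert_sorted and the local CONTAINER_OF are hypothetical stand-ins. It performs the same descending sorted insert, but keeps the cursor as a plain list-node pointer and converts it to struct member * only after checking that it is not the head.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for OVS's struct ovs_list and CONTAINER_OF. They
 * mimic the idea (circular doubly linked list with a sentinel head), not the
 * real openvswitch headers. */
struct list_node {
    struct list_node *prev;
    struct list_node *next;
};

/* Recovers the enclosing struct from a pointer to one of its fields. Only
 * valid when 'ptr' really points at that field inside such a struct. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *) (void *) ((char *) (ptr) - offsetof(type, member)))

struct member {
    int order;
    struct list_node elem;
};

/* Makes 'head' an empty circular list. */
static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Inserts 'node' just before 'before' (same argument order as
 * ovs_list_insert). */
static void list_insert(struct list_node *before, struct list_node *node)
{
    node->prev = before->prev;
    node->next = before;
    before->prev->next = node;
    before->prev = node;
}

/* Sorted insert (descending 'order') that never applies CONTAINER_OF to the
 * list head: the cursor stays a 'struct list_node *' and is converted to
 * 'struct member *' only after checking that it is not 'head'. */
static void insert_sorted(struct list_node *head, struct member *m)
{
    struct list_node *pos;

    assert(head != NULL && m != NULL);
    for (pos = head->next; pos != head; pos = pos->next) {
        const struct member *cur = CONTAINER_OF(pos, struct member, elem);
        if (m->order > cur->order) {
            break;
        }
    }
    /* If nothing matched, 'pos' is 'head' itself, which is still a valid
     * 'struct list_node *'; inserting before the head appends at the tail. */
    list_insert(pos, &m->elem);
}
```

<div>For example, inserting members with order 2, 3 and 5 into an empty list gives 5, 3, 2 from front to back. Whether this is really what goes wrong in the attached program would still need to be confirmed, for example by inspecting the preprocessed LIST_FOR_EACH expansion (clang -E).</div>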