[llvm-bugs] [Bug 37469] New: A miscompilation bug with unsigned char

via llvm-bugs llvm-bugs at lists.llvm.org
Tue May 15 06:53:36 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=37469

            Bug ID: 37469
           Summary: A miscompilation bug with unsigned char
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Transformation Utilities
          Assignee: unassignedbugs at nondot.org
          Reporter: juneyoung.lee at sf.snu.ac.kr
                CC: llvm-bugs at lists.llvm.org

Created attachment 20304
  --> https://bugs.llvm.org/attachment.cgi?id=20304&action=edit
A source file that raises the bug.

```
$ cat test-main.c
#include <string.h>
#include <stdio.h>
#include <stdint.h>

// If f(A, B + 4) is given, and integer representation of A and B + 4
// are the same, c1 == c2 in the loop becomes true,
// so arr3 = arr1. Hence r = A, and *A should be 10.
// However, if compiled with -O3, *A is still 1.
void store_10_to_p(int *p, int *q) {
  unsigned char arr1[8];
  unsigned char arr2[8];
  unsigned char arr3[8];
  // Type punning with char* is allowed.
  memcpy((unsigned char*)arr1, (unsigned char *)&p, sizeof(p));
  memcpy((unsigned char*)arr2, (unsigned char *)&q, sizeof(q));
  // Now arr1 is p, arr2 is q.

  for (int i = 0; i < sizeof(q); i++) {
    int c1 = (int)arr1[i], c2 = (int)arr2[i];
    // Note that c1 == c2 is a comparison between _integers_ (not pointers).
    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr1[i];
    else
      arr3[i] = arr2[i];
  }
  // Now arr3 is equivalent to arr1, which is p.
  int *r;
  memcpy(&r, (unsigned char *)arr3, sizeof(r));
  // Now r is p.
  *p = 1;
  *r = 10;
}

int main() {
  int A[4], B[4];
  printf("%p %p\n", A, &B[4]);
  if ((uintptr_t)A == (uintptr_t)&B[4]) {
    store_10_to_p(A, &B[4]);
    printf("%d\n", A[0]);
  }
  return 0;
}

$ clang -O3 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
1
$ clang -O0 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
10
```

This is what is happening inside LLVM:
(1) Instcombine changes the loop body to "arr3[i] = arr2[i];"
(2) Loop idiom recognizer replaces the loop with a "memcpy(arr3, arr2, 8)"
(3) Instcombine does the store forwarding from the memcpy to the load


I think this is related with lowering 'unsigned char' in C into 'i8' in LLVM.

There are two choices:
(1) Lowering 'unsigned char' into 'i8' is correct.
(2) Lowering 'unsigned char' into 'i8' is incorrect.

If (1) is right, then one of the optimizations happening in this example should
be disabled, to stop the miscompilation.
If (2) is right, then it is clang which miscompiles this example. 'unsigned
char' should be lowered into something else.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20180515/eb6de6c1/attachment.html>


More information about the llvm-bugs mailing list