[PATCH] D90648: [SCEV] Fix nsw flags for GEP expressions

Jordan Rupprecht via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 25 13:39:45 PST 2020

rupprecht added a comment.

In D90648#2409773 <https://reviews.llvm.org/D90648#2409773>, @nikic wrote:

> In D90648#2409760 <https://reviews.llvm.org/D90648#2409760>, @davidxl wrote:
>> This patch does not have a new test case added to demonstrate what this patch is trying to fix or reference the bug number showing the bug end-to-end.
> This does not fix a specific end-to-end miscompile. It fixes incorrect analysis results that may result in a miscompile. See also D90708 <https://reviews.llvm.org/D90708> for a clarification of GEP inbounds semantics in LangRef.

We're actually seeing this *cause* a miscompile, AFAICT.

The method `my_strnxfrm_unicode_full_bin` from mysql 5.7 [1] has a loop like:

  const uchar *const weightend = dst + (nweights*3);
  const uchar *const end = (weightend < de) ? weightend : de;
  while (dst < end-2)
    *dst++= 0x00;
    *dst++= 0x00;
    *dst++= 0x20;

The failure we see is ASAN kill the process with heap-buffer-overflow when it runs into bad memory. However, the write is well past `end`, indicating the loop is not exiting when it should.

I dumped the IR for this method, and it looks like the beginning & end changed like so:

    %add.ptr56 = getelementptr inbounds i8, i8* %cond, i64 -2, !dbg !1195  ; This is "end - 2"
    br i1 %cmp57212, label %while.body, label %while.end, !dbg !1197
  while.body:                                       ; preds = %if.then48, %163
    %incdec.ptr65 = getelementptr inbounds i8, i8* %dst.addr.4214, i64 3, !dbg !1202  ; This is the last "*dst++ = 0x20" expression
    %cmp57 = icmp ult i8* %incdec.ptr65, %add.ptr56, !dbg !1196  ; Compare dst against end - 2
    br i1 %cmp57, label %while.body, label %while.end, !dbg !1197, !llvm.loop !1205

After this patch, we see:

    %add.ptr56 = getelementptr inbounds i8, i8* %cond, i64 -2, !dbg !1195
    br i1 %cmp57212, label %while.body.preheader, label %while.end, !dbg !1197
  while.body.preheader:                             ; preds = %if.then48
    %scevgep228 = getelementptr i8, i8* %dst.addr.3, i64 3, !dbg !1197
    br label %while.body, !dbg !1197
  while.body:                                       ; preds = %while.body.preheader, %163
    %incdec.ptr65 = getelementptr inbounds i8, i8* %dst.addr.4214, i64 3, !dbg !1202
    %cmp57229 = icmp ult i8* %scevgep228, %add.ptr56, !dbg !1197
    br i1 %cmp57229, label %while.body, label %while.end, !dbg !1197, !llvm.loop !1205

If I'm reading the diff correctly, this patch adds a `while.body.preheader` BB for the SCEV GEP expression, and at the end compares that against `end - 2` to exit the loop. However, we jump back to the loop body, not the preheader, so the GEP expression is never updated. And hence, we end up with an infinite loop.

[1] https://github.com/mysql/mysql-server/blob/5.7/strings/ctype-utf8.c#L5217

  rG LLVM Github Monorepo



More information about the llvm-commits mailing list