[llvm-bugs] [Bug 40735] New: GVN fails to hoist loads from bitmasked pointers

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Feb 14 17:24:10 PST 2019


https://bugs.llvm.org/show_bug.cgi?id=40735

            Bug ID: 40735
           Summary: GVN fails to hoist loads from bitmasked pointers
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: atrick at apple.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

Created attachment 21480
  --> https://bugs.llvm.org/attachment.cgi?id=21480&action=edit
bad-pregvn

I tried llvm-svn: 354078 (master on Feb 14 2019).

Swift stdlib types use tagged pointers for fast paths, bridging, etc. Simply
adding a pointer-mask operation to the Swift references (if we start using
those bits) can cause 2x overhead for common operations like array append,
because LLVM fails to optimize loads from the masked pointer.
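For reference, the bitmasked fast path corresponds roughly to the following C
sketch (the struct and function names are illustrative, not the actual Swift
runtime; the mask constant is the one from bad-pregvn.ll):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the array storage object; the real
 * type in the IR is %Ts28__ContiguousArrayStorageBaseC. */
typedef struct {
    int64_t count;
    int64_t capacity_and_flags;
} Storage;

/* The mask from bad-pregvn.ll: 72057594037927928 == 0x00FFFFFFFFFFFFF8,
 * i.e. clear the top byte and the low three (alignment) bits. */
#define POINTER_MASK 0x00FFFFFFFFFFFFF8ull

/* Every iteration reloads the reference, strips the tag bits
 * (the `and` + `inttoptr` pair in the IR), and loads count through
 * the masked pointer. Both loads are loop-invariant here, so GVN
 * should be able to hoist them. */
int64_t sum_counts(uintptr_t *slot, int n) {
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        uintptr_t bits = *slot;                        /* load object ref    */
        Storage *s = (Storage *)(bits & POINTER_MASK); /* inttoptr(and ...)  */
        sum += s->count;                               /* dependent load     */
    }
    return sum;
}
```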

In the attached examples, GVN should hoist the storage.count load out of the
loop in the function $s4good23run_ArrayAppendReservedyySiF.

good-pregvn.ll is the IR that will be successfully hoisted
good-run-postgvn.ll is the IR for the relevant function after hoisting
bad-pregvn.ll is the IR that will not be successfully hoisted

The main difference is that in good-pregvn.ll we have a pointer to the
ContiguousArray reference:

.preheader:                                       ; preds = %2
  %5 = bitcast %swift.refcounted** %nums to %Ts28__ContiguousArrayStorageBaseC**

; <label>:22:                                     ; preds = %31, %19
  %23 = phi i64 [ 0, %19 ], [ %32, %31 ]
  %24 = load %Ts28__ContiguousArrayStorageBaseC*, %Ts28__ContiguousArrayStorageBaseC** %5, align 8
  %._storage2.count._value = getelementptr inbounds %Ts28__ContiguousArrayStorageBaseC, %Ts28__ContiguousArrayStorageBaseC* %24, i64 0, i32 1, i32 0, i32 0, i32 0
  %25 = load i64, i64* %._storage2.count._value, align 8, !range !13
  %26 = add nuw i64 %25, 1

In bad-pregvn.ll we have a pointer to an opaque i64:

.preheader:                                       ; preds = %2
  %5 = bitcast %swift.refcounted** %nums to i64*

; <label>:24:                                     ; preds = %35, %21
  %25 = phi i64 [ 0, %21 ], [ %36, %35 ]
  %26 = load i64, i64* %5, align 8
  %27 = and i64 %26, 72057594037927928
  %28 = inttoptr i64 %27 to %Ts28__ContiguousArrayStorageBaseC*
  %._storage3.count._value = getelementptr inbounds %Ts28__ContiguousArrayStorageBaseC, %Ts28__ContiguousArrayStorageBaseC* %28, i64 0, i32 1, i32 0, i32 0, i32 0
  %29 = load i64, i64* %._storage3.count._value, align 8, !range !13
  %30 = add nuw i64 %29, 1

Running

$ opt ./good-pregvn.ll -S -gvn

hoists both the load of the object pointer:

  %26 = load i64, i64* %5, align 8

replacing it with a phi:

  %.pre = phi %Ts28__ContiguousArrayStorageBaseC* [ %.pre.pre, %15 ], [ %14, %9 ]

*AND* hoists the dependent load into the inner loop preheader:

  %._storage2.count._value.phi.trans.insert = getelementptr inbounds %Ts28__ContiguousArrayStorageBaseC, %Ts28__ContiguousArrayStorageBaseC* %.pre, i64 0, i32 1, i32 0, i32 0, i32 0
  %.pre4 = load i64, i64* %._storage2.count._value.phi.trans.insert, align 8, !range !13

Resulting in this nice preheader and inner loop header:

; <label>:9:                                      ; preds = %6
  %12 = load %swift.refcounted*, %swift.refcounted** %nums, align 8
  %14 = bitcast %swift.refcounted* %12 to %Ts28__ContiguousArrayStorageBaseC*

; <label>:17:                                     ; preds = %15, %9
  %.pre = phi %Ts28__ContiguousArrayStorageBaseC* [ %.pre.pre, %15 ], [ %14, %9 ]
  %._storage2.count._value.phi.trans.insert = getelementptr inbounds %Ts28__ContiguousArrayStorageBaseC, %Ts28__ContiguousArrayStorageBaseC* %.pre, i64 0, i32 1, i32 0, i32 0, i32 0
  %.pre1 = load i64, i64* %._storage2.count._value.phi.trans.insert, align 8, !range !13
  br label %20

; <label>:20:                                     ; preds = %30, %17
  %21 = phi %Ts28__ContiguousArrayStorageBaseC* [ %.pre, %17 ], [ %31, %30 ]
  %22 = phi i64 [ %.pre1, %17 ], [ %25, %30 ]
  %23 = phi %Ts28__ContiguousArrayStorageBaseC* [ %.pre, %17 ], [ %31, %30 ]
  %24 = phi i64 [ 0, %17 ], [ %32, %30 ]
  %._storage2.count._value = getelementptr inbounds %Ts28__ContiguousArrayStorageBaseC, %Ts28__ContiguousArrayStorageBaseC* %23, i64 0, i32 1, i32 0, i32 0, i32 0
  %25 = add nuw i64 %22, 1
  %._storage2._capacityAndFlags._value = getelementptr inbounds %Ts28__ContiguousArrayStorageBaseC, %Ts28__ContiguousArrayStorageBaseC* %23, i64 0, i32 1, i32 0, i32 1, i32 0
  %26 = load i64, i64* %._storage2._capacityAndFlags._value, align 8
  %27 = lshr i64 %26, 1
  %28 = icmp slt i64 %27, %25
  br i1 %28, label %29, label %30, !prof !14
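In C terms, what GVN/PRE achieves on the good IR is essentially load
hoisting. A much-simplified, self-contained sketch of the before/after shapes
(illustrative types and names; the real loop also stores the incremented
count back, which is why GVN introduces phis rather than purely hoisting):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative storage object; the real type in the IR is
 * %Ts28__ContiguousArrayStorageBaseC. */
typedef struct {
    int64_t count;
} Storage;

/* Before GVN: both loads repeat on every iteration. */
int64_t sum_naive(Storage **slot, int n) {
    int64_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (*slot)->count;    /* load ref, then load count */
    return sum;
}

/* After GVN on the good IR: both loads move to the preheader,
 * legal here because nothing in the loop writes *slot or the count. */
int64_t sum_hoisted(Storage **slot, int n) {
    int64_t sum = 0;
    if (n > 0) {
        Storage *s = *slot;       /* hoisted object-ref load */
        int64_t count = s->count; /* hoisted dependent load  */
        for (int i = 0; i < n; i++)
            sum += count;
    }
    return sum;
}
```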

Running GVN on the bitmasked code, however, performs no such optimization:

$ opt ./bad-pregvn.ll -S -gvn

The inner loop header still loads the object reference (through the i64*) and
then the count:

; <label>:22:                                     ; preds = %34, %19
  %23 = phi i64 [ 0, %19 ], [ %36, %34 ]
  %24 = load i64, i64* %5, align 8
  %25 = and i64 %24, 72057594037927928
  %26 = inttoptr i64 %25 to %Ts28__ContiguousArrayStorageBaseC*
  %._storage3.count._value = getelementptr inbounds %Ts28__ContiguousArrayStorageBaseC, %Ts28__ContiguousArrayStorageBaseC* %26, i64 0, i32 1, i32 0, i32 0, i32 0
  %27 = load i64, i64* %._storage3.count._value, align 8, !range !13
  %28 = add nuw i64 %27, 1

:-( :-( :-(
