[llvm] 07a5b84 - SelectionDAG: Fix bug in ClusterNeighboringLoads

Nicolai Hähnle via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 12 00:13:39 PST 2020


Author: Nicolai Hähnle
Date: 2020-02-12T09:12:55+01:00
New Revision: 07a5b849f7bcd4b99080298e93d05651aaaf4dcb

URL: https://github.com/llvm/llvm-project/commit/07a5b849f7bcd4b99080298e93d05651aaaf4dcb
DIFF: https://github.com/llvm/llvm-project/commit/07a5b849f7bcd4b99080298e93d05651aaaf4dcb.diff

LOG: SelectionDAG: Fix bug in ClusterNeighboringLoads

Summary:
The method attempts to find loads that can be legally clustered by
looking for loads consuming the same chain glue token.

However, the old code looks at _all_ users of values produced by the
chain node -- including uses of the loaded/returned value of volatile
loads or atomics. This could lead to circular dependencies, which then
caused scheduling to fail.

With this change, we filter users by getResNo, i.e. by which result
value of the node they use, to ensure that we only look at users of the
chain glue token.
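
For illustration only (not part of the patch): an ISD::LOAD node produces
both the loaded value (result 0) and an output chain (result 1), and
SDNode::use_begin()/use_end() walk the users of *all* of the node's
results. The sketch below shows the filtering pattern under that
assumption; the helper name visitChainUsers is made up for this example
and is not defined anywhere in the tree.

  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  // Visit only the users of the specific result that Chain refers to
  // (e.g. the chain output of a load), skipping users of the node's
  // other results such as the loaded value of a volatile load.
  static void visitChainUsers(SDValue Chain) {
    for (SDNode::use_iterator I = Chain->use_begin(), E = Chain->use_end();
         I != E; ++I) {
      if (I.getUse().getResNo() != Chain.getResNo())
        continue;
      SDNode *User = *I;
      (void)User; // ...candidate handling (e.g. load clustering) goes here...
    }
  }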

This appears to be a rather old bug, which is perhaps surprising.
However, the test case is quite fragile: the failure is hidden by fairly
small changes to the test, and the test _must_ use volatile loads for
the bug to manifest.

Reviewers: arsenm, bogner, craig.topper, foad

Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74253

Added: 
    llvm/test/CodeGen/AMDGPU/bug-sdag-scheduler-cycle.ll

Modified: 
    llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index a297db178835..7bded0005a51 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -198,10 +198,10 @@ static void RemoveUnusedGlue(SDNode *N, SelectionDAG *DAG) {
 /// outputs to ensure they are scheduled together and in order. This
 /// optimization may benefit some targets by improving cache locality.
 void ScheduleDAGSDNodes::ClusterNeighboringLoads(SDNode *Node) {
-  SDNode *Chain = nullptr;
+  SDValue Chain;
   unsigned NumOps = Node->getNumOperands();
   if (Node->getOperand(NumOps-1).getValueType() == MVT::Other)
-    Chain = Node->getOperand(NumOps-1).getNode();
+    Chain = Node->getOperand(NumOps-1);
   if (!Chain)
     return;
 
@@ -234,6 +234,9 @@ void ScheduleDAGSDNodes::ClusterNeighboringLoads(SDNode *Node) {
   unsigned UseCount = 0;
   for (SDNode::use_iterator I = Chain->use_begin(), E = Chain->use_end();
        I != E && UseCount < 100; ++I, ++UseCount) {
+    if (I.getUse().getResNo() != Chain.getResNo())
+      continue;
+
     SDNode *User = *I;
     if (User == Node || !Visited.insert(User).second)
       continue;

diff  --git a/llvm/test/CodeGen/AMDGPU/bug-sdag-scheduler-cycle.ll b/llvm/test/CodeGen/AMDGPU/bug-sdag-scheduler-cycle.ll
new file mode 100644
index 000000000000..50ba7e19f46e
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/bug-sdag-scheduler-cycle.ll
@@ -0,0 +1,27 @@
+; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=gfx1010 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK
+
+; This used to cause a circular chain dependency during
+; SelectionDAG instruction scheduling.
+
+; CHECK-LABEL: {{^}}_amdgpu_gs_main:
+; CHECK: ds_read_b32
+; CHECK: ds_read_b32
+; CHECK: ds_read_b32
+; CHECK: ds_read_b32
+define amdgpu_gs float @_amdgpu_gs_main(i8 addrspace(3)* %arg0, i8 addrspace(3)* %arg1, i8 addrspace(3)* %arg2) #0 {
+  %tmp0 = bitcast i8 addrspace(3)* %arg0 to i32 addrspace(3)* addrspace(3)*
+  %tmp = load volatile i32 addrspace(3)*, i32 addrspace(3)* addrspace(3)* %tmp0, align 4
+
+  %tmp3 = load volatile i32, i32 addrspace(3)* %tmp, align 4
+
+  %tmp4a = bitcast i8 addrspace(3)* %arg1 to i32 addrspace(3)*
+  %tmp4 = load volatile i32, i32 addrspace(3)* %tmp4a, align 4
+
+  %tmp7a = getelementptr i32, i32 addrspace(3)* %tmp, i32 8
+  %tmp8 = load volatile i32, i32 addrspace(3)* %tmp7a, align 4
+
+  %tmp9 = add i32 %tmp3, %tmp8
+  %tmp10 = add i32 %tmp9, %tmp4
+  %tmp14 = bitcast i32 %tmp10 to float
+  ret float %tmp14
+}
