[PATCH] D15330: Fix cycle in selection DAG introduced by extractelement legalization

Robert Lougher via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 8 04:32:58 PST 2015


rob.lougher created this revision.
rob.lougher added reviewers: hfinkel, ab.
rob.lougher added a subscriber: llvm-commits.

Compiling the following code with optimization will cause the compiler to fail with an assertion.

```
typedef float alpha __attribute__((__vector_size__(16)));

float foo(int& i, alpha& v) {
  v *= v;
  return v[i];
}
```

Overran sorted position:
t15: ch,glue = CopyToReg t22:1, Register:f32 %XMM0, t22
  t14: f32 = Register %XMM0
  t22: f32,ch = load<LD4[<unknown>]> t9, t21, undef:i64
    t21: i64 = add t24, t4
      t24: i64 = shl t17, Constant:i8<2>
        t17: i64,ch = load<LD4[%i](tbaa=<0x5eab5d8>), sext from i32> t22:1, t2, undef:i64
          t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
            t1: i64 = Register %vreg0
          t6: i64 = undef
        t23: i8 = Constant<2>
      t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
        t3: i64 = Register %vreg1
    t6: i64 = undef
Checking if this is due to cycles
Detected cycle in SelectionDAG
Offending node:
t22: f32,ch = load<LD4[<unknown>]> t9, t21, undef:i64
  t21: i64 = add t24, t4
    t24: i64 = shl t17, Constant:i8<2>
      t17: i64,ch = load<LD4[%i](tbaa=<0x5eab5d8>), sext from i32> t22:1, t2, undef:i64
        t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
          t1: i64 = Register %vreg0
        t6: i64 = undef
      t23: i8 = Constant<2>
    t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
      t3: i64 = Register %vreg1
  t6: i64 = undef


This is the IR for the test case:

```
  %1 = load <4 x float>, <4 x float>* %v, align 16
  %mul = fmul <4 x float> %1, %1
  store <4 x float> %mul, <4 x float>* %v, align 16
  %2 = load i32, i32* %i, align 4
  %vecext = extractelement <4 x float> %mul, i32 %2
  ret float %vecext
```

During selection DAG legalization, the extractelement is replaced with a load instruction. To do this, a temporary store to the stack is used unless an existing store is found that can be re-used.  In this case, the store is found.

After creating the load the code replaces the chain going out of the store with the one going out of the new load.  The code does this to ensure that any stores that must take place after the store happens after the load, else the value might be overwritten before it is loaded.

The problem is, if the extractelement index is dependent on the store replacing the chain will introduce a cycle in the selection DAG (the load uses the index, and by replacing the chain we will make the index dependent on the load).

This patch is a simple fix for the problem, and adds an additional dependency test when searching for a store.  This is conservative as we may end up creating an unnecessary extra store to the stack if another store cannot be found (as in this case). However, the logic required to calculate the correct dependency order is complex (see below), and the situation is not expected to occur very often.

Opinions on the patch are welcome.  If the view is that the conservative approach is not appropriate I can try and implement the logic below and submit that instead.

**Non-conservative fix**

The change that introduced the problem is r231721 which added the code to update the dependency chain.  As already stated, previously there was no dependency between the new load and any other store to the same location, meaning the load could occur after the store, getting a wrong value.

The problem is, the change always replaces the chain irrespective of whether there are any other stores.  In the testcase above we have no additional stores and so the dependency chain does not need to be changed.  However, it isn't as easy as simply scanning the uses of the re-used store and only updating the stores as we may have a load which is followed in the chain by a store.  This is demonstrated by the testcase for r231721 which has multiple extractelements.  As soon as an extractelement is rewritten into a load, the re-used store becomes followed in the chain by a load and then a store (all 4 extractelements end up chained between the original stores).

So we need to traverse the dependency chain looking for stores, and only replace the chain if a store occurs.  However, this is also not sufficient.  We can modify the testcase above:

```
typedef float alpha __attribute__((__vector_size__(16)));

float foo(int& i, alpha& v) {
  v *= v;
  return v[i++];
}
```

To create the following selection DAG:

```
Initial selection DAG: BB#0 '_Z3fooRiRDv4_f:entry'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
  t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
  t5: i64 = Constant<0>
  t7: v4f32,ch = load<LD16[%v](tbaa=<0x5990268>)> t0, t4, undef:i64
  t8: v4f32 = fmul t7, t7
    t9: ch = store<ST16[%v](tbaa=<0x5990268>)> t7:1, t8, t4, undef:i64
  t10: i32,ch = load<LD4[%i](tbaa=<0x59926b8>)> t9, t2, undef:i64
      t12: i32 = add t10, Constant:i32<1>
    t13: ch = store<ST4[%i](tbaa=<0x59926b8>)> t10:1, t12, t2, undef:i64
      t14: i64 = sign_extend t10
    t15: f32 = extract_vector_elt t8, t14
  t18: ch,glue = CopyToReg t13, Register:f32 %XMM0, t15
  t19: ch = X86ISD::RET_FLAG t18, TargetConstant:i16<0>, Register:f32 %XMM0, t18:1
```

The store (t9) that will be re-used for the extract_vector_elt is now followed in the dependency chain by a load and a store (t9->t10->t13).  By the previous logic we need to replace the chain going out of the store.  However, this will get the cycle in the DAG as in the original testcase, so we haven't solved the problem.

In this case we can change the dependency chain and insert the new load between t10 and t13.  The new load will occur before the store (t13) maintaining the store ordering but after the load so avoiding the cycle in the DAG.

The problem is, I can see that it is theoretically possible to have multiple chains emanating from the re-used store all of which have a problematic load/store.  We can't fix all of these chains by inserting the new load between the load and the store.


http://reviews.llvm.org/D15330

Files:
  lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
  test/CodeGen/X86/extractelement-legalization-cycle.ll

Index: test/CodeGen/X86/extractelement-legalization-cycle.ll
===================================================================
--- test/CodeGen/X86/extractelement-legalization-cycle.ll
+++ test/CodeGen/X86/extractelement-legalization-cycle.ll
@@ -0,0 +1,21 @@
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
+
+; When the extractelement is converted to a load the store can be re-used.
+; This will, however, introduce a cycle into the selection DAG (the load
+; of the extractelement index is dependent on the store, and so after the
+; conversion it becomes dependent on the new load, which is dependent on
+; the index).  Make sure we skip the store, and conservatively instead
+; use a store to the stack.
+
+define float @foo(i32* %i, <4 x float>* %v) {
+; CHECK-LABEL: foo:
+; CHECK:    movaps %xmm0, -[[OFFSET:[0-9]+]](%rsp)
+; CHECK:    movss -[[OFFSET]](%rsp,{{.*}}), %xmm0 {{.*}}
+; CHECK-NEXT:    retq
+  %1 = load <4 x float>, <4 x float>* %v, align 16
+  %mul = fmul <4 x float> %1, %1
+  store <4 x float> %mul, <4 x float>* %v, align 16
+  %2 = load i32, i32* %i, align 4
+  %vecext = extractelement <4 x float> %mul, i32 %2
+  ret float %vecext
+}
Index: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -1477,6 +1477,12 @@
       if (!ST->getChain().reachesChainWithoutSideEffects(DAG.getEntryNode()))
         continue;
 
+      // If the index is dependent on the store we will introduce a cycle when
+      // creating the load (the load uses the index, and by replacing the chain
+      // we will make the index dependent on the load).
+      if (Idx.getNode()->hasPredecessor(ST))
+        continue;
+
       StackPtr = ST->getBasePtr();
       Ch = SDValue(ST, 0);
       break;


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D15330.42160.patch
Type: text/x-patch
Size: 1874 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151208/e89f088b/attachment.bin>


More information about the llvm-commits mailing list