[LLVMdev] Latency of true dependency of store followed by aliased load in ScheduleDAGInstrs

Tue Jun 12 07:20:40 PDT 2012

Hi all,

I have a question about the latency of the true dependency of a store
followed by an aliased load in ScheduleDAGInstrs. The latency seems to
depend on whether the store and load are volatile, as can be seen in
the post-RA-sched debug output for the attached ARM example:

$ llc -O3 -debug-only=post-RA-sched store_load_latency_test.ll 

...

SU(2):   STRi12 %R2<kill>, %R0<kill>, 0, pred:14, pred:%noreg; mem:Volatile ST4[%p1](tbaa=!"int")
  # preds left       : 1
  # succs left       : 2
  # rdefs left       : 0
  Latency            : 1
  Depth              : 2
  Height             : 0
  Predecessors:
   val SU(1): Latency=1 Reg=%R2
  Successors:
   antiSU(3): Latency=0
   ch  SU(3): Latency=0

SU(3):   %R0<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:Volatile LD4[%p2](tbaa=!"int")
  # preds left       : 2
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 1
  Depth              : 2
  Height             : 0
  Predecessors:
   antiSU(2): Latency=0
   ch  SU(2): Latency=0
  Successors:
   val SU(4294967295): Latency=2

...

SU(2):   STRi12 %R2<kill>, %R0<kill>, 0, pred:14, pred:%noreg; mem:ST4[%p1](tbaa=!"int")
  # preds left       : 1
  # succs left       : 2
  # rdefs left       : 0
  Latency            : 1
  Depth              : 2
  Height             : 0
  Predecessors:
   val SU(1): Latency=1 Reg=%R2
  Successors:
   antiSU(3): Latency=0
   ch  SU(3): Latency=1

SU(3):   %R0<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:LD4[%p2](tbaa=!"int")
  # preds left       : 2
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 1
  Depth              : 3
  Height             : 0
  Predecessors:
   antiSU(2): Latency=0
   ch  SU(2): Latency=1
  Successors:
   val SU(4294967295): Latency=2

...

So in the volatile case the latency of the chain dependency is 0, while
in the non-volatile case it is 1.

I am using ScheduleDAGInstrs in a scheduler for a VLIW target, and in
the volatile case the load is incorrectly scheduled in the same cycle
as the store. Is ScheduleDAGInstrs wrong in the volatile case, or
should I not rely on the latency being nonzero to get a correct
schedule?

Best regards,

Jordy Potman
-------------- next part --------------
; ModuleID = 'store_load_latency_test.c'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
target triple = "armv4t--"

define i32 @f1(i32* nocapture %p1, i32* nocapture %p2) nounwind {
entry:
  store volatile i32 65540, i32* %p1, align 4, !tbaa !0
  %0 = load volatile i32* %p2, align 4, !tbaa !0
  ret i32 %0
}

define i32 @f2(i32* nocapture %p1, i32* nocapture %p2) nounwind {
entry:
  store i32 65540, i32* %p1, align 4, !tbaa !0
  %0 = load i32* %p2, align 4, !tbaa !0
  ret i32 %0
}

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA"}