[LLVMdev] Latency of true dependency of store followed by aliased load in ScheduleDAGInstrs
Jordy Potman
jordy.potman at recoresystems.com
Tue Jun 12 07:20:40 PDT 2012
Hi all,
I have a question regarding the latency of the true dependency of a
store followed by an aliased load in ScheduleDAGInstrs. The latency
seems to depend on whether the store and load are volatile, as can be
seen in the post-RA-sched debug output of the attached ARM example:
$ llc -O3 -debug-only=post-RA-sched store_load_latency_test.ll
...
SU(2): STRi12 %R2<kill>, %R0<kill>, 0, pred:14, pred:%noreg; mem:Volatile ST4[%p1](tbaa=!"int")
# preds left : 1
# succs left : 2
# rdefs left : 0
Latency : 1
Depth : 2
Height : 0
Predecessors:
val SU(1): Latency=1 Reg=%R2
Successors:
antiSU(3): Latency=0
ch SU(3): Latency=0
SU(3): %R0<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:Volatile LD4[%p2](tbaa=!"int")
# preds left : 2
# succs left : 1
# rdefs left : 0
Latency : 1
Depth : 2
Height : 0
Predecessors:
antiSU(2): Latency=0
ch SU(2): Latency=0
Successors:
val SU(4294967295): Latency=2
...
SU(2): STRi12 %R2<kill>, %R0<kill>, 0, pred:14, pred:%noreg; mem:ST4[%p1](tbaa=!"int")
# preds left : 1
# succs left : 2
# rdefs left : 0
Latency : 1
Depth : 2
Height : 0
Predecessors:
val SU(1): Latency=1 Reg=%R2
Successors:
antiSU(3): Latency=0
ch SU(3): Latency=1
SU(3): %R0<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:LD4[%p2](tbaa=!"int")
# preds left : 2
# succs left : 1
# rdefs left : 0
Latency : 1
Depth : 3
Height : 0
Predecessors:
antiSU(2): Latency=0
ch SU(2): Latency=1
Successors:
val SU(4294967295): Latency=2
...
So in the volatile case the latency of the chain dependency is 0, while
in the non-volatile case it is 1.
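The effect of that one edge latency on the Depth values in the dumps
can be reproduced with a small standalone sketch. It assumes that depth
is the longest latency-weighted path through the predecessors; PredEdge
and computeDepth are hypothetical names for illustration, not the LLVM
classes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for a scheduling-DAG edge.
// This is NOT the LLVM SDep class, only an illustration.
struct PredEdge {
    unsigned predDepth; // depth of the predecessor node
    unsigned latency;   // latency annotated on the edge
};

// Assumed definition: depth is the maximum over all predecessor edges
// of (predecessor depth + edge latency); a node without predecessors
// has depth 0.
unsigned computeDepth(const std::vector<PredEdge>& preds) {
    unsigned depth = 0;
    for (const PredEdge& e : preds)
        depth = std::max(depth, e.predDepth + e.latency);
    return depth;
}
```

With SU(2) at Depth 2, the anti and chain edges of latency 0 give the
load a depth of 2 in the volatile case, while the chain edge of latency
1 gives a depth of 3 in the non-volatile case, matching the dumps.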
I am using ScheduleDAGInstrs in a scheduler for a VLIW target, and in
the volatile case the load is incorrectly scheduled in the same cycle as
the store. Is ScheduleDAGInstrs incorrect in the volatile case, or
should I not rely on the latency being nonzero to get a correct
schedule?
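For illustration, one conservative way a VLIW scheduler could avoid
depending on a nonzero latency is to look at the dependency kind as
well. The sketch below is standalone and does not use the LLVM API;
Dep, DepKind and earliestCycle are hypothetical names. It also assumes
a target where an anti dependency may share a cycle (all reads in a
bundle happen before any writes), which may not hold for every VLIW
machine.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical dependency kinds, loosely mirroring val/anti/out/ch in
// the debug output. Not the LLVM enum.
enum class DepKind { Data, Anti, Output, Order };

// Hypothetical edge record: predecessor index, kind and edge latency.
struct Dep {
    unsigned pred;
    DepKind kind;
    unsigned latency;
};

// Earliest cycle in which a node may issue, given the issue cycles of
// its already scheduled predecessors. Every edge except an anti
// dependency forces a gap of at least one cycle, even when the DAG
// reports latency 0 (assumption: only anti dependencies may share a
// cycle on this target).
unsigned earliestCycle(const std::vector<Dep>& preds,
                       const std::vector<unsigned>& issueCycle) {
    unsigned cycle = 0;
    for (const Dep& d : preds) {
        unsigned gap = d.latency;
        if (d.kind != DepKind::Anti)
            gap = std::max(gap, 1u);
        cycle = std::max(cycle, issueCycle[d.pred] + gap);
    }
    return cycle;
}
```

Under this policy the volatile load, whose chain edge has latency 0,
would still be placed one cycle after the store. Whether the DAG is
meant to guarantee this through latencies is the open question above.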
Best regards,
Jordy Potman
-------------- next part --------------
; ModuleID = 'store_load_latency_test.c'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
target triple = "armv4t--"
define i32 @f1(i32* nocapture %p1, i32* nocapture %p2) nounwind {
entry:
store volatile i32 65540, i32* %p1, align 4, !tbaa !0
%0 = load volatile i32* %p2, align 4, !tbaa !0
ret i32 %0
}
define i32 @f2(i32* nocapture %p1, i32* nocapture %p2) nounwind {
entry:
store i32 65540, i32* %p1, align 4, !tbaa !0
%0 = load i32* %p2, align 4, !tbaa !0
ret i32 %0
}
!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA"}