[LLVMdev] Question about load clustering in the machine scheduler
Tom Stellard
tom at stellard.net
Thu Mar 26 19:36:24 PDT 2015
Hi,
I have a program with over 100 loads (each with a 10 cycle latency)
at the beginning of the program, and I can't figure out how to get
the machine scheduler to intermix ALU instructions with the loads to
effectively hide the latency.
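To make the goal concrete, here is a toy model of the effect I am after. It is only a sketch and not LLVM code: the names (ToyInstr, simulateInOrder, makeToyProgram and the two order helpers) are made up for illustration, and it assumes an in-order machine that issues at most one instruction per cycle and stalls until every operand is ready. It uses 4 loads rather than 100 to keep the numbers small.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy instruction: result latency in cycles, plus the ids of the
// instructions whose results it reads. Purely illustrative.
struct ToyInstr {
  int Latency;
  std::vector<int> Deps;
};

// Issue the instructions in the given order on an in-order machine that
// issues at most one instruction per cycle and stalls until all operands
// are ready. Returns the cycle at which the last result becomes available.
int simulateInOrder(const std::vector<ToyInstr> &Instrs,
                    const std::vector<int> &Order) {
  std::vector<int> Ready(Instrs.size(), -1); // -1 means "not issued yet"
  int Cycle = 0;
  int Finish = 0;
  for (int Id : Order) {
    int Start = Cycle;
    for (int Dep : Instrs[Id].Deps) {
      assert(Ready[Dep] >= 0 && "operand must be issued before its user");
      Start = std::max(Start, Ready[Dep]); // stall until the operand is ready
    }
    Ready[Id] = Start + Instrs[Id].Latency;
    Finish = std::max(Finish, Ready[Id]);
    Cycle = Start + 1; // the next instruction can issue one cycle later
  }
  return Finish;
}

// ids 0-3:  loads with a 10 cycle latency
// ids 4-7:  ALU ops, each reading the result of one load
// ids 8-13: ALU ops that do not depend on any load
std::vector<ToyInstr> makeToyProgram() {
  std::vector<ToyInstr> Instrs;
  for (int I = 0; I < 4; ++I)
    Instrs.push_back({10, {}});
  for (int I = 0; I < 4; ++I)
    Instrs.push_back({1, {I}});
  for (int I = 0; I < 6; ++I)
    Instrs.push_back({1, {}});
  return Instrs;
}

// Loads, then their users (which stall), then the independent ALU work.
std::vector<int> usersFirstOrder() {
  return {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13};
}

// Loads, then the independent ALU work, then the users of the loads.
std::vector<int> interleavedOrder() {
  return {0, 1, 2, 3, 8, 9, 10, 11, 12, 13, 4, 5, 6, 7};
}
```

In this toy model, simulateInOrder(makeToyProgram(), usersFirstOrder()) finishes at cycle 20, while the interleaved order finishes at cycle 14, because the independent ALU ops fill the cycles that would otherwise be spent waiting on the first load. That second kind of ordering is what I would like the scheduler to produce.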
It seems the issue is with load clustering. I restrict load clustering
to 4 at a time, but when I look at the debug output, the loads are
always being picked because they are clustered, e.g.
Pick Top CLUSTER
Scheduling SU(10) %vreg13<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 4; mem:LD4[<unknown>] SGPR_32:%vreg13 SReg_128:%vreg9
I have a feeling there is something wrong with my machine model in the
R600 backend, but I've experimented with a few variations of it and have
been unable to solve this problem. Does anyone have any idea what I
might be doing wrong?
Here are my resource definitions from lib/Target/R600/SISchedule.td:
// BufferSize = 0 means the processors are in-order.
let BufferSize = 0 in {
// XXX: Are the resource counts correct?
def HWBranch : ProcResource<1>;
def HWExport : ProcResource<7>; // Taken from S_WAITCNT
def HWLGKM : ProcResource<31>; // Taken from S_WAITCNT
def HWSALU : ProcResource<1>;
def HWVMEM : ProcResource<15>; // Taken from S_WAITCNT
def HWVALU : ProcResource<1>;
}
Thanks,
Tom