[LLVMdev] Question about load clustering in the machine scheduler

Thu Mar 26 19:36:24 PDT 2015

Hi,

I have a program with over 100 loads (each with a 10 cycle latency)
at the beginning of the program, and I can't figure out how to get
the machine scheduler to intermix ALU instructions with the loads to
effectively hide the latency.

It seems the issue is with load clustering.  I restrict load clustering
to 4 at a time, but when I look at the debug output, the loads are
always being scheduled based on the fact that they are clustered, e.g.

Pick Top CLUSTER
Scheduling SU(10) %vreg13<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 4; mem:LD4[<unknown>] SGPR_32:%vreg13 SReg_128:%vreg9

I have a feeling there is something wrong with my machine model in the
R600 backend, but I've experimented with a few variations of it and have
been unable to solve this problem.  Does anyone have any idea what I
might be doing wrong?

Here are my resource definitions from lib/Target/R600/SISchedule.td:

// BufferSize = 0 means the processors are in-order.
let BufferSize = 0 in {

// XXX: Are the resource counts correct?
def HWBranch : ProcResource<1>;  
def HWExport : ProcResource<7>;   // Taken from S_WAITCNT
def HWLGKM   : ProcResource<31>;  // Taken from S_WAITCNT
def HWSALU   : ProcResource<1>;  
def HWVMEM   : ProcResource<15>;  // Taken from S_WAITCNT
def HWVALU   : ProcResource<1>;

}
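
For reference, the ProcResource definitions above do not carry any
latency themselves.  In LLVM's TableGen scheduling model, the latency of
a class of instructions is normally attached to a resource through a
WriteRes entry.  The following is only a minimal sketch of that pattern:
the names SISketchModel, WriteSMEMSketch and HWLGKMSketch are hypothetical
and are not taken from SISchedule.td, and the 10 cycle latency is simply
the load latency mentioned above.

```tablegen
// Hypothetical sketch, NOT the actual contents of SISchedule.td.
// Assumes llvm/Target/TargetSchedule.td has been included (as it is
// by any target's top-level .td file).

// A machine model with no micro-op buffer, i.e. in-order issue (assumed).
def SISketchModel : SchedMachineModel {
  let MicroOpBufferSize = 0;
}

// A scheduling class that scalar memory loads would be tagged with
// (hypothetical name).
def WriteSMEMSketch : SchedWrite;

let SchedModel = SISketchModel in {
  // Same shape as the HWLGKM resource above, under a different name.
  def HWLGKMSketch : ProcResource<31> { let BufferSize = 0; }

  // Tie the scheduling class to the resource and give it a latency.
  def : WriteRes<WriteSMEMSketch, [HWLGKMSketch]> { let Latency = 10; }
}
```

With a mapping like this in place, the scheduler sees the 10 cycle
latency on instructions whose definitions list WriteSMEMSketch in their
SchedRW field.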

Thanks,
Tom