[LLVMdev] Adjusting Load Latencies

Fri Mar 2 09:01:55 PST 2012

Hello,

I am interested in writing an analysis pass that looks at the stride
used for loads in a loop and passes that information down so that it
can be used by the instruction scheduler. The reason is that if the
load stride is greater than the cache line size, then I would expect
the load to always miss the cache, and, as a result, the scheduler
should use a much larger effective latency when scheduling the load and
its dependencies. Cache-miss metadata might also be a good supplemental
option. I can add methods to TLI that can convert the access stride
information into effective latency information, but what is the best
way to annotate the loads so that the information will be available to
the SDNodes?
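
To make the stride-to-latency idea concrete, here is a minimal sketch of
the kind of conversion I have in mind. The function name and all of its
parameters are hypothetical (nothing like this exists in TLI today), and
the linear interpolation for strides smaller than a cache line is my own
assumption; it also ignores hardware prefetchers and any reuse across
loop iterations.

```cpp
#include <cstdint>

// Hypothetical helper, NOT an existing LLVM/TLI API.
// Assumes missLatency >= hitLatency and cacheLineBytes > 0.
unsigned effectiveLoadLatency(int64_t strideBytes, unsigned cacheLineBytes,
                              unsigned hitLatency, unsigned missLatency) {
  // Use the magnitude of the stride; unsigned negation is well defined
  // even for the most negative int64_t value.
  uint64_t absStride = strideBytes < 0
                           ? 0 - static_cast<uint64_t>(strideBytes)
                           : static_cast<uint64_t>(strideBytes);

  // Each access touches a new cache line, so assume it always misses.
  if (absStride >= cacheLineBytes)
    return missLatency;

  // Assumption: roughly absStride / cacheLineBytes of the accesses start
  // a new line, so charge that fraction of the miss penalty (rounded down).
  uint64_t penalty = static_cast<uint64_t>(missLatency - hitLatency);
  return static_cast<unsigned>(hitLatency +
                               penalty * absStride / cacheLineBytes);
}
```

With a 64-byte line, a 4-cycle hit, and a 200-cycle miss, a 128-byte
stride would be charged the full miss latency, while an 8-byte stride
would be charged only a small amortized penalty on top of the hit
latency.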

Has anyone tried something like this before?

A related issue is automatically adding prefetching to loops. The
trick here is to accurately estimate the number of cycles the loop
body will take to execute (so that you prefetch the correct amount
ahead). This information is not really available until instruction
scheduling, and so prefetch insertion cannot really complete until just
before MC generation (the prefetch instructions can be scheduled, but
their constant offset needs to be held free for a while). In addition,
estimating the number of cycles also requires relatively accurate
load/store latencies, and this, in turn, requires cache-miss latencies
to be accounted for (which must then account for the prefetches).
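
As a rough illustration of why the cycle estimate matters, the prefetch
distance could be computed along the following lines once the scheduler
has produced a cycle count for the loop body. The struct and function
names are hypothetical, and the model (one prefetch per iteration,
constant stride, a single fixed miss latency) is a simplifying
assumption rather than a description of any existing pass.

```cpp
#include <cstdint>

// Hypothetical result type: how far ahead to prefetch.
struct PrefetchPlan {
  unsigned itersAhead; // number of loop iterations to run ahead
  int64_t byteOffset;  // constant offset to add to the load address
};

// Hypothetical helper, NOT an existing LLVM API.
PrefetchPlan planPrefetch(unsigned missLatencyCycles, unsigned loopBodyCycles,
                          int64_t strideBytes) {
  // Guard against a zero estimate so the division below is defined.
  if (loopBodyCycles == 0)
    loopBodyCycles = 1;

  // Run far enough ahead that the data arrives before it is needed:
  // ceil(missLatencyCycles / loopBodyCycles) iterations.
  unsigned iters = (missLatencyCycles + loopBodyCycles - 1) / loopBodyCycles;

  return {iters, static_cast<int64_t>(iters) * strideBytes};
}
```

The byte offset here is the constant that would have to stay
unresolved until the loop body's cycle count is known, which is why it
cannot be fixed before scheduling.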

If anyone has thoughts on these ideas, I would like to hear them.

Thanks again,
Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory