[llvm-dev] RFC: Extend PowerPC SW prefetching pass to other targets

Wed Dec 23 09:42:18 PST 2015

Hi,

I’d like to add SW prefetching capability for our ARM64 micro-architectures.  My immediate goal is to add support for accesses with a large constant stride (>= 2KB), which are problematic for the HW prefetcher to handle.

The direct motivation is 433.milc in SPECfp 2006.  The benchmark iterates through very small matrices and multiplies them with a vector.  However, each matrix is part of a large structure, so the stride is large and, as a result, we miss in L1 on every new matrix.

My plan is to take Hal’s PowerPC prefetching pass[1] and make it available for other targets on an opt-in basis.  Specifically, move the pass under lib/Transforms/Scalar and add a TTI interface to query the target parameters.  To opt in, a target would have to provide the stride threshold, the cache line size and how many iterations ahead the prefetching should occur for a given loop.
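To make the opt-in parameters concrete, here is a minimal standalone sketch.  It is not the pass or an actual TTI interface; the struct, its field names and the helper functions are all hypothetical and only illustrate how the three values could drive the decision.  Using the cache line size to skip a prefetch that lands close to an earlier one is likewise an assumption about how a pass might use it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the values a target would report when opting in.
// None of these names come from LLVM.
struct PrefetchParams {
  bool OptIn;              // false: target has not opted in
  unsigned MinStrideBytes; // stride threshold, e.g. 2048 for ">= 2KB"
  unsigned CacheLineSize;  // in bytes
  unsigned ItersAhead;     // how many iterations ahead to prefetch
};

static int64_t absValue(int64_t V) { return V < 0 ? -V : V; }

// Prefetch only for opted-in targets and only when the constant stride
// (taken from the access's recurrence) meets the threshold.
bool shouldPrefetch(const PrefetchParams &P, int64_t StrideBytes) {
  if (!P.OptIn)
    return false;
  return absValue(StrideBytes) >= static_cast<int64_t>(P.MinStrideBytes);
}

// Byte offset from the current access to the address worth prefetching.
int64_t prefetchByteOffset(const PrefetchParams &P, int64_t StrideBytes) {
  return StrideBytes * static_cast<int64_t>(P.ItersAhead);
}

// Assumed use of the cache line size: two prefetch addresses less than one
// line apart are treated as redundant, so the second can be dropped.
bool coveredByEarlierPrefetch(const PrefetchParams &P, int64_t OffsetA,
                              int64_t OffsetB) {
  assert(P.CacheLineSize > 0 && "cache line size must be set");
  return absValue(OffsetA - OffsetB) < static_cast<int64_t>(P.CacheLineSize);
}
```

With a 2KB threshold, a 4KB stride qualifies while a 64-byte stride is left to the HW prefetcher.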

For OOO architectures, the latter is pretty hard to estimate.  You pretty much have to compute the initiation interval (II) in the software pipelining sense.  I think that I will just use the instruction count to estimate a resource-constrained II (ResII), possibly also checking that there are no recurrences in the loop other than the short ones for the induction variable.

This may err on the side of issuing the prefetches earlier than necessary, but hopefully not so early as to cause cache thrashing.
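As a rough sketch of that estimate: divide the loop's instruction count by an issue width to get a ResII, then divide a memory latency by that ResII to get the prefetch distance in iterations.  The issue width and latency inputs, and all function names, are assumptions made for illustration; the proposal itself only says that the instruction count would be used.

```cpp
#include <cassert>

// ResII approximated from the instruction count alone: the number of cycles
// needed to issue one iteration at IssueWidth instructions per cycle.
// IssueWidth is an assumed target parameter.
unsigned estimateResII(unsigned LoopInstCount, unsigned IssueWidth) {
  assert(LoopInstCount > 0 && IssueWidth > 0);
  return (LoopInstCount + IssueWidth - 1) / IssueWidth; // round up
}

// Number of iterations needed to cover MemLatencyCycles (an assumed target
// parameter) when each iteration takes about ResII cycles.
unsigned iterationsAhead(unsigned MemLatencyCycles, unsigned LoopInstCount,
                         unsigned IssueWidth) {
  unsigned ResII = estimateResII(LoopInstCount, IssueWidth);
  unsigned Iters = (MemLatencyCycles + ResII - 1) / ResII; // round up
  return Iters == 0 ? 1 : Iters; // always at least one iteration ahead
}
```

For example, a 20-instruction loop on a 4-wide machine gives a ResII of 5 cycles, and a 200-cycle latency then gives 40 iterations ahead.  Since ResII is only a lower bound on the real II, an iteration may take longer than estimated, in which case the computed distance is larger than needed and the prefetch is issued early.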

The current pass operates on LLVM IR, so besides having SCEVs to work with, we can also check recurrences across memory.

Please let me know if you have any comments.

Thanks,
Adam

[1] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150216/260805.html
