[llvm-dev] RFC: System (cache, etc.) model for LLVM

Mon Nov 5 07:56:11 PST 2018

Renato Golin <renato.golin at linaro.org> writes:

> Mapping caches is certainly interesting to general architectures, but
> particularly important to massive operations like matrix multiply and
> stencils, which can pull a lot of data into cache and sometimes thrash
> it if not careful.

Exactly right.

> With scalable and larger vectors, this will be even more important.

True.

> Overall, I think this is a good idea, but the current proposal is too
> detailed on the implementation and not enough on the use for me to
> have a good idea how and where this will be used.
>
> Can you describe a few situations where these new interfaces would be
> used and how?

Sure.  The prefetching interfaces are already used, though in a
different form, by the LoopDataPrefetch pass.

The cache interfaces are flexible enough to allow passes to answer
questions like, "how much effective cache is available for this core
(thread, etc.)?"  That's a critical question when reasoning about the
thrashing behavior you mentioned above.
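
As a rough illustration of the "effective cache" question, here is a
minimal sketch.  The CacheLevelInfo struct and both function names are
hypothetical and are not the interfaces in the proposal; the sketch also
assumes that a shared cache is divided evenly among the threads using it,
which real hardware does not guarantee.

```cpp
#include <cstdint>

// Hypothetical description of one cache level (not the proposed LLVM API).
struct CacheLevelInfo {
  uint64_t SizeBytes;      // total capacity of this cache level
  unsigned SharingThreads; // hardware threads that share this cache
};

// Approximate capacity available to one thread, assuming an even split.
uint64_t effectiveCacheBytes(const CacheLevelInfo &C) {
  return C.SharingThreads ? C.SizeBytes / C.SharingThreads : C.SizeBytes;
}

// Would a working set (say, one matrix-multiply tile) fit in that share?
bool fitsInCache(const CacheLevelInfo &C, uint64_t WorkingSetBytes) {
  return WorkingSetBytes <= effectiveCacheBytes(C);
}
```

A tiling or blocking heuristic could use a check like fitsInCache to pick
the largest tile that stays under the per-thread share.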

Knowing the cache line size is important for prefetching and various
other memory operations such as streaming.
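
To make the line-size point concrete, the sketch below shows two small
calculations a prefetching pass might want.  Both helper names are made up
for illustration, and the code assumes a non-zero line size and a constant
byte stride.

```cpp
#include <cstdint>

// Number of consecutive loop iterations that land in the same cache line
// for a fixed byte stride.  One prefetch per that many iterations is
// enough.  Assumes LineSizeBytes != 0.
uint64_t iterationsPerLine(uint64_t LineSizeBytes, uint64_t StrideBytes) {
  if (StrideBytes == 0)
    return 0; // loop-invariant address: nothing to prefetch per iteration
  if (StrideBytes >= LineSizeBytes)
    return 1; // every iteration touches a new line
  return LineSizeBytes / StrideBytes;
}

// Number of distinct cache lines covered by the byte range
// [Addr, Addr + SizeBytes).  Assumes LineSizeBytes != 0.
uint64_t linesTouched(uint64_t Addr, uint64_t SizeBytes,
                      uint64_t LineSizeBytes) {
  if (SizeBytes == 0)
    return 0;
  uint64_t First = Addr / LineSizeBytes;
  uint64_t Last = (Addr + SizeBytes - 1) / LineSizeBytes;
  return Last - First + 1;
}
```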

Knowing the number of ways can allow one to guesstimate which memory
accesses are likely to collide in the cache.
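
For example, with the size, line size and number of ways in hand, a pass
could estimate set conflicts along these lines.  This is only a sketch:
the CacheGeometry struct and function names are hypothetical, and it
assumes simple modulo set indexing on the address, with no hashing and no
distinction between virtual and physical addresses.

```cpp
#include <cstdint>

// Hypothetical cache geometry (not the proposed LLVM API).
struct CacheGeometry {
  uint64_t SizeBytes;     // total capacity
  uint64_t LineSizeBytes; // bytes per cache line
  unsigned Ways;          // associativity
};

// Sets = capacity / (line size * ways).
uint64_t numSets(const CacheGeometry &G) {
  return G.SizeBytes / (G.LineSizeBytes * G.Ways);
}

// Set an address maps to, assuming plain modulo indexing.
uint64_t setIndex(const CacheGeometry &G, uint64_t Addr) {
  return (Addr / G.LineSizeBytes) % numSets(G);
}

// Two addresses compete for the same set if they are on different lines
// that map to the same set index.
bool mayConflict(const CacheGeometry &G, uint64_t A, uint64_t B) {
  bool SameLine = (A / G.LineSizeBytes) == (B / G.LineSizeBytes);
  return !SameLine && setIndex(G, A) == setIndex(G, B);
}
```

If more than Ways distinct lines in a loop map to one set, those accesses
are candidates for evicting each other.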

It also happens that all of these parameters are useful for simulation
purposes, which may help projects like llvm-mca.

> On Thu, 1 Nov 2018 at 21:56, David Greene via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Ok.  I would like to start posting patches for review without
>> speculating too much on fancy/exotic things that may come later.  We
>> shouldn't do anything that precludes extensions but I don't want to get
>> bogged down in a lot of details on things related to a small number of
>> targets.  Let's get the really common stuff in first.  What do you
>> think?
>
> In theory, both big and little cores should have the same cache
> structure, so we don't necessarily need extra descriptions for both.
>
> In practice, sub-architectures can have multiple combinations of
> big.LITTLE cores and it's simply not practical to add that to
> table-gen.

I'm not quite grasping this.  Are you saying that a particular subtarget
may have multiple "clusters" of big.LITTLE cores and that each cluster
may look different from the others?

                                 -David