[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

James Y Knight via llvm-dev llvm-dev at lists.llvm.org
Mon Nov 28 08:42:55 PST 2016


(This is somewhat of a digression from the topic of SVE, but...)

On Mon, Nov 28, 2016 at 8:09 AM, Bruce Hoult via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> "If the above holds true then the length would be only variable
> between different hardware implementations..."
>
> This seems related to a problem that has been independently hitting
> several different projects around the world for a while now, and only
> recently has the cause been understood.
>
> These projects are all doing things that require cache invalidation, for
> example JIT compilation. They have hit a problem that the size of the cache
> block is changing unexpectedly underneath the program when the program is
> migrated from a big processor to a LITTLE. The program might start on a CPU
> with a 64 byte cache block and then suddenly find itself on a CPU with a 32
> byte cache block, but it's still doing cache flushes with a 64 byte stride.
> So half the cache blocks don't get flushed.
>
> As far as I'm aware, there is no defined time at which this happens. Maybe
> it could be between one instruction and the next! We don't even know a good
> way to enumerate all cache block sizes present in the system at runtime
> (and always use the smallest one as the stride). So we're for the moment
> hard-coding a value which we hope will always be small enough, and taking
> the (minor) hit from trying to flush the same cache block multiple times. A
> 32 byte stride, say, on a machine with 128 byte cache blocks is still a lot
> better than using a stride of 1 or 4 bytes.
>
> If there is a defined time when these changes can happen e.g. at a system
> call then we'd really love to know about it!
>
> Not having seen any actual designs for SVE, it seems possible to me that
> the vector width could also change on migration between core types. So
> perhaps the answer is the same.
>

The cache-line-size issue I believe you're referring to was a hardware
erratum on a particular Samsung-designed core, not the way things are
intended to work. The reported cache-line size is intended to be the
smallest value present anywhere in the system, but that particular SoC
(the Exynos 8890) was erroneously reporting 128 for code running on the
"big" Exynos-M1 cores and 64 for code running on the "little" A53 cores.

The ARM docs for the Cortex-A15 (
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/BABHAEIF.html)
mention *exactly* this issue, and note that if you're mixing an A15 with a
small core (such as an A7), the designer must set the IMINLN signal on the
A15 to 0, so that the A15 also reports a 32-byte cache line instead of its
native 64-byte one.

Nothing about that issue is mentioned in the docs for ARMv8 cores because,
at least so far, all the ARM-designed 64-bit CPUs have 64-byte cache lines.
Obviously the same care ought to be taken if you change that property...
but unfortunately it was forgotten in this case.

In any case, that hardware defect has been worked around in Linux 4.9
(commit 116c81f427ff6c5380850963e3fb8798cc821d2b), so the kernel now
reports a consistent cache-line size even on CPUs with that erratum.