<div dir="ltr">(This is something of a digression from the topic of SVE, but...)<span class="gmail-im"><br style="font-size:12.8px"></span><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Nov 28, 2016 at 8:09 AM, Bruce Hoult via llvm-dev <span dir="ltr">&lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span class="gmail-">"If the above holds true then the length would be only variable<div>between different hardware implementations.."</div><div><br></div></span><div>This seems related to a problem that has independently hit several different projects around the world for a while now, and only recently have people understood what the problem is.</div><div><br></div><div>These projects are all doing things that require cache invalidation, for example JIT compilation. They have hit a problem where the size of the cache block changes unexpectedly underneath the program when the program is migrated from a big processor to a LITTLE. The program might start on a CPU with a 64-byte cache block and then suddenly find itself on a CPU with a 32-byte cache block, but it's still doing cache flushes with a 64-byte stride. So half the cache blocks don't get flushed.</div><div><br></div><div>As far as I'm aware, there is no defined time at which this happens. Maybe it could be between one instruction and the next! We don't even know a good way to enumerate all cache block sizes present in the system at runtime (and always use the smallest one as the stride). So for the moment we're hard-coding a value which we hope will always be small enough, and taking the (minor) hit from trying to flush the same cache block multiple times. 
A 32-byte stride, say, on a machine with 128-byte cache blocks is still a lot better than using a stride of 1 or 4 bytes.</div><div><br></div><div>If there is a defined time when these changes can happen, e.g. at a system call, then we'd really love to know about it!</div><div><br></div><div>Not having seen any actual designs for SVE, it seems possible to me that the vector width could also change on migration between core types. So perhaps the answer is the same.</div></div></blockquote><div><br></div><div>The cache-line-size issue I believe you're referring to was a hardware erratum on a particular Samsung-designed SoC, not the way it is intended to work. The reported cache-line size is intended to be the smallest value across the system, but that particular SoC (Exynos 8890) was erroneously reporting 128 bytes for code running on the "big" Exynos-M1 core, and 64 bytes for code running on the "little" A53 core.</div><div><br></div><div>The ARM docs for the Cortex-A15 (<a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/BABHAEIF.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/BABHAEIF.html</a>) mention *exactly* this issue, and note that if you're mixing an A15 with a small core (such as an A7), the designer must set the IMINLN signal on the A15 to 0, to indicate that the A15 should also report a 32-byte cache line instead of its native 64-byte cache line.</div><div><br></div><div>Nothing about that issue is mentioned in the docs for ARMv8 cores, because, at least so far, all the ARM-designed 64-bit CPUs have 64-byte cache lines. Obviously the same care ought to be taken if you change that property, but unfortunately it was forgotten in this case.</div><div><br></div><div>In any case, that hardware defect has been worked around in Linux 4.9 (116c81f427ff6c5380850963e3fb8798cc821d2b), so the kernel will now return a consistent cache-line size even if the CPU has that erratum.</div><div> </div></div></div></div>
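<div>To make the stride arithmetic concrete, here is a minimal sketch in C, not anyone's actual flush routine. It assumes the AArch64 CTR_EL0 encoding, where the IminLine (bits 3:0) and DminLine (bits 19:16) fields hold log2 of the line size in 4-byte words. All function names are hypothetical, and the real per-line maintenance instruction is replaced by an optional callback so the sketch stays portable.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a CTR_EL0-style line-size field: the field holds log2 of the
 * line size in 4-byte words, so 3 -> 32 bytes, 4 -> 64, 5 -> 128. */
static inline size_t line_bytes_from_field(unsigned field) {
    return (size_t)4 << field;
}

/* Hypothetical helpers: extract the minimum D-cache and I-cache line
 * sizes from a raw CTR_EL0 value (DminLine = bits 19:16, IminLine = 3:0). */
static inline size_t dcache_min_line(uint64_t ctr) {
    return line_bytes_from_field((unsigned)((ctr >> 16) & 0xF));
}
static inline size_t icache_min_line(uint64_t ctr) {
    return line_bytes_from_field((unsigned)(ctr & 0xF));
}

/* Pick the smallest line size among the values reported by different
 * cores; this is the only stride that is safe on all of them. */
static inline size_t safe_stride(const size_t *line_sizes, size_t n) {
    assert(n > 0);
    size_t min = line_sizes[0];
    for (size_t i = 1; i < n; i++) {
        if (line_sizes[i] < min) {
            min = line_sizes[i];
        }
    }
    return min;
}

/* Stand-in for the per-line cache maintenance operation. */
typedef void (*line_op)(uintptr_t addr);

/* Walk [start, start + len) one stride at a time, beginning at the
 * stride-aligned address at or below start. Calls op (if non-NULL) on
 * each address and returns how many addresses were visited. */
static inline size_t for_each_line(uintptr_t start, size_t len,
                                   size_t stride, line_op op) {
    assert(stride != 0 && (stride & (stride - 1)) == 0); /* power of two */
    if (len == 0) {
        return 0;
    }
    uintptr_t addr = start & ~(uintptr_t)(stride - 1);
    uintptr_t end = start + len;
    size_t count = 0;
    for (; addr < end; addr += stride) {
        if (op) {
            op(addr);
        }
        count++;
    }
    return count;
}
```

<div>With these definitions, a 256-byte range is visited 4 times at a 64-byte stride and 8 times at a 32-byte stride. Using the 64-byte stride on a core whose lines are really 32 bytes would therefore skip every other line, which is the failure described in the quoted message.</div>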