[PATCH] D14489: [AArch64] Applying load pair optimization for volatile load/store

Tim Northover via llvm-commits llvm-commits at lists.llvm.org
Sat Nov 14 07:23:39 PST 2015


t.p.northover added a comment.

> To my knowledge, the ARM architecture is a weakly ordered memory architecture that supports out-of-order completion. (B2.7.3)


I think you're not realising just how special Device memory is. It's not just memory we happen to know is mapped to some peripheral, it's a real set of attributes you can put into the page tables that prevent the core from messing around with your memory accesses in various ways.

And since the primary valid use for volatile is accessing this kind of memory, we need to make sure we don't break it.

> For the ordering, (B2.7.2) [...] For Device memory (B2.8.2), ARM suggests using DMB, Load-Acquire, Store-Release (B2-87)


This is an incomplete picture. See B2-98, where it's explicitly called out that you need barriers for Device-nGRE memory. There would be no need to mention that if barriers were always needed.

I believe what actually happens is that Device-nGnRnE is completely ordered under all circumstances, Device-nGnRE is ordered within itself but can be reordered with accesses to Normal memory, and the others are even weaker.

I think we need volatile to be useful for the strongest of those memory regimes: Device-nGnRnE. That means we can't reorder anything and need to be careful about exceptions.
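
For illustration, a minimal sketch of the kind of source code at stake. The struct and function names are hypothetical and not taken from the patch; the only point is that the two volatile loads are adjacent in memory, which is what would make them candidates for merging into a single ldp.

```c
#include <stdint.h>

/* Hypothetical register block: two adjacent 64-bit device registers.
   The layout is an assumption made for this example. */
struct dev_regs {
    uint64_t status;
    uint64_t data;
};

/* Two separate volatile loads, performed in program order. Whether the
   backend may combine them into one ldp is the question in this review. */
uint64_t read_status_then_data(volatile struct dev_regs *regs,
                               uint64_t *status_out) {
    uint64_t status = regs->status; /* first access */
    uint64_t data = regs->data;     /* second access */
    *status_out = status;
    return data;
}
```

If regs points at Device-nGnRnE memory, the author of this code presumably expects exactly two accesses, in this order.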

> I think for most of the ARM architecture, ldp's execution timing is the same as a single ldr instruction, so we don't have to worry about this so much.


I don't think latencies come into it. The question is what a conforming ARMv8 CPU is allowed to do, and I think aborting mid-ldp/stp is allowed.
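
To make the concern concrete, here is a toy model in plain C. It is not real hardware behaviour and not part of the patch: the fake device, its read-to-clear status register, and the assumption that an aborted pair is re-issued from the start are all invented for illustration. It only shows why an abort between the two halves could matter when a read has side effects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not real hardware: a device whose status read has a side effect. */
struct fake_device {
    uint64_t pending;      /* read-to-clear status flags */
    uint64_t data;
    unsigned status_reads; /* how many times status was read */
};

static uint64_t read_status(struct fake_device *dev) {
    uint64_t v = dev->pending;
    dev->pending = 0;      /* side effect: reading clears the flags */
    dev->status_reads++;
    return v;
}

/* Models one attempt at a paired load. If abort_second_half is set, the
   first access has already happened when the attempt is abandoned. */
static bool try_pair_load(struct fake_device *dev, bool abort_second_half,
                          uint64_t *status, uint64_t *data) {
    *status = read_status(dev);
    if (abort_second_half)
        return false;      /* abandoned mid-pair */
    *data = dev->data;
    return true;
}

/* Assumed restart behaviour: after the abort is handled, the whole pair
   is issued again, so the first register is read a second time. */
uint64_t pair_load_with_one_abort(struct fake_device *dev, uint64_t *status) {
    uint64_t data = 0;
    if (!try_pair_load(dev, true, status, &data))
        try_pair_load(dev, false, status, &data);
    return data;
}
```

In this model the status register is read twice and the flags seen by the first read are lost, whereas two separate ldr instructions would each be restarted on their own.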

Cheers.

Tim.


http://reviews.llvm.org/D14489




