[PATCH] D14489: [AArch64] Applying load pair optimization for volatile load/store
Junmo Park via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 13 22:06:06 PST 2015
flyingforyou added a comment.
This is a good discussion.
Thanks Tim, Chad, Junbum.
To my knowledge, the ARM architecture is a weakly ordered memory architecture that supports out-of-order completion (B2.7.3).
This means
ex1)
ldr x1, [x0] (volatile)
ldr x2, [x0, #8] (volatile)
ex2)
ldr x1, [x0, #8] (volatile)
ldr x2, [x0] (volatile)
In neither example does the ARM architecture guarantee which instruction is executed first.
So I think they can be merged into single ldp instruction.
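As a rough C-level illustration of ex1 (the struct and function names here are hypothetical and not taken from the patch), two adjacent volatile 64-bit loads might look like the sketch below. The assembly in the comment is only what I would expect the before and after codegen to look like, not verified compiler output.

```c
#include <stdint.h>

/* Hypothetical layout: two adjacent 64-bit volatile fields, as in ex1/ex2. */
struct pair {
    volatile uint64_t lo; /* corresponds to [x0]     */
    volatile uint64_t hi; /* corresponds to [x0, #8] */
};

/* Each volatile field is read exactly once.
 * Without merging, the expected AArch64 code is roughly:
 *     ldr x1, [x0]
 *     ldr x2, [x0, #8]
 * With the merge proposed in this patch it could become:
 *     ldp x1, x2, [x0]
 */
uint64_t sum_pair(struct pair *p) {
    uint64_t a = p->lo; /* first volatile load  */
    uint64_t b = p->hi; /* second volatile load */
    return a + b;
}
```

Either way, both loads are still performed, which is the property the language-level volatile discussion below is about.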
From the language side, a volatile variable means that every load and every store must actually be performed.
> https://en.wikipedia.org/wiki/Volatile_(computer_programming)
> In computer programming, particularly in the C, C++, C#, and Java programming languages, the volatile keyword indicates that a value may change between different accesses, even if it does not appear to be modified. This keyword prevents an optimizing compiler from optimizing away subsequent reads or writes and thus incorrectly reusing a stale value or omitting writes. Volatile values primarily arise in hardware access (memory-mapped I/O), where reading from or writing to memory is used to communicate with peripheral devices, and in threading, where a different thread may have modified a value.
For ordering (B2.7.2):
> Ordering can be achieved by using a DMB or DSB barrier. For more information on DMB and DSB
> instructions, see Memory barriers on page B2-85.
For Device memory (B2.8.2), ARM suggests using DMB, Load-Acquire, and Store-Release (B2-87):
> Each Load-Acquire Exclusive and Store-Release Exclusive instruction is essentially a variant of the
> equivalent Load-Exclusive or Store-Exclusive instruction. All usage restrictions and single-copy atomicity
> properties:
> — That apply to the Load-Exclusive instructions also apply to the Load-Acquire Exclusive instructions.
> — That apply to the Store-Exclusive instructions also apply to the Store-Release Exclusive instructions.
> • The Load-Acquire/Store-Release instructions can remove the requirement to use the explicit DMB memory
> barrier instruction.
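To make the last point concrete, here is a minimal sketch in C11 (the variable and function names are hypothetical, and it uses ordinary memory rather than Device memory). Ordering is requested with acquire/release atomics instead of an explicit barrier. On AArch64 I would expect these to map to ldar/stlr, but that is an assumption about codegen, not something stated in the manual text above.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint64_t payload;  /* plain data, published by the writer       */
static atomic_int ready;  /* flag; static storage, so it starts at 0    */

/* Writer: the release store orders the payload write before the flag. */
void publish(uint64_t v) {
    payload = v;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: the acquire load orders the flag read before the payload read.
 * Returns 1 and fills *out if data was published, otherwise returns 0. */
int try_consume(uint64_t *out) {
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        *out = payload;
        return 1;
    }
    return 0;
}
```

The point is that when ordering is required, it is expressed explicitly (barrier or acquire/release), rather than implied by the order of two plain loads.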
Tim's comment:
> I think ldp/stp operations could be aborted like that (B2-81 specifically says they generate 2 single-copy atomic accesses)
I think that on most ARM implementations, ldp's execution timing is the same as that of a single ldr instruction, so we don't have to worry about this too much.
Chad's comment:
> I also wanted to point out that this patch is awesome in that it also allows instructions whos MMOs have been dropped to be merged (hasOrderedMemoryRef makes a conservative assumption when MMOs are missing). Therefore, this solution will transform more that just volatile loads/stores. Of course the correct fix is to probably not drop MMO (as is don't in tail merging)...
I agree with this. But before discussing it, I want to finish the "volatile load/store merge".
If nobody agrees that both of the examples I mentioned can be merged, I think this commit is not worthwhile for either non-volatile or volatile merging.
Junmo.
http://reviews.llvm.org/D14489