[PATCH] Add a LOAD_SEQUENCE_POINT ISDOpcode
Richard Sandiford
rsandifo at linux.vnet.ibm.com
Thu Nov 14 01:54:57 PST 2013
Tom Stellard <tom at stellard.net> writes:
> On Wed, Nov 13, 2013 at 09:19:17AM -0800, Richard Sandiford wrote:
>> One unusual feature of the z architecture is that the result of a
>> previous load can be reused indefinitely for subsequent loads, even if a
>> cache-coherent store to that location is performed by another CPU. A
>> special serialising instruction must be used if you want to force a load
>> to be reattempted. To quote the architecture manual (where MVI is MOVE
>> IMMEDIATE and CLI is COMPARE LOGICAL IMMEDIATE):
>
> We have a very similar 'feature' for the VLIW targets in the R600 backend.
> The result of a load from LDS (Local memory in OpenCL) is stored in the
> 'output queue'. When an ALU instruction wants to use the result of a
> load, it has two options: 1. It can read the value from the top of the
> queue and then pop it off. 2. It can read the value and leave it on
> the queue. If instructions use option 2, then the result of the load
> can be used indefinitely.
Ah, sounds like that might make the assembly pretty difficult to read :-)
The added complication for z is that it isn't defined whether and when
the reuse occurs, a bit like memory ordering isn't defined on weakly-ordered
machines (which z isn't).
>> /// Marks a point before a volatile or atomic load, to ensure that
>> /// subsequent loads are attempted. This exists for architectures
>> /// like SystemZ that allow results from previous loads to be reused
>> /// indefinitely. For example, the architecture may treat a loop:
>> ///
>> /// while (*i == 0);
>> ///
>> /// as:
>> ///
>> /// while (*i == 0) spin-until-interrupt;
>> ///
>> /// omitting all but the first load in each time slice (even if a
>> /// cache-coherent store is performed by another CPU). Inserting
>> /// this operation forces each iteration of the loop to attempt a load.
>> ///
>> /// Note that this is not an ordering fence per se. It simply ensures
>> /// that a sequence of N loads is not collapsed into 1 load.
>
> What would cause N loads to be collapsed into 1 load? Is this something
> the Legalizer might do?
Sorry, bad use of the passive voice there. I meant that the processor
can collapse N loads to 1 load at run time unless we use these sequence
points to stop it. I've just changed the comment to:
/// Marks a point before a volatile or atomic load, to ensure that
/// subsequent loads are attempted. This exists for architectures
/// like SystemZ that allow results from previous loads to be reused
/// indefinitely. For example, the architecture may treat a loop:
///
/// while (*i == 0);
///
/// as:
///
/// while (*i == 0) spin-until-interrupt;
///
/// omitting all but the first load in each time slice (even if a
/// cache-coherent store is performed by another CPU). Inserting
/// this operation forces each iteration of the loop to attempt a load.
///
/// Note that this is not an ordering fence per se. It simply prevents
/// the processor from collapsing a sequence of N loads into 1 load at
/// run time.
LOAD_SEQUENCE_POINT,
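
For illustration only, here is a minimal source-level sketch of the kind of
polling loop the comment has in mind. The helper name poll_until_set and the
bound on the number of polls are hypothetical and not part of the patch; the
bound is only there so the sketch terminates when run on its own. The sketch
says nothing about how a backend lowers the loop. It only shows that each
iteration performs an atomic load that the program expects to be attempted.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical helper, not part of the patch: polls `flag` at most
// `max_polls` times and returns how many loads were issued. Each iteration
// performs its own atomic load; these are the loads that must not be
// collapsed into a single load at run time.
std::size_t poll_until_set(const std::atomic<int> &flag,
                           std::size_t max_polls) {
  std::size_t loads = 0;
  while (loads < max_polls) {
    ++loads;
    // One atomic load per iteration, mirroring "while (*i == 0);".
    if (flag.load(std::memory_order_relaxed) != 0)
      break;
  }
  return loads;
}
```

In real code the store to the flag would come from another CPU, and the loop
would typically be unbounded, as in the "while (*i == 0);" example above.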
Thanks,
Richard