[LLVMdev] About clock and wait instruction

Fri Dec 19 03:28:02 PST 2003

On Fri, 19 Dec 2003, Vipin Gokhale wrote:
> Perhaps "clock" is referring to things like reading CPU cycle counter on
> most modern processors (asm("rpcc %v0", foo) or __RPCC() compiler
> builtin on Alpha, e.g.); in the long term, a candidate for builtins I
> suspect.

Yes, that would make sense.

> While on the subject of builtins/asm etc, most modern CPUs also have
> instructions to do memory barriers/fences (i.e. stall the CPU until all
> in-flight memory loads and/or stores preceding the fence instruction
> have finished, e.g. - maybe that's what the "wait" instruction in the
> subject line refers to ?).

Sure, ok.

> These are typically implemented as compiler builtins or asm in C. I do
> realize that anyone working on running existing code through LLVM can
> easily work around the current asm/builtin implementation for now by
> calling an assembly function, however, a perhaps not so obvious
> implication/intent in a memory fence like builtin is that the programmer
> also does not want compiler to reorder load/store instructions across
> the barrier. I do not see any mechanism in LLVM framework to express
> such a notion of barrier/fence or a mechanism to indicate that
> load/stores within what might otherwise look like a "normal" basic
> block must not be reordered.

LLVM fully respects the notion that a call to an external function could
do just about anything, even when doing aggressive interprocedural
optimization.  However, if the called function could not possibly read or
write a given memory location (because it was allocated from the heap or
stack and its address is never passed, even indirectly, into the call),
LLVM does not guarantee that loads and stores to that location stay in
order relative to the call.  For this, you need...

> [ (a) Does LLVM understand 'volatile'
> attribute in C ? (b) My apologies in advance if I end up (or already
> have ?) "hijacking" this thread into another unrelated topic... ]

Yup, LLVM does fully support volatile.  Note that in 1.1 there was a bug
(PR179) where an optimization incorrectly eliminated volatile
loads/stores, but that is fixed, and will be in 1.2.  If you'd like the
fix, it's in CVS, or the patches are attached to the PR.

> Maybe an example (grossly simplified, but otherwise "real life") will
> help :
>
>      *old = *a->next_link;
>      *x = 1;       /* set a flag - indicates start of operation */
>      *a->next_link = *b->next_link;
>
>      asm("<store fence instruction>");
>
>      *x = 0;       /* reset the flag - done */
>
> Here, assume that (1) x, a, b and old are all (non-alias) addresses that
> map to a shared memory segment and/or execution environment for this
> code is multi-threaded - i.e. there's another thread of execution
> (watchdog) that the compiler may not be aware of, to which these memory
> writes are "visible".

There are two issues: the compiler and the processor.  If the loads/stores
are marked volatile, LLVM will not reorder them, so you've taken care of
the compiler side of things.  On the other hand, the processor (if it
doesn't have a strong consistency model) might reorder the accesses, so a
barrier/fence is still needed.  For this reason, a builtin might be
appropriate.  Using an abstract builtin would allow writing generic code
that works on processors with different consistency models: you would
just have to put fences in for the lowest common denominator (which is
still better than ifdefs! :).
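
As a rough sketch of what such an abstract builtin could look like in C
today: C11's <stdatomic.h>, which postdates this thread, provides
atomic_thread_fence() as a target-independent fence.  The struct node type
and the relink() function are hypothetical names modeled on the quoted
example, and the flag is an atomic_int rather than a volatile int because
C11 defines fence ordering in terms of atomic objects.

```c
#include <stdatomic.h>

/* Hypothetical type standing in for the structures in the quoted example. */
struct node {
    int *next_link;
};

/* x, a, b and old are assumed not to alias, as in the quoted example. */
void relink(atomic_int *x, struct node *a, struct node *b, int *old)
{
    *old = *a->next_link;
    atomic_store_explicit(x, 1, memory_order_relaxed); /* start of operation */
    *a->next_link = *b->next_link;

    /* Portable stand-in for asm("<store fence instruction>"): the stores
       above may not be reordered after the flag reset below, either by
       the compiler or by the processor. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(x, 0, memory_order_relaxed); /* done */
}
```

As in the quoted example, only the ordering between the link update and
the final flag reset is enforced here.  On a strongly ordered target the
fence typically costs no instruction while still constraining the
compiler; on weaker targets it becomes the appropriate barrier.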

> of redundant store operation. Another item that falls in this general
> category is code that uses setjmp/longjmp :
<snip>
> In the example above, if compiler doesn't understand the special
> semantics of setjmp, there's a potential for if (x == 1) block to get
> optimized incorrectly.

According to ANSI C, any automatic variable that is modified between the
setjmp and the longjmp must be marked volatile, or its value is
indeterminate afterwards.  Of course this is silly and few people actually
do that in their code, but real compilers will break the code if you don't.
"Luckily," LLVM is _not_ one of these compilers.  It will correctly update
the variable, as it explicitly represents setjmp/longjmp using the same
mechanisms it uses for C++ EH.  In fact, in LLVM, longjmp and C++
destructors/cleanups even interact mostly correctly.
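
For compilers that do hold you to the ANSI rule, the portable form looks
roughly like the sketch below.  The names env, fail() and run() are
hypothetical and are not taken from the snipped example; the volatile
qualifier is what guarantees that the store to x survives the longjmp.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: unwinds back to the setjmp call in run(). */
static void fail(void)
{
    longjmp(env, 1);
}

int run(void)
{
    /* Modified after setjmp and read after longjmp, so ANSI C requires
       volatile; without it the value would be indeterminate. */
    volatile int x = 0;

    if (setjmp(env) == 0) {
        x = 1;
        fail();          /* does not return normally */
    }
    return x;            /* reached only via the longjmp path */
}
```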

> My concern when one is dealing with a whole-program optimizer
> infrastructure like LLVM has been that it can easily (at least in theory)
> see through this call-a-null-function trick... Yet, one could argue that
> there're plenty of legitimate optimization opportunities where memory
> references can be reordered, squashed, hoisted across basic blocks or
> even function calls (IOW turning off certain aggressive optimizations
> altogether might be a sledgehammer approach). I'm getting this nagging
> feeling that there may need to be a mechanism where special annotations
> need to be placed in LLVM instruction stream to ensure safe
> optimizations.... Someone please tell me my concerns are totally
> unfounded, at least for LLVM :-)

Your concerns are totally unfounded, at least for LLVM.  :)  We fully
support volatile (even optimizing it away in some trivial cases where it
is obviously unneeded), and all of the IPO we do assumes an "open world".
That means that all of the optimizers are safe with partial programs or
libraries.  In fact, we run several of the optimizers (such as the dead
argument elimination and IP constant prop passes) at compile time as well
as at link time.  :)

That said, there is still room for improvement.  In particular, it would
make sense to add a small number of intrinsics for performing read/write
barriers and such.  The GNU hack of 'asm volatile("" ::: "memory")' is
really pretty nasty.
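
For reference, a minimal sketch of that GNU idiom, assuming GCC or Clang
(it is an extension, not ISO C).  COMPILER_BARRIER and publish() are
hypothetical names.  The empty asm with a "memory" clobber stops the
compiler from moving or caching memory accesses across it, but it emits no
instruction, so it does nothing about reordering by the processor.

```c
/* GNU C extension: an empty asm statement that claims to clobber memory. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical example: write the payload, then raise the ready flag. */
void publish(int *data, int *ready)
{
    *data = 42;
    COMPILER_BARRIER();  /* compiler may not sink the store above past here */
    *ready = 1;
}
```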

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/