[LLVMdev] [RFC] Add compiler scheduling barriers

Mon Jun 23 17:55:34 PDT 2014

On 06/19/2014 09:35 AM, Yi Kong wrote:
> Hi all,
>
> I'm currently working on implementing ACLE extensions for ARM. There
> are some memory barrier intrinsics, i.e. __dsb and __isb, that require
> the compiler not to reorder instructions around their corresponding
> built-in intrinsics (__builtin_arm_dsb, __builtin_arm_isb), including
> non-memory-access instructions.[1] This is currently not possible.
>
> It is sometimes useful to prevent the compiler from reordering
> memory-access instructions as well. The only way to do that in both
> GCC and LLVM is using an inline assembly hack:
>    asm volatile("" ::: "memory")
>
> I propose adding two compiler scheduling barrier intrinsics to LLVM:
> __schedule_barrier_memory and __schedule_barrier_full. The former only
> prevents memory-access instructions from being reordered around the
> instruction and the latter stops all reordering. That way __isb, for
> example, can be implemented something like:
>    inline void __isb() {
>      __schedule_barrier_full();
>      __builtin_arm_isb();
>      __schedule_barrier_full();
>    }
Given that your examples are in C, I want to ask a clarifying question.
Are you proposing adding such intrinsics to the LLVM IR? Or to some
runtime library?  If the latter, *specifically* which one? Or at the
MachineInstr layer?

I'm going to run under the assumption that you're using C as pseudocode
for IR.  If this is not the case, the rest of this will be off base.

I'm not familiar with the exact semantics of an "isb" barrier, but I 
think you should look at the existing fence IR instructions.  These 
restrict memory reorderings in the IR.  Depending on the platform, they 
may imply hardware barriers, but they always imply compiler barriers.
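
For readers more at home in C than in IR, here is a minimal sketch of
what a fence looks like from the source side. C11's atomic_thread_fence
is the call that Clang lowers to an IR fence instruction. The names
publish, try_consume, payload and ready are hypothetical and only serve
the illustration; they are not part of the proposal.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain, non-atomic data (hypothetical) */
static atomic_bool ready;  /* flag guarding payload (hypothetical) */

/* Writer: the release fence keeps the payload store from being
   reordered after the flag store, by the compiler or the hardware. */
void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: the acquire fence keeps the payload load from being
   reordered before the flag load. Returns false if nothing has
   been published yet. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Whether a fence like this turns into a hardware barrier depends on the
target and the ordering, but in either case the optimizer may not move
the surrounding memory accesses across it.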

If all you want is a compiler barrier with the existing fence semantics 
w.r.t. reordering, we could consider extending fence with a "compiler 
only" (bikeshed needed!) attribute.
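
As a point of comparison on the C side, C11 already has a fence that is
meant to constrain only the compiler: atomic_signal_fence. To my
knowledge it emits no hardware barrier instruction, and Clang lowers it
to a fence restricted to the current thread. The sketch below wraps it
in a hypothetical helper, compiler_barrier; the globals a and b and the
function store_in_order are likewise made up for illustration.

```c
#include <stdatomic.h>

static int a, b;  /* hypothetical globals */

/* Hypothetical helper: a compiler-only barrier. It asks the compiler
   not to move memory accesses across this point and is not expected
   to emit any barrier instruction. */
static inline void compiler_barrier(void) {
    atomic_signal_fence(memory_order_seq_cst);
}

/* Stores to a and b with a compiler-only barrier between them,
   then returns their sum. */
int store_in_order(int x) {
    a = x;
    compiler_barrier();
    b = x + 1;
    return a + b;
}
```

Note that this only speaks to memory accesses. It does not stop the
scheduler from moving non-memory instructions, which is the part of the
original request that existing fences do not obviously cover.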

If you're describing a new memory ordering for existing fences, that 
would seem like a reasonable extension.

I'm not familiar with how we currently handle intrinsics for 
architecture specific memory barriers.  Can anyone else comment on 
that?  Is there a way to tag a particular intrinsic function as *also* 
being a full fence?

>
> To implement these intrinsics, I think the best method is to add
> target-independent pseudo-instructions with appropriate
> properties (hasSideEffects for the memory barrier and isTerminator
> for the full barrier) and a pseudo-instruction elimination pass after
> the scheduling pass.
Why would your barrier need to be a basic block terminator?  That 
doesn't parse for me.  Could you explain?
>
> What do people think of this idea?
I'm honestly unclear on what your problem is and what you're trying to
propose.  It may take a few rounds of conversation to clarify.

Philip