[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

Mon Jan 21 02:15:28 PST 2013

On 01/20/2013 10:56 PM, Chandler Carruth wrote:
> I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this
> seems like an easy and simple deficiency in the IR intrinsic for
> memcpy. See below.
>
> On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison
> <arnaud.allarddegrandmaison at parrot.com
> <mailto:arnaud.allarddegrandmaison at parrot.com>> wrote:
>
>     define void @test(i16 zeroext %a) nounwind uwtable {
>       %r.sroa.0 = alloca i16, align 2
>       %r.sroa.1 = alloca i16, align 2
>       store i16 %a, i16* %r.sroa.0, align 2
>       store i16 1, i16* %r.sroa.1, align 2
>       %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
>       store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to
>     i16*), align 32
>       %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
>       store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to
>     i16*), align 2
>
>
> Just so we're on the same page, the "regression" (which I claim is
> actually a bug-fix) are the loads here being volatile in addition to
> the stores. The stores require it, but the loads get it.

We are on the same page there. And although this shows as a performance
regression with my targets, I agree this is a bug fix, especially given
the actual semantics of the memcpy intrinsic.

>
>     The problem is in how clang codegen aggregate copies when the
>     source or
>     the destination are volatile : it uses the memcpy intrinsic,
>     setting the
>     volatile parameter. However, the LLVM IR reference manual
>     specifies that
>     "The detailed access behavior is not very cleanly specified and it is
>     unwise to depend on it". The new SROA implementation is conservatively
>     correct, but behaves differently.
>
>     I believe we should either :
>      - rework methods AggExprEmitter::EmitCopy and
>     CodeGenFunction::EmitAggregateCopy (from
>     clang/lib/CodeGen/CGExprAgg.cpp) to codegen differently the copy when
>     either the source or the destination is volatile,
>
>
> I think this is bad...
>  
>
>      - specify cleanly the memcpy intrinsic with respect to volatile
>     behaviour
>
>
> I agree that this is what we should do.
>
> I took a conservative approach when implementing SROA w.r.t. volatile
> memcpy, and I think we should codify those semantics: with a single
> is-volatile bit both the loads and stores which the memcpy decomposes
> into will have volatile semantics.
>
> I think we should extend the memcpy intrinsic to support two volatile
> flags and have that overload have the flags refer to the loads and
> stores respectively. We should then auto-upgrade the existing memcpy
> intrinsics into the more precise form by replicating the single
> volatile flag. This will conservatively preserve semantics. As part of
> adding the new intrinsic, everywhere in LLVM that examines the two
> flags should be audited and should use the individual flag for load
> vs. store volatility. This will change SROA so that it can emit the
> loads as non-volatile, and only the stores will be volatile. Finally
> we switch Clang over to emit the new intrinsic structure with more
> precise information, and your test case should go back to normal.
>
> As part of this, we should also clarify that a volatile memcpy
> guarantees that each byte loaded is loaded exactly once, and each byte
> stored is stored exactly once, but that these operations are free to
> be batched into 4-byte, 8-byte, or any other width of loads and
> stores, and there is no guarantee about the interleaving of these
> loads and stores even when both are volatile.
>
> -Chandler

I am volunteering to implement this. I have to finish my code porting to
llvm-3.2 first ; expect some patches / discussions by the end of the week.

Cheers,

-- 
Arnaud de Grandmaison

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130121/8f94815b/attachment.html>