[llvm-dev] SROA and volatile memcpy/memset

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 11 06:53:47 PST 2015


----- Original Message -----
> From: "Krzysztof Parzyszek via llvm-dev" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Tuesday, November 10, 2015 1:22:57 PM
> Subject: Re: [llvm-dev] SROA and volatile memcpy/memset
> 
> On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote:
> > On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via llvm-dev wrote:
> >> I have a customer testcase where SROA splits a volatile memcpy and we end up
> >> generating bad code[1].  While this looks like a bug, simply preventing SROA
> >> from splitting volatile memory intrinsics causes basictest.ll for SROA to
> >> fail.  Not only that, but it also seems like handling of volatile memory
> >> transfers was done with some intent.
> >
> > There is no such thing as a volatile memcpy or memset in standard ISO C,
> > so what exactly are you doing and why do you expect it to work that way?
> 
> The motivating example has an aggregate copy where the aggregate is
> volatile, followed by a store to one of its members. (This does not have
> anything to do with devices.) SROA expanded this into a series of
> volatile loads and stores, which cannot be coalesced back into fewer
> instructions. This is clearly worse than doing the copy and then the
> member overwrite.
> 
> --- test.c ---
> typedef struct {
>    volatile unsigned int value;
> } atomic_word_t;
> 
> typedef union {
>    struct {
>      unsigned char state;
>      unsigned char priority;
>    };
>    atomic_word_t atomic;
>    unsigned int full;
> } mystruct_t;
> 
> 
> mystruct_t a;
> 
> unsigned int foo(void) {
>    mystruct_t x;
>    mystruct_t y;
> 
>    x.full = a.atomic.value;
>    y = x;
>    y.priority = 7;
> 
>    return y.full;
> }
> --------------
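
To make the two shapes concrete, here is a rough source-level sketch in C. It is only an analogy: it does not reproduce the IR that SROA actually emits for test.c, and the function names (copy_then_overwrite, bytewise_then_overwrite, priority_of) are made up for illustration. The first function keeps the aggregate copy whole and then overwrites the member; the second copies through volatile byte accesses, which the compiler may not merge back into a wider access.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    volatile unsigned int value;
} atomic_word_t;

typedef union {
    struct {
        unsigned char state;
        unsigned char priority;
    };
    atomic_word_t atomic;
    unsigned int full;
} mystruct_t;

/* Whole-aggregate copy followed by the member overwrite. */
unsigned int copy_then_overwrite(unsigned int src) {
    mystruct_t x;
    mystruct_t y;

    x.full = src;
    y = x;
    y.priority = 7;
    return y.full;
}

/* Same result, but the copy is done one volatile byte at a time
 * (hypothetical stand-in for a copy that was split into pieces). */
unsigned int bytewise_then_overwrite(unsigned int src) {
    mystruct_t x;
    mystruct_t y;
    const volatile unsigned char *from = (const volatile unsigned char *)&x;
    volatile unsigned char *to = (volatile unsigned char *)&y;

    /* The byte loop below has to cover the whole word. */
    assert(sizeof(mystruct_t) >= sizeof(unsigned int));

    x.full = src;
    for (size_t i = 0; i < sizeof y; ++i)
        to[i] = from[i];
    y.priority = 7;
    return y.full;
}

/* Helper: extract the priority byte from a packed word. */
unsigned char priority_of(unsigned int word) {
    mystruct_t u;
    u.full = word;
    return u.priority;
}
```

Both functions return the same value for any input; the difference is only in how many volatile accesses the copy is made of.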

SROA seems to be doing a number of things here. What if we prevented SROA from generating multiple slices that split a volatile access? There might be a significant difference between that and something like this test (test/Transforms/SROA/basictest.ll), where a volatile memset is expected to become a single volatile store:

define i32 @test6() {
; CHECK-LABEL: @test6(
; CHECK: alloca i32
; CHECK-NEXT: store volatile i32
; CHECK-NEXT: load i32, i32*
; CHECK-NEXT: ret i32

entry:
  %a = alloca [4 x i8]
  %ptr = getelementptr [4 x i8], [4 x i8]* %a, i32 0, i32 0
  call void @llvm.memset.p0i8.i32(i8* %ptr, i8 42, i32 4, i32 1, i1 true)
  %iptr = bitcast i8* %ptr to i32*
  %val = load i32, i32* %iptr
  ret i32 %val
}
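
For readers who do not read IR every day, a loose C analogue of what test6 starts with and what its CHECK lines expect might look like the sketch below. This is an assumption-laden illustration, not compiler output: the function names fill_bytewise and fill_single_store are invented, and the byte loop merely stands in for the 4-byte volatile memset of the value 42.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Input shape: fill a 4-byte buffer with 42 through volatile byte
 * stores, then read it back as one 32-bit value. */
uint32_t fill_bytewise(void) {
    unsigned char a[4];
    volatile unsigned char *p = a;
    uint32_t val;

    for (int i = 0; i < 4; ++i)
        p[i] = 42;
    memcpy(&val, a, sizeof val);
    return val;
}

/* Expected shape per the CHECK lines: one volatile 32-bit store
 * followed by an ordinary (non-volatile) load. */
uint32_t fill_single_store(void) {
    uint32_t a;

    *(volatile uint32_t *)&a = UINT32_C(0x2A2A2A2A);
    return a;
}
```

Here the volatile access is widened into one store rather than split into several, which is the opposite direction from the motivating example.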

 -Hal

> 
> 
> -Krzysztof
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
