<p dir="ltr">So, here is the model LLVM uses: a volatile memcpy is lowered to a loop of loads and stores of indeterminate width. Under that model, splitting a volatile memcpy is always valid.</p>
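<p dir="ltr">A minimal C sketch of that model follows. It assumes byte-width accesses purely as one arbitrary legal choice, and the function names are hypothetical, not LLVM APIs. Because the width is never pinned down, copying <code>n</code> bytes in one loop, or as two loops split at any offset <code>k</code>, are equally valid lowerings and leave the destination with the same bytes.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a volatile memcpy as "a loop of loads and stores of
 * indeterminate width". Byte width is used here only as one legal choice. */
static void volatile_memcpy_model(volatile unsigned char *dst,
                                  const volatile unsigned char *src,
                                  size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i]; /* one volatile byte load, one volatile byte store */
}

/* The same copy split at offset k: under the model this is just another
 * valid lowering, and dst ends up holding exactly the same bytes. */
static void volatile_memcpy_split(volatile unsigned char *dst,
                                  const volatile unsigned char *src,
                                  size_t n, size_t k) {
    assert(k <= n);
    volatile_memcpy_model(dst, src, k);
    volatile_memcpy_model(dst + k, src + k, n - k);
}
```

<p dir="ltr">Choosing 2- or 4-byte chunks instead would be just as legal under this model, which is why it cannot by itself promise any particular access width to a device.</p>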

<p dir="ltr">If we want a very specific load and store width for volatile accesses, I think the frontend should generate concrete loads and stores of a type with that width. Ultimately, memcpy is a poor model for accesses of a *specific* width; it is best at handling accesses of indeterminate size, which is exactly what doesn't make sense for device-backed volatile accesses.</p>
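<p dir="ltr">For example, a C frontend can express a fixed 32-bit access directly through the pointee type instead of through memcpy. This is only an illustrative sketch: <code>copy_regs32</code> is a hypothetical helper, and it assumes naturally aligned 32-bit registers. With Clang, each iteration would typically become a <code>load volatile i32</code> followed by a <code>store volatile i32</code>, and the LangRef rules for volatile operations do not allow the optimizer to change how many of them there are, so they cannot be split into narrower pieces the way a volatile memcpy can.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies `count` 32-bit device registers. The access
 * width is fixed by the type (volatile uint32_t) rather than left to a
 * memcpy lowering. Assumes naturally aligned registers. */
static void copy_regs32(volatile uint32_t *dst,
                        const volatile uint32_t *src, size_t count) {
    assert((uintptr_t)dst % sizeof(uint32_t) == 0);
    assert((uintptr_t)src % sizeof(uint32_t) == 0);
    for (size_t i = 0; i < count; ++i) {
        uint32_t v = src[i]; /* one 32-bit volatile load */
        dst[i] = v;          /* one 32-bit volatile store */
    }
}
```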

<br><div class="gmail_quote"><div dir="ltr">On Wed, Nov 11, 2015, 10:00 Krzysztof Parzyszek <<a href="mailto:kparzysz@codeaurora.org">kparzysz@codeaurora.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 11/11/2015 8:53 AM, Hal Finkel wrote:<br>

><br>

> SROA seems to be doing a number of things here. What if we prevented SROA from generating multiple slices that split volatile accesses? There might be a significant difference between that and something like this test (test/Transforms/SROA/basictest.ll):<br>

><br>

> define i32 @test6() {<br>

> ; CHECK-LABEL: @test6(<br>

> ; CHECK: alloca i32<br>

> ; CHECK-NEXT: store volatile i32<br>

> ; CHECK-NEXT: load i32, i32*<br>

> ; CHECK-NEXT: ret i32<br>

><br>

> entry:<br>

>    %a = alloca [4 x i8]<br>

>    %ptr = getelementptr [4 x i8], [4 x i8]* %a, i32 0, i32 0<br>

>    call void @llvm.memset.p0i8.i32(i8* %ptr, i8 42, i32 4, i32 1, i1 true)<br>

>    %iptr = bitcast i8* %ptr to i32*<br>

>    %val = load i32, i32* %iptr<br>

>    ret i32 %val<br>

> }<br>

><br>

<br>

<br>

Yes, that would work.<br>

<br>

-Krzysztof<br>

<br>


</blockquote></div>