[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

Arnaud de Grandmaison arnaud.allarddegrandmaison at parrot.com
Sun Jan 20 13:42:51 PST 2013


As a results of my investigations, the thread is also added to cfe-dev.

The context : while porting my company code from the LLVM/Clang releases
3.1 to 3.2, I stumbled on a code size and performance regression. The
testcase is :

$ cat test.c 
#include <stdint.h>

struct R {
  uint16_t a;
  uint16_t b;
};

volatile struct R * const addr = (volatile struct R *) 416;

void test(uint16_t a)
{
  struct R r = { a, 1 };
  *addr = r;
}

$ clang -O2 -o - -emit-llvm -S -c test.c 
; ModuleID = 'test.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.R = type { i16, i16 }

@addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8

define void @test(i16 zeroext %a) nounwind uwtable {
  %r.sroa.0 = alloca i16, align 2
  %r.sroa.1 = alloca i16, align 2
  store i16 %a, i16* %r.sroa.0, align 2
  store i16 1, i16* %r.sroa.1, align 2
  %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
  store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32
  %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
  store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}

The problem is in how clang codegen aggregate copies when the source or
the destination are volatile : it uses the memcpy intrinsic, setting the
volatile parameter. However, the LLVM IR reference manual specifies that
"The detailed access behavior is not very cleanly specified and it is
unwise to depend on it". The new SROA implementation is conservatively
correct, but behaves differently.

I believe we should either :
 - rework methods AggExprEmitter::EmitCopy and
CodeGenFunction::EmitAggregateCopy (from
clang/lib/CodeGen/CGExprAgg.cpp) to codegen differently the copy when
either the source or the destination is volatile,
 - specify cleanly the memcpy intrinsic with respect to volatile behaviour

I would prefer option #2 as having a parameter whose effect is
unspecified does not make it very useful and invite more problems later.

Any thoughts ?

Cheers,
--
Arnaud A. de Grandmaison


On 01/18/2013 06:30 PM, Arnaud A. de Grandmaison wrote:
> Hi All,
>
> Using clang+llvm at head, I noticed a weird behaviour with the following
> reduced testcase :
>
> $ cat test.c 
>
> #include <stdint.h>
>
> struct R {
>   uint16_t a;
>   uint16_t b;
> };
>
> volatile struct R * const addr = (volatile struct R *) 416;
>
> void test(uint16_t a)
> {
>   struct R r = { a, 1 };
>   *addr = r;
> }
>
> $ clang -O2 -o - -emit-llvm -S -c test.c 
> ; ModuleID = 'test.c'
> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
> target triple = "x86_64-unknown-linux-gnu"
>
> %struct.R = type { i16, i16 }
>
> @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8
>
> define void @test(i16 zeroext %a) nounwind uwtable {
>   %r.sroa.0 = alloca i16, align 2
>   %r.sroa.1 = alloca i16, align 2
>   store i16 %a, i16* %r.sroa.0, align 2
>   store i16 1, i16* %r.sroa.1, align 2
>   %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
>   store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32
>   %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
>   store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2
>   ret void
> }
>
>
> When I would have expected something like :
>
> define void @test(i16 zeroext %a) nounwind uwtable {
>   store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2
>   store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2
>   ret void
> }
>
> Using LLVM-3.1, I get the expected code. Would this ring a bell to anybody ?
>
> Cheers,
>


-- 
Arnaud de Grandmaison
Senior CPU engineer
Business Unit Digital Tuner

Parrot S.A.
174, quai de Jemmapes
75010 Paris - France
Phone: +33 1 48 03 84 59




More information about the llvm-dev mailing list