[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

Arnaud A. de Grandmaison arnaud.allarddegrandmaison at parrot.com
Mon Jan 28 08:54:49 PST 2013


Hi All,

In the language reference manual, the access behaviour of the memcpy,
memmove and memset intrinsics is not well defined with respect to the
volatile flag: the LRM even states that "it is unwise to depend on it".
This forces optimization passes to be conservatively correct, which in
turn prevents otherwise valid optimizations.

A very simple example of this is:

$ cat test.c 

#include <stdint.h>

struct R {
  uint16_t a;
  uint16_t b;
};

volatile struct R * const addr = (volatile struct R *) 416;

void test(uint16_t a)
{
  struct R r = { a, 1 };
  *addr = r;
}

$ clang -O2 -o - -emit-llvm -S -c test.c 
; ModuleID = 'test.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.R = type { i16, i16 }

@addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8

define void @test(i16 zeroext %a) nounwind uwtable {
  %r.sroa.0 = alloca i16, align 2
  %r.sroa.1 = alloca i16, align 2
  store i16 %a, i16* %r.sroa.0, align 2
  store i16 1, i16* %r.sroa.1, align 2
  %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
  store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32
  %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
  store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}

In the generated IR, the loads from the stack slots are marked volatile.
In this case, this is due to the new SROA pass being conservatively
correct. We should however expect the test function to look like this
much simpler one (which was actually the case with LLVM <= 3.1):

define void @test(i16 zeroext %a) nounwind uwtable {
  store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2
  store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}


I propose to specify the volatile access behaviour for those intrinsics:
instead of one flag, they should have two independent volatile flags, one
for the destination accesses, the other for the source accesses.
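To make this concrete, a sketch of what the change would look like at the
IR level (illustrative only; the exact parameter order and names would be
settled by the patch, and the two-flag form shown here is hypothetical):

```llvm
; Current form: a single i1 flag covers both the source and the
; destination accesses.
;   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 4,
;                                        i32 2, i1 true)
;
; Proposed form: separate flags, here marking only the destination
; accesses as volatile while the source accesses are not.
;   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 4,
;                                        i32 2, i1 true, i1 false)
```

With the second form, SROA could forward the non-volatile source loads in
the example above while still preserving the volatile destination stores.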

If there is general agreement, I plan to proceed with the following steps:
1. Specify the access behavior (this patch).
2. Auto-upgrade the existing memcpy, memmove and memset intrinsics to
the more precise form by replicating the single volatile flag, and
rework the MemIntrinsic hierarchy to provide (is|set)SrcVolatile() and
(is|set)DestVolatile(), implementing (set|is)Volatile in terms of those
two methods. This conservatively preserves semantics; no functional
change so far. Commit 1 & 2.
3. Audit all uses of isVolatile() / setVolatile() and move them to the
more precise form. From this point, more aggressive / precise
optimizations can happen. Commit 3.
4. Teach clang to use the new form.
5. Optionally remove the old interface form, as there should be no
in-tree users left. This would however be an API change breaking
external code.

Cheers,

-- 
Arnaud de Grandmaison

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch
Type: text/x-patch
Size: 7380 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130128/0d3ef5ec/attachment.bin>
