[PATCH] D27133: Introduce element-wise atomic memcpy and memmove intrinsics

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 22 11:11:02 PST 2016


efriedma added a comment.

Have you considered specifying a set of library functions instead of a single `__llvm_memcpy_element_atomic`?  (`__llvm_memcpy_element_atomic_1`, `__llvm_memcpy_element_atomic_2`, `__llvm_memcpy_element_atomic_4`, `__llvm_memcpy_element_atomic_8`, `__llvm_memcpy_element_atomic_16`.)  It should be a bit faster, and if someone screws up the element size, you'll get a link error rather than a runtime error.
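
Roughly, I'd expect each of those to be a trivial per-element loop.  A C sketch of what the 4-byte variant might look like (the name and signature are my guess at the runtime helper, `len` is assumed to be in bytes, and C11 relaxed ordering is standing in for LLVM's unordered):

  /* Hypothetical sketch, not part of the patch: the 4-byte helper as a
     per-element loop.  memory_order_relaxed approximates LLVM "unordered";
     len is assumed to be a byte count that is a multiple of 4. */
  #include <stdatomic.h>
  #include <stddef.h>
  #include <stdint.h>

  void __llvm_memcpy_element_atomic_4(void *dst, const void *src, size_t len) {
    _Atomic uint32_t *d = dst;
    const _Atomic uint32_t *s = src;
    for (size_t i = 0; i < len / 4; ++i) {
      uint32_t v = atomic_load_explicit(&s[i], memory_order_relaxed);
      atomic_store_explicit(&d[i], v, memory_order_relaxed);
    }
  }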

It's kind of hard for me to judge how useful this is because clang doesn't use unordered loads/stores... but I can definitely see this being nice to have.

> In general, we want to implement two families of optimizations:
> 
> - target function recognition to intrinsic matching
> - lowering of intrinsics to IR for small sizes
> 
> We need the volatile flag to know that the latter isn't legal.

I don't follow; what exactly is "volatile" supposed to mean here?  memcpy only has a volatile bit to match C semantics for volatile structs; since you're not dealing with that legacy, you should call your bit something different to reflect how you actually expect it to be used (maybe noinline, if that's what you're after?).
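
For reference, the "lowering of intrinsics to IR for small sizes" mentioned above would amount to something like the following, which is presumably what the bit is meant to forbid (a C-level illustration only; the function here is made up, and relaxed again stands in for unordered):

  /* Illustration only, not from the patch: an 8-byte element-atomic copy
     with 4-byte elements, expanded inline into two per-element relaxed
     copies instead of a call to the library helper. */
  #include <stdatomic.h>
  #include <stdint.h>

  static void copy_8_bytes_as_two_elements(_Atomic uint32_t *dst,
                                           const _Atomic uint32_t *src) {
    atomic_store_explicit(&dst[0],
                          atomic_load_explicit(&src[0], memory_order_relaxed),
                          memory_order_relaxed);
    atomic_store_explicit(&dst[1],
                          atomic_load_explicit(&src[1], memory_order_relaxed),
                          memory_order_relaxed);
  }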


https://reviews.llvm.org/D27133
