[llvm-dev] [frontend-dev][beginner] Allocation of structures

Wed May 31 14:32:30 PDT 2017

On Mon, May 29, 2017 at 6:14 PM, Dimitri Racordon via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> I’m pretty new to the list, and to LLVM in general, so please excuse my
> extreme newbieness.
>
> I’m trying to figure out what would be the appropriate way to implement
> move semantics.
> I’ve been trying to dump the IR produced by clang with some basic C++
> snippet, but I’m afraid it didn’t help me much.
>

Move semantics in C++ are just a mechanism to use overload resolution to
select a different overload (see
http://en.cppreference.com/w/cpp/language/move_constructor). For example,
if you think about how to map your example C++ code down to equivalent C,
you'll see that in that process you have fully resolved the "move
semantics". At the LLVM level (or the C level), there are no "move
semantics". std::move is basically just a cast that creates a `S&&` which
will then select the right overload.
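
A minimal sketch of that point, assuming a hypothetical `Tracker` type (not from the original example) whose constructors record which overload was picked. The `static_assert` checks that `std::move` on an lvalue yields a `Tracker&&`; the rest shows that a plain `static_cast` selects the same overload:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical type for illustration: remembers which constructor built it.
struct Tracker {
    int which = 0;  // 0 = default, 1 = copy constructor, 2 = move constructor
    Tracker() = default;
    Tracker(const Tracker&) : which(1) {}
    Tracker(Tracker&&) noexcept : which(2) {}
};

// std::move applied to an lvalue has type Tracker&&; it generates no "move" code itself.
static_assert(
    std::is_same<decltype(std::move(std::declval<Tracker&>())), Tracker&&>::value,
    "std::move yields an rvalue reference");

int built_from_lvalue() {
    Tracker a;
    Tracker b(a);  // argument is an lvalue: overload resolution picks the copy constructor
    return b.which;
}

int built_from_std_move() {
    Tracker a;
    Tracker b(std::move(a));  // argument is an rvalue: picks the move constructor
    return b.which;
}

int built_from_plain_cast() {
    Tracker a;
    Tracker b(static_cast<Tracker&&>(a));  // same overload as std::move(a)
    return b.which;
}
```

Once the overload is chosen, all that remains is an ordinary call to one particular constructor function, which is what shows up in the IR.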

> Here’s the example I’ve been playing with (in C++):
>
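> #include <utility>  // for std::move
>
> struct S {
>   int* x;  // member declaration was missing from the snippet
>   S() noexcept: x(new int) {}
>   S(S&& other) {
>     x = other.x;
>     other.x = nullptr;
>   }
>   ~S() {
>     delete x;
>   }
> };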
>
> S f1() {
>   auto s = S();
>   return s;
> }
>
> S f2() {
>   auto s = S();
>   return std::move(s);
> }
>
> This of course produces a lot of LLVM code (with -O0), but I think I may
> have figured out most of what’s what. In particular, I’ve been able to
> identify the IR code for `f1` and `f2`, but to my surprise, neither of
> those return a value. Both take a pointer to `S` as parameter, which in
> turn gets passed to the constructor of `S`, and return void.
>
> This leaves me with two main questions:
>
>    - First, is the use of a pointer to S as parameter specific to
>    clang, or generally the way to go? I’ve seen in the language reference that
>    one could return a struct with a simple ret instruction, so I’m surprised
>    not to see it for the version that doesn’t use move semantics.
>
This is a language ABI question. There's really nothing LLVM-specific about
it. I would recommend thinking about it in terms of how to map C++ to C. As
Davide pointed out, RVO is one reason to choose this particular lowering.
These decisions are made inside Clang (not LLVM) and are mandated by the
ABI (all compilers must implement them the same way for code to be able to
be linked together and work). If you want all the gory details, the Itanium
C++ ABI is documented at https://itanium-cxx-abi.github.io/cxx-abi/abi.html
(this is the C++ ABI used on basically all platforms except MSVC; there's a
historical connection to Itanium but nothing specific to that processor
about it).

In particular, the description of how to lower return values is
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#return-value
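
As a rough illustration of "mapping C++ to C" for `f1`, here is a hand-written sketch, not actual Clang output. The names `S_lowered`, `S_ctor`, `S_dtor` and `f1_lowered` are made up for the example, and the constructor zero-initializes the int (unlike the original `new int`) only so the value can be read back safely:

```cpp
#include <cassert>

// Hypothetical C-style stand-in for the struct S from the question.
struct S_lowered {
    int* x;
};

// Stand-in for S::S(); `self` plays the role of `this`.
void S_ctor(S_lowered* self) { self->x = new int(0); }

// Stand-in for S::~S().
void S_dtor(S_lowered* self) { delete self->x; }

// Source level:  S f1() { auto s = S(); return s; }
// Lowered shape: the caller passes the address where the result must be
// constructed, and the function itself returns void.
void f1_lowered(S_lowered* result) {
    S_ctor(result);  // the "local" s is built directly in the caller's slot
}

// What a caller of f1() roughly does after lowering.
int call_f1_and_read() {
    S_lowered s;             // storage owned by the caller's frame
    f1_lowered(&s);          // hidden result pointer passed explicitly
    assert(s.x != nullptr);
    int value = *s.x;
    S_dtor(&s);              // the caller destroys the returned object
    return value;
}
```

This matches the shape described in the question: a function that takes a pointer to `S`, hands it to the constructor, and returns void.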

Note that the C++ ABI is phrased in terms of the underlying C ABI, which is
processor-specific (and generally not OS-specific, unless you care about
Windows vs non-Windows), so familiarity with the C ABI is useful too.
Documents for the C ABI of different processors can be found at
http://llvm.org/docs/CompilerWriterInfo.html, e.g. the "X86 and X86-64 SysV
psABI". The C ABI may seem overwhelming (lots of processor details, corner
cases, etc.), but the basic gist is that there's a list of registers and
each argument is assigned in turn from that list (if the register list is
exhausted, the rest are passed through memory). This is easy as long as
everything (return value and argument types) is an int, long, pointer, or
some other primitive type that fits in a register. Struct types are
decomposed into multiple primitive types according to certain rules, until
everything is in terms of primitive types.

E.g. if you understand the examples in https://godbolt.org/g/qSdzHj then
you pretty much have a basic understanding of the C parameter passing ABI.
The register lists for x86-64 are rdi, rsi, ... for arguments and rax, rdx,
... for return values. (Also look at the corresponding LLVM IR.)
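
For readers without access to the link, a small sketch in the same spirit (these are not the linked examples; the function names are invented). The register comments assume the x86-64 SysV ABI with 64-bit `long`, and are worth confirming by compiling and reading the assembly:

```cpp
#include <cassert>

// Two 64-bit integers: small enough to be split across two registers.
struct Pair {
    long first;
    long second;
};

// Arguments are assigned in order from the list: a in rdi, b in rsi, c in rdx.
// The result comes back in rax.
extern "C" long add3(long a, long b, long c) {
    return a + b + c;
}

// The struct is decomposed into two integer pieces and returned in rax and rdx.
extern "C" Pair make_pair_c(long a, long b) {
    Pair p = {a, b};
    return p;
}

// Passing the struct by value: p.first travels in rdi, p.second in rsi.
extern "C" long sum_pair(Pair p) {
    return p.first + p.second;
}
```

Larger structs, or types with non-trivial copy or move constructors or destructors like `S`, do not take this register path and go through memory instead.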

Very few people actually know the precise rules in detail. However,
basically all LLVM developers (or more generally toolchain developers:
compiler, linker, debugger, etc.) will know the basic rules (like the
examples I linked above) and will know how to look up specifics in the
psABI document as needed (and probably 90% of the time just looking at
Clang's output on an example will answer a particular question; for
example, I forgot that rdx was the second register for return values;
all I remembered was that there were two return registers and the first was
rax).

If you're interested, these are the notes I took when I was initially
learning about this:
https://github.com/chisophugis/x64-Forth/blob/master/abi.txt
(woah this takes me back)
That Forth implementation doesn't actually call any external libraries, so
the only external ABI it cares about is the Linux syscall ABI, which I
recorded in this comment for my own memory
https://github.com/chisophugis/x64-Forth/blob/master/compile4.asm#L13
(yes, this was before I learned to use version control...)
The internal "ABI" used by this small Forth implementation is described in
https://github.com/chisophugis/x64-Forth/blob/master/compile4.asm#L6
(Forth is a very low-level language so it needs its own processor-specific
ABI; most languages essentially just piggy-back on the C ABI for
processor-specific stuff)

LLVM handles all of this C ABI stuff for you (and there are other aspects of
the C ABI like stack alignment/layout, TLS access, relocations, etc.).
There's quite a bit of essential complexity, but as long as you stick to
the simple cases where the mapping from C is trivial, it's easy to
understand. There's a pretty deep rabbit hole if you start getting
into complicated cases, but it's all just a relatively simple extension of
the cases that map trivially to C. For the most part, your only
interaction with the rabbit hole will be debugging when things go
wrong, which requires understanding the basic concepts (which can be
understood via simple C examples) and then drilling down as needed (e.g.
looking at what clang does, looking at the standard docs, etc.). This is in
some sense simple even for complex cases, in that it doesn't require any
complex insight to understand harder cases; you're just verifying
assumptions against a list of rules.

That may sound scary, but as long as the IR your frontend generates is
internally consistent at the LLVM IR level it will be ABI compatible with
itself (modulo bugs in LLVM), so you can basically ignore it. You'll have
to have some familiarity with the C ABI in order to e.g. call an external
function like malloc in libc, but again, as long as the parameter types are
"simple" then the mapping between C and LLVM IR is very simple; you'll need
a basic understanding, though, in order to verify this and
debug any think-o's.

>
>    - Second, if I used a non-void ret instruction to return the result
>    of an alloca, when would the latter be destroyed? Would that involve a copy
>    from the runtime stack of the callee to that of the caller?
>
The result of an alloca is a pointer into the stack frame of the current
function, so returning it doesn't make sense (like returning the address of
a local variable in C). Again, this problem isn't really related to LLVM
per se and the easiest way to think about it is in terms of how to lower to
C.
Decisions about object destruction are up to the frontend. At the C or LLVM
level they just turn into function calls or whatever you use to implement
the semantics that the frontend requires. In other words, if you want an
object to be destroyed, it's up to your frontend to emit code to perform
the destruction wherever the destruction is expected to occur.
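
To make this concrete for `f2`, here is a hand-written sketch of one possible lowering (not Clang's actual output). The `S_lowered` struct, the `S_*` helper functions and the `live_allocations` counter are all invented for the illustration; the counter exists only so the demo can check that nothing leaks:

```cpp
#include <cassert>

// Hypothetical C-style stand-in for the struct S from the question.
struct S_lowered {
    int* x;
};

int live_allocations = 0;  // demo-only bookkeeping, not part of any ABI

// Stand-in for S::S().
void S_ctor(S_lowered* self) {
    self->x = new int(0);
    ++live_allocations;
}

// Stand-in for S::S(S&&): steal the pointer and null out the source.
void S_move_ctor(S_lowered* self, S_lowered* other) {
    self->x = other->x;
    other->x = nullptr;
}

// Stand-in for S::~S(); deleting a null pointer is a no-op.
void S_dtor(S_lowered* self) {
    if (self->x != nullptr) --live_allocations;
    delete self->x;
}

// Source level:  S f2() { auto s = S(); return std::move(s); }
void f2_lowered(S_lowered* result) {
    S_lowered s;              // lives in f2's own frame (an alloca at the IR level)
    S_ctor(&s);
    S_move_ctor(result, &s);  // move the value into the caller-provided slot
    S_dtor(&s);               // frontend-emitted: s dies before f2's frame goes away
}

// The caller owns the result and must emit its destruction as well.
int call_f2_and_count_leaks() {
    S_lowered r;
    f2_lowered(&r);
    assert(r.x != nullptr);   // ownership arrived in the caller's slot
    S_dtor(&r);
    return live_allocations;  // 0 means every allocation was released
}
```

Nothing in `f2_lowered` returns the address of `s`; only its contents are moved out before the frame disappears, and every destructor call is an ordinary function call that the frontend chose to emit.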

-- Sean Silva

>
> Thank you very much for your time and your answer,
>
> Best,
>
>
> Dimitri Racordon
> CUI, Université de Genève
> 7, route de Drize, CH-1227 Carouge - Switzerland
> Phone: +41 22 379 01 24