<div dir="ltr">Allocas support alignment, and I believe the backend will sort stack objects by alignment to try to pack data into the padding. If I'm wrong, this is probably a reasonable feature request. That said, 64-byte alignment is larger than the default 16-byte stack alignment available on most platforms, so you'll end up using a more expensive prologue. I'd recommend reducing the alignment back to 16.<div><br></div><div>----</div><div><br></div><div>LLVM will not normally perform tail call optimization if the call is passed the address of an alloca. TCO deallocates the frame of the calling function and all of its allocas before jumping to the callee, so the callee would be handed a dangling pointer.</div><div><br></div><div>To enable TCO, you would need some new transform to replace uses of a local alloca with uses of the incoming parameter pack. You will need some way to know when the incoming parameter space is big enough for the outgoing call.</div><div><br></div><div>-----</div><div><br></div><div>It sounds like what you really want is something like 'inalloca': <a href="http://llvm.org/docs/InAlloca.html">http://llvm.org/docs/InAlloca.html</a> I strongly advise that you *don't* use it in its current state, though: we added it for 32-bit MSVC compatibility, it doesn't generate fast code, the mid-level IR is less analyzable, and it's currently only supported on x86.</div><div><br></div><div>inalloca essentially allows you to manually allocate all of the outgoing argument memory yourself, and its address is passed in implicitly as the incoming stack pointer.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 28, 2015 at 2:43 AM, Nat! via llvm-dev <span dir="ltr">&lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi<br>
<br>
Sort of piggybacking on the other thread: I am looking for some feedback on how to implement the following idea in LLVM.<br>
<br>
The really short version of the idea is this:<br>
<br>
* I want to alloca a field (record/struct) whose size is rounded up to a whole multiple of 64 bytes. [^1]<br>
* This alloca'd field will be used exclusively as an argument to functions.<br>
* LLVM should be aware of the extra bytes and be able to reuse them for the arguments of subsequent function calls (e.g. tail calls).<br>
<br>
Why do I need this? See <a href="http://www.mulle-kybernetik.com/weblog/2015/mulle_objc_meta_call_convention.html" rel="noreferrer" target="_blank">http://www.mulle-kybernetik.com/weblog/2015/mulle_objc_meta_call_convention.html</a><br>
<br>
AFAIK AllocaInst can do address alignment but not size alignment. I wonder whether it would be an "OK" addition to LLVM if one could also specify a size rounding?<br>
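<br>
To make the distinction concrete, here is a minimal C sketch of address alignment versus size rounding. It is only a C-level analogy, not a statement about what AllocaInst does; the helper names (round_up_64 and the two size_with_* functions) are hypothetical, and the struct is copied from types.h in the example archive.<br>

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>    /* size_t */

/* Same layout as struct param_a_b in types.h. */
struct param_a_b { int a; int b; };

/* Assumption for this sketch: the struct fits in one 64-byte block. */
static_assert(sizeof(struct param_a_b) <= 64,
              "example assumes the struct fits in one 64-byte block");

/* Hypothetical helper: round n up to the next multiple of 64 bytes,
   the same arithmetic as the space[] arrays in types.h. */
static size_t round_up_64(size_t n)
{
    return 64 * ((n + 63) / 64);
}

/* Address alignment only: the object starts on a 64-byte boundary,
   but sizeof still reports the unpadded struct size. */
static size_t size_with_address_alignment(void)
{
    alignas(64) struct param_a_b p = { 0, 0 };
    (void) p;
    return sizeof p;
}

/* Size rounding: the storage is padded out to a full 64-byte block. */
static size_t size_with_size_rounding(void)
{
    return round_up_64(sizeof(struct param_a_b));
}
```

The first function reports only the struct's own size, so a callee could not safely assume any spare bytes behind it; the second reports the padded size that the reuse scheme relies on.<br>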
<br>
Then I would need a way to signal to LLVM that this is a special field, so that it may reuse all the space. Would I mark the passed-in struct with a new __attribute or some such?<br>
<br>
Finally, I would need an optimization pass (?) that would check that the alloca is big enough to hold the values and that the values aren't needed afterwards, and would then reuse the alloca.<br>
<br>
It would be good to hear from people better versed in LLVM (pretty much anyone on this list :) whether this is basically the right approach, and how much LLVM can already do.<br>
<br>
Ciao<br>
Nat!<br>
<br>
<br>
P.S. Here is some code that shows what is technically desired:<br>
<br>
# This is a shell archive. Save it in a file, remove anything before<br>
# this line, and then unpack it by entering "sh file". Note, it may<br>
# create directories; files and directories will be owned by you and<br>
# have default permissions.<br>
#<br>
# This archive contains:<br>
#<br>
# Makefile<br>
# types.h<br>
# a_b.c<br>
# a_b_c.c<br>
# main.c<br>
#<br>
echo x - Makefile<br>
sed 's/^X//' >Makefile << 'END-of-Makefile'<br>
XCFLAGS=-g -O3 -DNDEBUG<br>
X<br>
X<br>
Xall: reuse noreuse<br>
X<br>
Xa_b1.o: a_b.c<br>
X $(CC) $(CFLAGS) -c -o $@ -DREUSE=0 $+<br>
X<br>
Xa_b2.o: a_b.c<br>
X $(CC) $(CFLAGS) -c -o $@ -DREUSE=1 $+<br>
X<br>
Xnoreuse: a_b1.o a_b_c.o main.o<br>
X $(CC) -o $@ $(CFLAGS) $+<br>
X<br>
Xreuse: a_b2.o a_b_c.o main.o<br>
X $(CC) -o $@ $(CFLAGS) $+<br>
X<br>
Xclean:<br>
X rm *.o noreuse reuse<br>
X<br>
X<br>
Xreuse.shar: Makefile *.h *.c<br>
X shar $+ > $@<br>
X <br>
END-of-Makefile<br>
echo x - types.h<br>
sed 's/^X//' >types.h << 'END-of-types.h'<br>
Xstruct param_a_b<br>
X{<br>
X int a;<br>
X int b;<br>
X};<br>
X<br>
X<br>
Xstruct param_a_b_c<br>
X{<br>
X struct param_a_b a_b;<br>
X int c;<br>
X};<br>
X<br>
X<br>
Xunion alloc_param_a_b<br>
X{<br>
X struct param_a_b param;<br>
X unsigned char space[ 64 * ((sizeof( struct param_a_b) + 63) / 64)];<br>
X};<br>
X<br>
X<br>
Xunion alloc_param_a_b_c<br>
X{<br>
X struct param_a_b_c param;<br>
X unsigned char space[ 64 * ((sizeof( struct param_a_b_c) + 63) / 64)];<br>
X};<br>
END-of-types.h<br>
echo x - a_b.c<br>
sed 's/^X//' >a_b.c << 'END-of-a_b.c'<br>
X#include "types.h"<br>
X#include <assert.h><br>
X<br>
X<br>
Xextern int g( union alloc_param_a_b_c *space);<br>
X<br>
X<br>
X#if REUSE<br>
X<br>
Xint f( union alloc_param_a_b *space)<br>
X{<br>
X assert( sizeof( union alloc_param_a_b) == sizeof( union alloc_param_a_b_c));<br>
X<br>
X ((union alloc_param_a_b_c *) space)->param.c = 1848;<br>
X return( g( (union alloc_param_a_b_c *) space));<br>
X}<br>
X<br>
X#else<br>
X<br>
Xint f( union alloc_param_a_b *space)<br>
X{<br>
X union alloc_param_a_b_c x;<br>
X<br>
X x.param.a_b.a = space->param.a;<br>
X x.param.a_b.b = space->param.b;<br>
X x.param.c = 1848;<br>
X<br>
X return( g( &x));<br>
X}<br>
X<br>
X#endif<br>
END-of-a_b.c<br>
echo x - a_b_c.c<br>
sed 's/^X//' >a_b_c.c << 'END-of-a_b_c.c'<br>
X#include "types.h"<br>
X<br>
X<br>
Xint g( union alloc_param_a_b_c *p)<br>
X{<br>
X return( p->param.a_b.a + p->param.a_b.b + p->param.c);<br>
X}<br>
END-of-a_b_c.c<br>
echo x - main.c<br>
sed 's/^X//' >main.c << 'END-of-main.c'<br>
X#include "types.h"<br>
X<br>
X<br>
Xint f( union alloc_param_a_b *space);<br>
X<br>
X<br>
Xint main()<br>
X{<br>
X union alloc_param_a_b args;<br>
X<br>
X args.param.a = 18;<br>
X args.param.b = 48;<br>
X return( f( &args));<br>
X}<br>
END-of-main.c<br>
exit<br>
<br>
The potential gains are obvious:<br>
<br>
otool -t -v a_b1.o<br>
a_b1.o:<br>
(__TEXT,__text) section<br>
_f:<br>
0000000000000000 pushq %rbp<br>
0000000000000001 movq %rsp, %rbp<br>
0000000000000004 pushq %rbx<br>
0000000000000005 subq $0x48, %rsp<br>
0000000000000009 movq (%rip), %rbx<br>
0000000000000010 movq (%rbx), %rbx<br>
0000000000000013 movq %rbx, -0x10(%rbp)<br>
0000000000000017 movl (%rdi), %eax<br>
0000000000000019 movl %eax, -0x50(%rbp)<br>
000000000000001c movl 0x4(%rdi), %eax<br>
000000000000001f movl %eax, -0x4c(%rbp)<br>
0000000000000022 movl $0x738, -0x48(%rbp) ## imm = 0x738<br>
0000000000000029 leaq -0x50(%rbp), %rdi<br>
000000000000002d callq 0x32<br>
0000000000000032 cmpq -0x10(%rbp), %rbx<br>
0000000000000036 jne 0x3f<br>
0000000000000038 addq $0x48, %rsp<br>
000000000000003c popq %rbx<br>
000000000000003d popq %rbp<br>
000000000000003e retq<br>
000000000000003f callq 0x44<br>
<br>
otool -t -v a_b2.o<br>
a_b2.o:<br>
(__TEXT,__text) section<br>
_f:<br>
0000000000000000 pushq %rbp<br>
0000000000000001 movq %rsp, %rbp<br>
0000000000000004 movl $0x738, 0x8(%rdi) ## imm = 0x738<br>
000000000000000b popq %rbp<br>
000000000000000c jmp 0x11<br>
<br>
<br>
[^1]<br>
<br>
A workaround on the clang side would be to wrap the struct in a union (see the example code).<br>
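<br>
As a sketch of how that workaround could be generalized, the union wrapper can be generated by a macro and the size equality checked at compile time. DECLARE_ALLOC_PARAM and PARAM_BLOCK are hypothetical names introduced here, not part of the example archive. The compile-time check is worth considering because the Makefile builds with -DNDEBUG, which compiles out the runtime assert() in a_b.c.<br>

```c
#include <assert.h>  /* static_assert (C11) */

/* Hypothetical block size: every parameter union is padded to a
   multiple of this many bytes. */
#define PARAM_BLOCK 64

/* Hypothetical macro: declares "union alloc_<name>" wrapping
   "struct <name>" plus padding up to the next PARAM_BLOCK multiple. */
#define DECLARE_ALLOC_PARAM(name)                                        \
    union alloc_##name {                                                 \
        struct name param;                                               \
        unsigned char space[PARAM_BLOCK *                                \
            ((sizeof(struct name) + PARAM_BLOCK - 1) / PARAM_BLOCK)];    \
    }

/* Same structs as in types.h. */
struct param_a_b   { int a; int b; };
struct param_a_b_c { struct param_a_b a_b; int c; };

DECLARE_ALLOC_PARAM(param_a_b);
DECLARE_ALLOC_PARAM(param_a_b_c);

/* Checked at compile time, so it still fires when NDEBUG is defined. */
static_assert(sizeof(union alloc_param_a_b) == sizeof(union alloc_param_a_b_c),
              "caller's block must be large enough for the callee's parameters");
```

With these declarations, the REUSE variant of f() can cast its argument to union alloc_param_a_b_c * knowing the build would have failed if the two blocks differed in size.<br>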
<br>
</blockquote></div><br></div>