[cfe-dev] [cfe-users] Constexpr prevents optimization?
Steffen Hirschmann via cfe-dev
cfe-dev at lists.llvm.org
Wed May 16 00:38:19 PDT 2018
Dear all,
a while ago I posted about some odd constexpr behavior (odd in the sense
that I cannot explain it) to the cfe-users mailing list and never
received a reply. Since my observations still hold for clang-6.0.0, I am
reposting my original message to cfe-dev.
tl;dr: In the toy example below, which I ran across back in February,
declaring the function constexpr seems to prevent an optimization that
clang otherwise performs. I cannot think of a reason for this behavior,
so I am asking here.
Greetings,
Steffen
P.S.: This also happens if one defines "fib" correctly (i <= 1). :)
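
For reference, here is a minimal sketch of the two base cases side by side.
The name fib_fixed is made up for this illustration. The single-return form
is an assumption of mine so that the snippet also compiles as C++11; the
quoted message below uses if/else.

```cpp
#include <cassert>

// Base case as in the quoted message: i <= 0 returns i, so fib(1) becomes
// (fib(0) + fib(-1)) % 100 = (0 + -1) % 100 = -1.
constexpr int fib(int i) {
    return (i <= 0) ? i : (fib(i - 1) + fib(i - 2)) % 100;
}

// Corrected base case (i <= 1). fib_fixed is a hypothetical name.
constexpr int fib_fixed(int i) {
    return (i <= 1) ? i : (fib_fixed(i - 1) + fib_fixed(i - 2)) % 100;
}

// Small arguments only, so compile-time evaluation stays cheap.
static_assert(fib(1) == -1, "original base case yields -1 for i == 1");
static_assert(fib_fixed(1) == 1, "corrected base case");
static_assert(fib_fixed(10) == 55, "10th Fibonacci number");
static_assert(fib_fixed(12) == 44, "144 % 100");

// The same functions remain callable at run time.
inline void check_at_runtime() {
    assert(fib(1) == -1);
    assert(fib_fixed(12) == 44);
}
```

Either definition can be dropped into the example from the quoted message.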
On 10:51 Fri 16 Feb , Steffen Hirschmann via cfe-users wrote:
> Dear all,
>
> I was just playing around with a toy example when I noticed an oddity in
> the code generated by clang-5.0.0 (and also in clang-5.0.1) regarding
> constexpr.
>
> Given the code:
> > int fib(int i) { if (i <= 0) return i; else return (fib(i - 1) + fib(i - 2)) % 100; }
> > int main()
> > {
> >     int ret = 0;
> >     for (int i = 0; i < 10; ++i)
> >         ret += fib(39);
> >     return ret;
> > }
>
> Compile it with clang++ -O3 and what you get is (gdb disassembly of "main"):
> > 7 {
> > 8 int ret = 0;
> > 9 for (int i = 0; i < 10; ++i)
> > 10 ret += fib(39);
> > 0x00000000004004e0 <+0>: push rax
> > 0x00000000004004e1 <+1>: mov edi,0x27
> > 0x00000000004004e6 <+6>: call 0x400490 <fib(int)>
> >
> > 9 for (int i = 0; i < 10; ++i)
> > 0x00000000004004eb <+11>: add eax,eax
> > 0x00000000004004ed <+13>: lea eax,[rax+rax*4]
> >
> > 11 return ret;
> > 0x00000000004004f0 <+16>: pop rcx
> > 0x00000000004004f1 <+17>: ret
>
> That is a single call to fib(39) followed by a multiplication by 10
> ("add eax,eax" doubles the result, and "lea eax,[rax+rax*4]" multiplies
> that by 5).
>
> Now, if you make "fib" constexpr, i.e.:
> > constexpr int fib(int i) { if (i <= 0) return i; else return (fib(i - 1) + fib(i - 2)) % 100; }
>
> And, again, compile it with -O3 and disassemble "main":
> > 7 {
> > 8 int ret = 0;
> > 9 for (int i = 0; i < 10; ++i)
> > 10 ret += fib(39);
> > 0x0000000000400490 <+0>: push rbp
> > 0x0000000000400491 <+1>: push rbx
> > 0x0000000000400492 <+2>: push rax
> > 0x0000000000400493 <+3>: mov edi,0x27
> > 0x0000000000400498 <+8>: call 0x400530 <fib(int)>
> > 0x000000000040049d <+13>: mov ebx,eax
> > 0x000000000040049f <+15>: mov edi,0x27
> > 0x00000000004004a4 <+20>: call 0x400530 <fib(int)>
> > 0x00000000004004a9 <+25>: mov ebp,eax
> > 0x00000000004004ab <+27>: add ebp,ebx
> > 0x00000000004004ad <+29>: mov edi,0x27
> > 0x00000000004004b2 <+34>: call 0x400530 <fib(int)>
> > 0x00000000004004b7 <+39>: mov ebx,eax
> > 0x00000000004004b9 <+41>: add ebx,ebp
> > 0x00000000004004bb <+43>: mov edi,0x27
> > 0x00000000004004c0 <+48>: call 0x400530 <fib(int)>
> > 0x00000000004004c5 <+53>: mov ebp,eax
> > 0x00000000004004c7 <+55>: add ebp,ebx
> > 0x00000000004004c9 <+57>: mov edi,0x27
> > 0x00000000004004ce <+62>: call 0x400530 <fib(int)>
> > 0x00000000004004d3 <+67>: mov ebx,eax
> > 0x00000000004004d5 <+69>: add ebx,ebp
> > 0x00000000004004d7 <+71>: mov edi,0x27
> > 0x00000000004004dc <+76>: call 0x400530 <fib(int)>
> > 0x00000000004004e1 <+81>: mov ebp,eax
> > 0x00000000004004e3 <+83>: add ebp,ebx
> > 0x00000000004004e5 <+85>: mov edi,0x27
> > 0x00000000004004ea <+90>: call 0x400530 <fib(int)>
> > 0x00000000004004ef <+95>: mov ebx,eax
> > 0x00000000004004f1 <+97>: add ebx,ebp
> > 0x00000000004004f3 <+99>: mov edi,0x27
> > 0x00000000004004f8 <+104>: call 0x400530 <fib(int)>
> > 0x00000000004004fd <+109>: mov ebp,eax
> > 0x00000000004004ff <+111>: add ebp,ebx
> > 0x0000000000400501 <+113>: mov edi,0x27
> > 0x0000000000400506 <+118>: call 0x400530 <fib(int)>
> > 0x000000000040050b <+123>: mov ebx,eax
> > 0x000000000040050d <+125>: add ebx,ebp
> > 0x000000000040050f <+127>: mov edi,0x27
> > 0x0000000000400514 <+132>: call 0x400530 <fib(int)>
> > 0x0000000000400519 <+137>: add eax,ebx
> >
> > 11 return ret;
> > 0x000000000040051b <+139>: add rsp,0x8
> > 0x000000000040051f <+143>: pop rbx
> > 0x0000000000400520 <+144>: pop rbp
> > 0x0000000000400521 <+145>: ret
>
> That's 10 calls to function "fib" (for which the assembly is essentially
> the same as in the example above).
>
> Regardless of whether the function is evaluated at compile time or not,
> it seems odd to me that using constexpr here prevents clang from
> emitting the very same code as in the non-constexpr example. Note,
> however, that if you declare "fib" as "static constexpr", clang again
> emits the multiplication code.
>
> Is there something keeping clang from producing the multiplication code
> for a non-static constexpr example that I don't see? And why is the
> optimization possible again if one makes "fib" static?
>
> Greetings,
> Steffen
>
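
To compare the three declarations discussed above, something along these
lines may help. It is only a sketch: the names fib_plain, fib_constexpr,
fib_static_constexpr, sum_ten, times_ten and check_variants are made up
here, and the single-return bodies are my own choice for C++11
compatibility. Two things the language does guarantee are that a constexpr
function is implicitly inline and that static gives a function internal
linkage. Whether that difference is what changes the generated code is not
established in this message.

```cpp
#include <cassert>

// Variant 1: ordinary function with external linkage.
int fib_plain(int i) {
    if (i <= 0) return i;
    return (fib_plain(i - 1) + fib_plain(i - 2)) % 100;
}

// Variant 2: constexpr, hence implicitly inline, still external linkage.
constexpr int fib_constexpr(int i) {
    return (i <= 0) ? i
                    : (fib_constexpr(i - 1) + fib_constexpr(i - 2)) % 100;
}

// Variant 3: static constexpr, i.e. internal linkage.
static constexpr int fib_static_constexpr(int i) {
    return (i <= 0) ? i
                    : (fib_static_constexpr(i - 1) +
                       fib_static_constexpr(i - 2)) % 100;
}

// The loop from main(), parameterised over the variant and the argument.
// Calling through a function pointer changes what the optimizer sees, so
// this checks results only; it does not reproduce the disassembly.
template <typename F>
int sum_ten(F f, int n) {
    int ret = 0;
    for (int i = 0; i < 10; ++i)
        ret += f(n);
    return ret;
}

// What "add eax,eax" followed by "lea eax,[rax+rax*4]" computes:
// first 2*x, then 2*x + 4*(2*x) = 10*x.
inline int times_ten(int x) {
    int doubled = x + x;
    return doubled + doubled * 4;
}

// All three variants must agree, and ten calls must equal one call times 10.
inline void check_variants(int n) {
    assert(sum_ten(fib_plain, n) == times_ten(fib_plain(n)));
    assert(sum_ten(fib_constexpr, n) == times_ten(fib_constexpr(n)));
    assert(sum_ten(fib_static_constexpr, n) ==
           times_ten(fib_static_constexpr(n)));
    assert(fib_plain(n) == fib_constexpr(n));
    assert(fib_plain(n) == fib_static_constexpr(n));
}
```

To look at the generated code itself, it is better to substitute each
variant into the original main() and compile with clang++ -O3 as described
above.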