[cfe-dev] clang generates way more code than - Optimizer bug?
Sjoerd Meijer via cfe-dev
cfe-dev at lists.llvm.org
Mon Dec 6 00:57:43 PST 2021
Thanks for checking. :)
What matters most is where in the app the run time is spent (i.e. which code paths are hot), and that must be the vectorised loop, which processes 16 elements in parallel using SIMD instructions. Comparing GCC O2 with Clang O3 means comparing an efficient scalar loop with a vectorised loop, and we see a ~5x improvement. That's pretty decent, but some way off the theoretical 16x speed-up, and there could be several reasons for that: not all time is spent in the kernel, since some goes to the setup code before the loop is entered; the vectorised codegen may not be efficient enough; or the loop may be waiting for data. I haven't looked into the details (the code and experimental setup), so this is a bit hand-wavy, but I guess this must be the gist of it.
Cheers.
________________________________
From: Dennis Luehring <dl.soluz at gmx.net>
Sent: 04 December 2021 06:27
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; cfe-dev <cfe-dev at lists.llvm.org>
Subject: Re: [cfe-dev] clang generates way more code than - Optimizer bug?
On 03.12.2021 at 14:58, Sjoerd Meijer wrote:
> I guess the only way to tell is to run and measure it.
yeah, the clang code is faster than gcc's - even with the much larger code
- as you said :)
thank you
gcc O1: 9.782262 seconds
gcc O2: 8.115871 seconds
gcc O3: 3.092142 seconds
clang O1: 9.905967 seconds
clang O2: 1.629295 seconds
clang O3: 1.629502 seconds
benchmark-code:
#include <stdio.h>
#include <time.h>

void decipher(unsigned char* text_, int text_len_)
{
    for (int i = text_len_ - 1; i >= 0; --i)
    {
        text_[i] ^= 0xff - (i << 2);
    }
}

int benchmark()
{
    unsigned char text[] = { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
                             0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF };
    int anti_optimizer = 0;
    for (int i = 0; i < 1000000000; ++i)
    {
        decipher(text, sizeof(text));
        for (int x = 0; x < sizeof(text); ++x)
        {
            anti_optimizer += text[x];
        }
    }
    return anti_optimizer;
}

int main()
{
    clock_t start_time = clock();
    int result = benchmark();
    double elapsed_time = (double)(clock() - start_time) / CLOCKS_PER_SEC;
    printf("Done in %f seconds\n", elapsed_time);
    return result;
}