[llvm-bugs] [Bug 44380] New: Changing place of "goto" makes code slower

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Dec 25 14:09:32 PST 2019


            Bug ID: 44380
           Summary: Changing place of "goto" makes code slower
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: C
          Assignee: unassignedclangbugs at nondot.org
          Reporter: safinaskar at mail.ru
                CC: blitzrakete at gmail.com, dgregor at apple.com,
                    erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
                    richard-llvm at metafoo.co.uk

Consider these two sources: https://godbolt.org/z/j6uSL6 (STILLFAST) and
https://godbolt.org/z/gcvDJV (BECAMESLOW) (all tests were performed on clang 10,
even if the godbolt pages say something else). The only difference is the
placement of a "goto" statement. This should not change the meaning of the code,
yet one version is vectorized and the other is not. The change in speed is
dramatic in my tests. So changing the placement of "goto" prevents
optimizations. I consider this a bug, and so I am reporting it.

Now let me explain what this code does.

Consider this source (I will call it STDGETC):

#include <stdio.h>

int c;
int result = 0;
while ((c = getc_unlocked(stdin)) != EOF)
  if (c == '\n')
    ++result;

This is, of course, a simple implementation of "wc -l", and it is slow in my tests.

Let's rewrite it like this: https://godbolt.org/z/HtY3Ym (I will call this RAWREAD)

Now the code is very fast, but it is too big.
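The RAWREAD source is only linked, not quoted, so what follows is not the reporter's code. It is a minimal sketch, assuming RAWREAD reads stdin in large blocks with read(2) and counts newline bytes in each block; the names count_newlines and count_lines_fd and the 64 KiB block size are hypothetical. The inner loop is a plain byte comparison, the kind of loop clang can typically auto-vectorize.

```c
#include <stddef.h>
#include <unistd.h>

/* Count '\n' bytes in buf[0..n). A branch-free loop like this is the
   kind clang can usually auto-vectorize at -O2/-O3. */
size_t count_newlines(const char *buf, size_t n) {
  size_t count = 0;
  for (size_t i = 0; i < n; ++i)
    count += (buf[i] == '\n');
  return count;
}

/* Hypothetical helper: read fd in large blocks with read(2) and count
   newlines. Returns the line count, or (size_t)-1 on a read error. */
size_t count_lines_fd(int fd) {
  static char buf[1 << 16];  /* hypothetical 64 KiB block size */
  size_t total = 0;
  for (;;) {
    ssize_t got = read(fd, buf, sizeof buf);
    if (got == 0)
      return total;          /* EOF */
    if (got < 0)
      return (size_t)-1;     /* read error (EINTR is not retried here) */
    total += count_newlines(buf, (size_t)got);
  }
}
```

Calling count_lines_fd(STDIN_FILENO) gives the same count as STDGETC, at the cost of managing the buffer by hand.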

So I had an idea: write a fast stdio implementation, one that combines the
beautiful "getc" API with the speed of the raw implementation.

Here is my first attempt at such an implementation:
https://godbolt.org/z/BJJPYv (I will call this MYIO). Of course, it is not a
full implementation yet, and it is slower than RAWREAD.
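MYIO is also only linked, so this is not the reporter's code either. It is a minimal sketch of the general idea, assuming a getc-like inline fast path over a buffer that is refilled with read(2) only when it runs empty. struct my_reader, my_reader_init, my_refill and my_getc are hypothetical names, and read errors are treated the same as EOF.

```c
#include <stddef.h>
#include <stdio.h>   /* EOF */
#include <unistd.h>

/* Hypothetical buffered reader; names and layout are illustrative only. */
struct my_reader {
  int fd;
  unsigned char *pos;   /* next unread byte */
  unsigned char *end;   /* one past the last valid byte */
  unsigned char buf[1 << 16];
};

void my_reader_init(struct my_reader *r, int fd) {
  r->fd = fd;
  r->pos = r->end = r->buf;  /* empty buffer: first my_getc refills */
}

/* Slow path: refill the buffer and return the next byte, or EOF.
   Read errors are not distinguished from end of file. */
int my_refill(struct my_reader *r) {
  ssize_t got = read(r->fd, r->buf, sizeof r->buf);
  if (got <= 0)
    return EOF;
  r->pos = r->buf;
  r->end = r->buf + got;
  return *r->pos++;
}

/* Fast path: a getc-like call that only touches the buffer while data
   is available; bytes are returned as non-negative ints, so never EOF. */
static inline int my_getc(struct my_reader *r) {
  return r->pos < r->end ? *r->pos++ : my_refill(r);
}
```

With this, the STDGETC loop reads the same way: while ((c = my_getc(&r)) != EOF) if (c == '\n') ++result; whether clang vectorizes such a loop is exactly what this report is about, so no speed is claimed for the sketch.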

So I started to investigate why it is slower. I made incremental changes to
RAWREAD until I reached MYIO, to catch the moment when the code becomes slow.

Here is the chain of my changes to RAWREAD:

https://godbolt.org/z/j6uSL6 (at this point the assembly changes, but the code is
still vectorized and fast; I will call this code STILLFAST)
https://godbolt.org/z/gcvDJV (I will call this BECAMESLOW)

So the last two sources are essentially the same, but the second is slower.
This is a bug.

Debian stretch (with some packages from Debian buster), x86_64, Linux 4.19

"clang-10 -v" output:

clang version 10.0.0-+20191211115110+02168549172-1~exp1~20191211105657.1646 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/6
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/6.3.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/6.3.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.3.0
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0
Candidate multilib: .;@m64
Selected multilib: .;@m64

This clang was installed from https://apt.llvm.org.

Intel Core i7, hyper-threading disabled at the BIOS level; /proc/cpuinfo
reports 4 cores.
