[llvm-bugs] [Bug 44380] New: Changing place of "goto" makes code slower

Wed Dec 25 14:09:32 PST 2019

https://bugs.llvm.org/show_bug.cgi?id=44380

            Bug ID: 44380
           Summary: Changing place of "goto" makes code slower
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: C
          Assignee: unassignedclangbugs at nondot.org
          Reporter: safinaskar at mail.ru
                CC: blitzrakete at gmail.com, dgregor at apple.com,
                    erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
                    richard-llvm at metafoo.co.uk

Consider these two sources: https://godbolt.org/z/j6uSL6 (STILLFAST) and
https://godbolt.org/z/gcvDJV (BECAMESLOW). All tests were performed with clang
10, even if the godbolt pages say something else. The only difference between
them is the placement of a "goto" statement, which should not change the
meaning of the code. Yet clang vectorizes one version and not the other, and
the difference in speed is dramatic in my tests. So changing the placement of
"goto" prevents optimization. I consider this a bug, so I am reporting it.

Now let me explain what that code is for.

Consider this source (I will call it STDGETC):

---
int c;
int result = 0;
while ((c = getc_unlocked(stdin)) != EOF)
  if (c == '\n')
    ++result;
---

This is, of course, a simple implementation of "wc -l". It is slow in my
tests.

Let's rewrite it like this: https://godbolt.org/z/HtY3Ym (I call this RAWREAD)

Now the code is very fast, but it is too big.
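
The linked RAWREAD source is not reproduced in this message. As a rough sketch
of the general shape of such code, assuming it calls read(2) directly and scans
the buffer in a plain counted loop (the function name, the 64 KiB buffer size
and the error handling are assumptions made for illustration, not the linked
code):

```c
#include <stddef.h>
#include <unistd.h>   /* read, ssize_t */

/* Hypothetical helper, not the linked RAWREAD source.
   Counts '\n' bytes read from fd; returns the count, or -1 on a read error.
   For brevity, EINTR is treated like any other error. */
long count_newlines_raw(int fd)
{
    static char buf[1 << 16];   /* buffer size is an assumption */
    long result = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            return -1;
        if (n == 0)             /* end of input */
            break;
        /* Counted loop over a flat buffer with no calls inside:
           the kind of loop a vectorizer can usually handle. */
        for (ssize_t i = 0; i < n; ++i)
            result += (buf[i] == '\n');
    }
    return result;
}
```

Calling count_newlines_raw(0) from main and printing the result gives a
"wc -l" equivalent for standard input. The cost is that buffer management is
mixed into the counting loop, which is what makes this style "too big" once
the logic is more than a one-liner.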

So I got an idea: write a fast stdio implementation. Such an implementation
should combine the convenient "getc" API with the speed of the raw
implementation.

Here is a first attempt at such an implementation:
https://godbolt.org/z/BJJPYv (I call this MYIO). Of course, it is not a full
implementation yet, and it performs slower than RAWREAD.
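
The linked MYIO source is not reproduced in this message. To make the idea
concrete, here is a minimal sketch of a getc-style reader layered over read(2).
All names (struct my_file, my_init, my_fill, my_getc, count_newlines_myio) and
the buffer size are hypothetical and chosen for illustration; this is not the
linked code and makes no claim about how clang optimizes it:

```c
#include <stdio.h>    /* EOF */
#include <unistd.h>   /* read, ssize_t */

/* Hypothetical reader state; not the linked MYIO source. */
struct my_file {
    int fd;
    char *pos;            /* next unread byte */
    char *end;            /* one past the last valid byte */
    char buf[1 << 16];    /* buffer size is an assumption */
};

static void my_init(struct my_file *f, int fd)
{
    f->fd = fd;
    f->pos = f->end = f->buf;   /* start with an empty buffer */
}

/* Slow path: refill the buffer and return its first byte,
   or EOF on end of input or read error. */
static int my_fill(struct my_file *f)
{
    ssize_t n = read(f->fd, f->buf, sizeof f->buf);
    if (n <= 0)
        return EOF;
    f->pos = f->buf;
    f->end = f->buf + n;
    return (unsigned char)*f->pos++;
}

/* Fast path: one pointer compare and one load while the buffer has data. */
static inline int my_getc(struct my_file *f)
{
    if (f->pos != f->end)
        return (unsigned char)*f->pos++;
    return my_fill(f);
}

/* The STDGETC loop, written against the sketch above. */
long count_newlines_myio(struct my_file *f)
{
    long result = 0;
    int c;
    while ((c = my_getc(f)) != EOF)
        if (c == '\n')
            ++result;
    return result;
}
```

The caller's loop keeps the shape of the getc version, while the buffer logic
lives in my_getc and my_fill. Whether the compiler can turn the inlined result
back into a simple counted loop over the buffer is exactly the kind of
question this report is about.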

So I started to investigate why it is slower. I made incremental changes to
RAWREAD until I arrived at MYIO, in order to catch the moment when the code
becomes slow.

Here is the chain of my changes to RAWREAD:

https://godbolt.org/z/QVC7Cz
https://godbolt.org/z/HSP3hV
https://godbolt.org/z/3BzQTi
https://godbolt.org/z/s8HRr_
https://godbolt.org/z/yUptWS
https://godbolt.org/z/P35x3N
https://godbolt.org/z/g5JD-D
https://godbolt.org/z/YPVh3j
https://godbolt.org/z/j6uSL6 (at this point the assembly changes, but the code
is still vectorized and fast; I will call this STILLFAST)
https://godbolt.org/z/gcvDJV (I will call this BECAMESLOW)

So the last two sources are essentially the same, but the second is slower.
This is the bug.

System: Debian stretch (with some packages from Debian buster), x86_64, Linux 4.19

"clang-10 -v" output:

----
clang version 10.0.0-+20191211115110+02168549172-1~exp1~20191211105657.1646 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/6
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/6.3.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/6.3.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.3.0
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
----

This is clang installed from https://apt.llvm.org.

CPU: Intel Core i7; hyper-threading is disabled at the BIOS level, and
/proc/cpuinfo reports 4 cores.
