[llvm-bugs] [Bug 38813] New: [MC][llvm-mca] Teach how to identify instructions that have a false dependency on the destination register.

Mon Sep 3 06:30:30 PDT 2018

https://bugs.llvm.org/show_bug.cgi?id=38813

            Bug ID: 38813
           Summary: [MC][llvm-mca] Teach how to identify instructions that
                    have a false dependency on the destination register.
           Product: new-bugs
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: andrea.dibiagio at gmail.com
                CC: llvm-bugs at lists.llvm.org

Some instructions have a false dependency on the destination register.

For example, on some Intel CPUs, there is a false dependency on the
LZCNT/TZCNT/POPCNT destination register. See also bug 33869.

That false dependency can be broken using a dep-breaking zero idiom.

On BtVer2, there is a similar issue with general purpose zero/sign extending
MOV instructions.
For example:

   movzbl %al, %esi
   movzbl %al, %esi
   movzbl %al, %esi
   movzbl %al, %esi

`perf stat` reports a throughput of 1.00 instructions per cycle (IPC), even
though, ideally, a movz could be issued to either of two pipelines, and the CPU
can dispatch two COPs per cycle.

Same for movzwl:

   movzwl %ax, %esi
   movzwl %ax, %esi
   movzwl %ax, %esi
   movzwl %ax, %esi

`perf stat` still reports 1.00 IPC.

If we instead test this:

   movzwl %ax, %esi
   movzwl %ax, %ecx
   movzwl %ax, %edx
   movzwl %ax, %ebx

Then the throughput is 2.00 IPC (as expected).

The same issue occurs with sign-extending GPR moves.
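
The measurements above are consistent with a simple model in which each move
must wait for the previous writer of its destination register. A minimal C++
sketch of that model follows. It is a toy, not llvm-mca: the in-order two-wide
issue, the single-cycle latency, the register numbering, and the names `Move`
and `simulateIPC` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical register numbering, for illustration only.
enum Reg : int { EAX, EBX, ECX, EDX, ESI, NumRegs };

// A register-to-register extending move: reads Src, writes Dst.
struct Move {
  Reg Src;
  Reg Dst;
};

// Replays Block for the given number of iterations on a toy machine that
// issues at most two moves per cycle, in order, each with a latency of one
// cycle. When FalseDepOnDst is set, a move also waits for the previous
// writer of its destination register. Returns instructions per cycle.
double simulateIPC(const std::vector<Move> &Block, int Iterations,
                   bool FalseDepOnDst) {
  assert(!Block.empty() && Iterations > 0);
  constexpr int IssueWidth = 2; // assumed: two pipelines
  constexpr long Latency = 1;   // assumed: single-cycle moves

  std::array<long, NumRegs> ReadyAt{}; // cycle at which each register is ready
  long Cycle = 0;
  int IssuedThisCycle = 0;
  long Total = 0;

  for (int It = 0; It < Iterations; ++It) {
    for (const Move &M : Block) {
      long Earliest = ReadyAt[M.Src];
      if (FalseDepOnDst)
        Earliest = std::max(Earliest, ReadyAt[M.Dst]);
      // Stall until the operands (and, falsely, the destination) are ready.
      if (Earliest > Cycle) {
        Cycle = Earliest;
        IssuedThisCycle = 0;
      }
      // Move to the next cycle once both issue slots are used.
      if (IssuedThisCycle == IssueWidth) {
        ++Cycle;
        IssuedThisCycle = 0;
      }
      ++IssuedThisCycle;
      ReadyAt[M.Dst] = Cycle + Latency;
      ++Total;
    }
  }
  return static_cast<double>(Total) / static_cast<double>(Cycle + 1);
}
```

In this model, four moves into %esi yield 1.00 IPC when the false dependency
is enabled, while four moves into distinct registers yield 2.00 IPC, which
matches the `perf stat` numbers reported above.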

--

In the X86 backend, we currently use special feature flags to mark Intel
processors that have a false dependency on LZCNT/TZCNT/POPCNT. That information
is then used to bias the result of X86InstrInfo::hasPartialRegUpdate() queries
(https://reviews.llvm.org/D40334).

The goal of this bug is to teach llvm-mca about the existence of instructions
that have a false dependency on their output register. Ideally, we would like
to have a general framework for doing queries on the subtarget.

Rather than having a target feature flag for every problematic instruction, we
should expose knowledge about instructions that are subject to that extra false
dependency through target-independent hooks (which are then overridden by each
target that wants to change their semantics).
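
As a rough illustration of the hook idea, the sketch below uses a base class
with a conservative default and a target subclass that overrides it. None of
these names exist in LLVM: `ToyInstrInfo`, `ToyBtVer2InstrInfo`,
`hasFalseDependencyOnDest`, and the opcode enum are hypothetical stand-ins for
whatever interface would actually be chosen.

```cpp
#include <cassert>

// Hypothetical opcodes, for illustration only.
enum ToyOpcode : unsigned { ToyADD, ToyMOVZX, ToyMOVSX, ToyPOPCNT };

// Stand-in for a target-independent instruction-info interface.
class ToyInstrInfo {
public:
  virtual ~ToyInstrInfo() = default;

  // Default: no instruction has a false dependency on its destination.
  virtual bool hasFalseDependencyOnDest(unsigned /*Opcode*/) const {
    return false;
  }
};

// Stand-in for a target that knows about extending moves (as on BtVer2).
class ToyBtVer2InstrInfo : public ToyInstrInfo {
public:
  bool hasFalseDependencyOnDest(unsigned Opcode) const override {
    switch (Opcode) {
    case ToyMOVZX:
    case ToyMOVSX:
      return true;
    default:
      return false;
    }
  }
};

// A client that only sees the base interface, as an analysis tool would.
unsigned countFalseDeps(const ToyInstrInfo &TII, const unsigned *Opcodes,
                        unsigned N) {
  assert((Opcodes != nullptr || N == 0) && "null opcode list");
  unsigned Count = 0;
  for (unsigned I = 0; I < N; ++I)
    if (TII.hasFalseDependencyOnDest(Opcodes[I]))
      ++Count;
  return Count;
}
```

The point of the design is that the client code never mentions a target
feature flag; it only asks the question through the common interface.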

Ideally, knowledge about instructions with a false dependency could be
automatically generated via tablegen (similarly to how, for example, we generate
information for variant scheduling classes in the scheduling models), so that
each subtarget/processor model may specify its own set of "problematic"
instructions.

Tablegen backends would then generate useful TII/STI hook overrides for us.
This would make that knowledge accessible through a target independent
interface from any codegen pass in the backend.
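
One possible shape for the generated code is a sorted opcode table per
processor model plus a lookup function. The sketch below is hand-written and
purely illustrative: the enum values, table names, and `lookupFalseDepOnDest`
are assumptions, and real TableGen output would look different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical opcode and processor-model enums, for illustration only.
enum GenOpcode : unsigned {
  GenADD32rr,
  GenMOVSX32rr8,
  GenMOVZX32rr8,
  GenPOPCNT32rr,
  GenTZCNT32rr,
  GenNumOpcodes
};
enum GenProcModel : unsigned { GenGeneric, GenBtVer2, GenIntelLike };

// Tables a backend might emit from per-processor descriptions.
// Each table must stay sorted so that binary search is valid.
constexpr unsigned BtVer2FalseDepTable[] = {GenMOVSX32rr8, GenMOVZX32rr8};
constexpr unsigned IntelLikeFalseDepTable[] = {GenPOPCNT32rr, GenTZCNT32rr};

template <std::size_t N>
bool tableContains(const unsigned (&Table)[N], unsigned Opcode) {
  return std::binary_search(std::begin(Table), std::end(Table), Opcode);
}

// Answers the query for a given processor model and opcode.
bool lookupFalseDepOnDest(GenProcModel Model, unsigned Opcode) {
  assert(Opcode < GenNumOpcodes && "unknown opcode");
  switch (Model) {
  case GenBtVer2:
    return tableContains(BtVer2FalseDepTable, Opcode);
  case GenIntelLike:
    return tableContains(IntelLikeFalseDepTable, Opcode);
  case GenGeneric:
    return false;
  }
  return false;
}
```

Keeping the data in per-model tables means that adding a newly discovered
problematic instruction would only require a change to the processor
description, not a new feature flag.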

This would not only help llvm-mca, but it would probably also help simplify some
code (at least in the X86 backend) which currently relies heavily on the
presence of target feature flags and target-specific functions.

This same framework could be used to expose partial write stalls caused by
false dependencies on the output register in the presence of SSE1/SSE2
sqrt/rsqrt/rcp instructions. See also the discussion on
https://reviews.llvm.org/D51542.
