[all-commits] [llvm/llvm-project] aff98e: [ARM] Stop gluing 1-bit shifts (#116547)

Tue Nov 19 06:47:10 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: aff98e4be05a1060e489ce62a88ee0ff365e571a
      https://github.com/llvm/llvm-project/commit/aff98e4be05a1060e489ce62a88ee0ff365e571a
  Author: Sergei Barannikov <barannikov88 at gmail.com>
  Date:   2024-11-19 (Tue, 19 Nov 2024)

  Changed paths:
    M llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
    M llvm/lib/Target/ARM/ARMISelLowering.cpp
    M llvm/lib/Target/ARM/ARMISelLowering.h
    M llvm/lib/Target/ARM/ARMInstrInfo.td
    M llvm/lib/Target/ARM/ARMInstrThumb2.td
    M llvm/lib/Target/ARM/ARMScheduleM7.td
    M llvm/lib/Target/ARM/ARMScheduleM85.td
    M llvm/test/CodeGen/ARM/urem-seteq-illegal-types.ll

  Log Message:
  -----------
  [ARM] Stop gluing 1-bit shifts (#116547)

1. When two (or more) nodes are glued, DAG scheduler will always
schedule them as one piece, i.e. it will not allow any instructions to
be scheduled between them. It does so because if nodes are glued this
usually means that there is an implicit register dependency between
them, and an intervening node could clobber this physical register. When
emitting such nodes into machine IR, they will also be stuck together,
e.g.:
```
    %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr
    %10:gpr = RRX %3, implicit $cpsr
```

2. If a node has Glue result, SelectionDAG will not try to CSE this
node. If it did, it would break the implicit physical register
dependency. In practice this means that if a node with Glue result has
multiple uses, it has to be duplicated before each use. This the reason
for `ARMTargetLowering::duplicateCmp` to exist.

When using normal data dependency, dependent nodes can freely be
scheduled around. If there is a physical register dependency between
nodes, the physical register will be copied to/from a virtual register,
allowing other nodes to intervene between them. The resulting machine IR
might look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = COPY $cpsr
    %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg
    $cpsr = COPY %10
    %13:gpr = RRX %3, implicit $cpsr
```

The two copies are likely to be eliminated by register coalescer, given
that there are no instructions between them that clobber this physical
register. If the copies are unwanted in the first place (they could be
expensive or impossible), DAG scheduler will try to avoid inserting them
wherever possible, and the resulting machine IR will look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = RRX %3, implicit $cpsr
```

On ARM, arithmetic operations and LSLS already use the new data flow
approach. This patch extends it to include 1-bit shifts.

Pull Request: https://github.com/llvm/llvm-project/pull/116547

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications