[llvm] [MCA] New option to report scheduling information: -scheduling-info (PR #126703)

Tue Feb 11 01:43:09 PST 2025

llvmbot wrote:




@llvm/pr-subscribers-llvm-binary-utilities

Author: Julien Villette (jvillette38)

<details>
<summary>Changes</summary>

This is a new way to update scheduling information in LLVM. I have used it to update the scheduling information for the AArch64 Neoverse V1 microarchitecture (new patches will follow and will depend on this pull request).

This pull request contains 2 commits:
A) `llvm-mca -scheduling-info` option
B) `update_mca_test_checks.py` new options: `--check-sched-info` and `--update-sched-info`.

A) `llvm-mca -scheduling-info` disables the default llvm-mca report (InstructionInfoView) and outputs information in the following format:
```
<uOps> | <Latency>  | <Bypass Latency> | <Throughput> | <Resources> | <LLVM Opcode>  | <Assembly input: instruction + comment>
```
Example from the new llvm-mca test `AArch64/Neoverse/V1-scheduling-info.s`:
Input:
```
  abs  v25.2s, v25.2s  // ABS <Vd>.<T>, <Vn>.<T>  \\ ASIMD arith, basic  \\ 1 2  2  4.0 V1UnitV
```
Output:
```
1 | 2 | 2 | 4.00 | V1UnitSVE01, V1UnitV | ABSv2i32 | abs  v25.2s, v25.2s  // ABS <Vd>.<T>, <Vn>.<T>  \\ ASIMD arith, basic  \\ 1 2  2  4.0 V1UnitV
```
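As a rough illustration of consuming this format, the Python sketch below splits one output line into its seven columns. It assumes the exact `" | "` separator shown in the example; `SchedInfoRow` and `parse_sched_info_line` are hypothetical names and are not part of llvm-mca or `update_mca_test_checks.py`. Splitting on the spaced separator with a bounded split count matters because operand syntax such as `<Wd|WSP>` contains bare `|` characters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchedInfoRow:
    """Hypothetical record for one line of `llvm-mca -scheduling-info` output."""
    uops: int
    latency: int
    bypass_latency: int
    throughput: float
    resources: List[str]
    opcode: str
    asm: str  # the assembly input line, including its comment


def parse_sched_info_line(line: str) -> SchedInfoRow:
    # Split only on " | " and at most 6 times: operand syntax such as
    # "<Wd|WSP>" contains bare '|' characters, and the last column is
    # free-form assembly plus comment.
    cols = [c.strip() for c in line.split(" | ", 6)]
    if len(cols) != 7:
        raise ValueError(f"expected 7 columns, got {len(cols)}: {line!r}")
    uops, latency, bypass, throughput, resources, opcode, asm = cols
    return SchedInfoRow(
        uops=int(uops),
        latency=int(latency),
        bypass_latency=int(bypass),
        throughput=float(throughput),
        resources=[r.strip() for r in resources.split(",") if r.strip()],
        opcode=opcode,
        asm=asm,
    )
```

For the `abs v25.2s` output line above, this yields `resources == ["V1UnitSVE01", "V1UnitV"]` and `opcode == "ABSv2i32"`.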

If the scheduling information for each instruction variant can be extracted from the microarchitecture documentation, tests can be written in this form, and the `llvm-mca -scheduling-info` output can be checked for differences between the LLVM scheduling information and the reference values in the comments. When there are differences, check the documentation, then either update the comment or fix the LLVM target description so that the llvm-mca output changes.
The LLVM opcode is reported to make it easier to find what to change in the target description.

B) `update_mca_test_checks.py --check-sched-info` checks the information in the `llvm-mca` output against the information in the comments. If it finds differences, it reports them and exits with an error code. The developer can then fix the comments or the LLVM target description, or run `update_mca_test_checks.py --update-sched-info` to update the comments automatically and then review the differences with git.
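A minimal sketch of what such a check could look like, assuming both sides have already been parsed into `(uOps, Latency, Bypass Latency, Throughput, Resources)` tuples. The comparison policy here (throughput compared numerically so `4.0` matches `4.00`, resources compared regardless of order) is an assumption for illustration and may differ from what `update_mca_test_checks.py` actually does; `diff_sched_info` and `check` are hypothetical helpers.

```python
import sys
from typing import Iterable, List, Sequence, Tuple

# Field order shared by the llvm-mca output and the comment convention.
FIELDS = ("uOps", "Latency", "Bypass Latency", "Throughput", "Resources")


def diff_sched_info(reported: Sequence, expected: Sequence) -> List[str]:
    """Hypothetical helper: one message per field where llvm-mca and the comment disagree."""
    messages = []
    for name, got, want in zip(FIELDS, reported, expected):
        if name == "Throughput":
            same = abs(float(got) - float(want)) < 1e-9  # "4.0" vs "4.00"
        elif name == "Resources":
            same = sorted(got) == sorted(want)  # assumption: order is irrelevant
        else:
            same = int(got) == int(want)
        if not same:
            messages.append(f"{name}: llvm-mca reports {got}, comment says {want}")
    return messages


def check(rows: Iterable[Tuple[str, Sequence, Sequence]]) -> int:
    """Hypothetical helper: rows are (asm, reported, expected); returns an exit code."""
    status = 0
    for asm, reported, expected in rows:
        for msg in diff_sched_info(reported, expected):
            print(f"{asm}: {msg}", file=sys.stderr)
            status = 1
    return status
```

A non-zero return value from `check` would be passed to `sys.exit`, matching the "exit with an error code" behavior described above.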

Convention for the comments used by the new `update_mca_test_checks.py` options:
- C or C++ style comments: `/* */` and `//`
- Fields:
```
<asm instruction> <// or /*> <instruction format> \\ <micro architecture reference> \\ <uOps> <Latency> <Bypass latency> <Throughput> <Resources separated by commas>
```
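The comment convention can be parsed with a few string operations. This Python sketch is an assumption about how the fields might be extracted, not the code in `update_mca_test_checks.py`; `parse_sched_comment` is a hypothetical name. It returns `None` for lines without a comment or without numeric scheduling fields, such as the `\\ No scheduling info` entries in the test file.

```python
import re
from typing import Optional

# Matches a C++-style '//' comment to end of line, or a C-style '/* ... */'.
_COMMENT_RE = re.compile(r"//(.*)$|/\*(.*?)\*/")


def parse_sched_comment(line: str) -> Optional[dict]:
    r"""Hypothetical parser for
    '<asm> // <format> \\ <doc ref> \\ <uOps> <Lat> <Bypass> <Tput> <Res,...>'.
    """
    m = _COMMENT_RE.search(line)
    if m is None:
        return None
    comment = m.group(1) if m.group(1) is not None else m.group(2)
    # The three sections are separated by a literal double backslash.
    parts = [p.strip() for p in comment.split("\\\\")]
    if len(parts) != 3:
        return None
    fmt, doc_ref, sched = parts
    # uOps, latency, bypass latency, throughput, then the resource list,
    # which may itself contain spaces ("V1UnitFlg, V1UnitI").
    tokens = sched.split(None, 4)
    if len(tokens) != 5:
        return None  # e.g. "No scheduling info"
    try:
        uops, latency, bypass = (int(t) for t in tokens[:3])
        throughput = float(tokens[3])
    except ValueError:
        return None
    return {
        "format": fmt,
        "doc_ref": doc_ref,
        "uops": uops,
        "latency": latency,
        "bypass_latency": bypass,
        "throughput": throughput,
        "resources": [r.strip() for r in tokens[4].split(",") if r.strip()],
    }
```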

@mshockwave and @Rin18 may be interested.

---

Patch is 1.49 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126703.diff


10 Files Affected:

- (modified) llvm/docs/CommandGuide/llvm-mca.rst (+14) 
- (modified) llvm/include/llvm/MC/MCSchedule.h (+4) 
- (modified) llvm/lib/MC/MCSchedule.cpp (+37) 
- (added) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s (+7588) 
- (modified) llvm/tools/llvm-mca/CMakeLists.txt (+1) 
- (modified) llvm/tools/llvm-mca/Views/InstructionInfoView.h (+1) 
- (added) llvm/tools/llvm-mca/Views/SchedulingInfoView.cpp (+210) 
- (added) llvm/tools/llvm-mca/Views/SchedulingInfoView.h (+96) 
- (modified) llvm/tools/llvm-mca/llvm-mca.cpp (+31-11) 
- (modified) llvm/utils/update_mca_test_checks.py (+168) 


``````````diff

diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index f610ea2f2168269..1c5275ce000b111 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -170,6 +170,20 @@ option specifies "``-``", then the output will also be sent to standard output.
   Enable extra scheduler statistics. This view collects and analyzes instruction
   issue events. This view is disabled by default.
 
+.. option:: -scheduling-info
+
+  Enable scheduling info view. This view reports scheduling information defined
+  in LLVM target description in the form:
+  uOps | Latency | Bypass Latency | Throughput | LLVM OpcodeName | Resources
+  units | assembly instruction and its comment (// or /* */) if defined.
+  It allows to compare scheduling info with architecture documents and fix them
+  in target description by fixing InstrRW for the reported LLVM opcode.
+  Scheduling information can be defined in the same order in each instruction
+  comments to check easily reported and reference scheduling information.
+  Suggested information in comment:
+  // <architecture instruction form> \\ <scheduling documentation title> \\
+     <uOps>, <Latency>, <Bypass Latency>, <Throughput>, <Resources units>
+
 .. option:: -retire-stats
 
   Enable extra retire control unit statistics. This view is disabled by default.
diff --git a/llvm/include/llvm/MC/MCSchedule.h b/llvm/include/llvm/MC/MCSchedule.h
index fe731d086f70ae3..dcbc5369120a39b 100644
--- a/llvm/include/llvm/MC/MCSchedule.h
+++ b/llvm/include/llvm/MC/MCSchedule.h
@@ -402,6 +402,10 @@ struct MCSchedModel {
   static unsigned getForwardingDelayCycles(ArrayRef<MCReadAdvanceEntry> Entries,
                                            unsigned WriteResourceIdx = 0);
 
+  /// Returns the maximum forwarding delay for maximum write latency.
+  static unsigned getForwardingDelayCycles(const MCSubtargetInfo &STI,
+                                       const MCSchedClassDesc &SCDesc);
+
   /// Returns the default initialized model.
   static const MCSchedModel Default;
 };
diff --git a/llvm/lib/MC/MCSchedule.cpp b/llvm/lib/MC/MCSchedule.cpp
index ed243cecabb7638..4ef6acf78714fa7 100644
--- a/llvm/lib/MC/MCSchedule.cpp
+++ b/llvm/lib/MC/MCSchedule.cpp
@@ -174,3 +174,40 @@ MCSchedModel::getForwardingDelayCycles(ArrayRef<MCReadAdvanceEntry> Entries,
 
   return std::abs(DelayCycles);
 }
+
+unsigned
+MCSchedModel::getForwardingDelayCycles(const MCSubtargetInfo &STI,
+                                            const MCSchedClassDesc &SCDesc) {
+
+  ArrayRef<MCReadAdvanceEntry> Entries = STI.getReadAdvanceEntries(SCDesc);
+  if (Entries.empty())
+    return 0;
+
+  unsigned Latency = 0;
+  unsigned maxLatency = 0;
+  unsigned WriteResourceID = 0;
+  unsigned DefEnd = SCDesc.NumWriteLatencyEntries;
+
+  for (unsigned DefIdx = 0; DefIdx != DefEnd; ++DefIdx) {
+    // Lookup the definition's write latency in SubtargetInfo.
+    const MCWriteLatencyEntry *WLEntry =
+        STI.getWriteLatencyEntry(&SCDesc, DefIdx);
+    // Early exit if we found an invalid latency.
+    // Consider no bypass
+    if (WLEntry->Cycles < 0)
+      return 0;
+    maxLatency = std::max(Latency, static_cast<unsigned>(WLEntry->Cycles));
+    if (maxLatency > Latency) {
+      WriteResourceID = WLEntry->WriteResourceID;
+    }
+    Latency = maxLatency;
+  }
+
+  for (const MCReadAdvanceEntry &E : Entries) {
+    if (E.WriteResourceID == WriteResourceID) {
+      return E.Cycles;
+    }
+  }
+
+  llvm_unreachable("WriteResourceID not found in MCReadAdvanceEntry entries");
+}
diff --git a/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s
new file mode 100644
index 000000000000000..c421166f22ea45e
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s
@@ -0,0 +1,7588 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=neoverse-v1 -scheduling-info < %s | FileCheck %s
+
+  .text
+  .file	        "V1-scheduling-info.s"
+  .globl	test
+  .p2align	4
+  .type	test, @function
+test:
+  .cfi_startproc
+  abs D15, D11  /* ABS <V><d>, <V><n>  \\ ASIMD arith, basic  \\ 1 2  2  4.0 V1UnitV */
+  abs V25.2S, V25.2S  // ABS <Vd>.<T>, <Vn>.<T>  \\ ASIMD arith, basic  \\ 1 2  2  4.0 V1UnitV
+  abs Z26.B, P6/M, Z27.B  // ABS <Zd>.<T>, <Pg>/M, <Zn>.<T>  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  adc W13, W6, W4  // ADC <Wd>, <Wn>, <Wm>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  adc X8, X12, X10  // ADC <Xd>, <Xn>, <Xm>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  adcs W29, W7, W30  // ADCS <Wd>, <Wn>, <Wm>  \\ ALU, basic, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adcs X11, X3, X5  // ADCS <Xd>, <Xn>, <Xm>  \\ ALU, basic, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  add WSP, WSP, W10  // ADD <Wd|WSP>, <Wn|WSP>, <Wm>  \\ ALU, basic, unconditional, no flagset  \\ 1 2  2  2.00 V1UnitI
+  add WSP, WSP, W2, UXTB   // ADD <Wd|WSP>, <Wn|WSP>, <Wm>, <wextend>   \\ ALU, basic, unconditional, no flagset  \\ 1 2  2  2.00 V1UnitI
+  add WSP, WSP, W13, UXTH #4  // ADD <Wd|WSP>, <Wn|WSP>, <Wm>, <wextend> #<amount>  \\ ALU, basic, unconditional, no flagset  \\ 1 2  2  2.00 V1UnitI
+  add WSP, WSP, W13, LSL #4  // ADD <Wd|WSP>, <Wn|WSP>, <Wm>, LSL #<amount>  \\ Arithmetic, LSL shift, shift <= 4  \\ 1 2  2  2.00 V1UnitI
+  add X22, X2, X27  // ADD <Xd|SP>, <Xn|SP>, X<m>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  add X25, X9, W25, UXTB  // ADD <Xd|SP>, <Xn|SP>, <R><m>, <extend>  \\ ALU, basic  \\ 1 2  2  2.00 V1UnitI
+  add X4, X28, W3, UXTB #3  // ADD <Xd|SP>, <Xn|SP>, <R><m>, <extend> #<amount>  \\ ALU, extend and shift  \\ 1 2  2  2.0 V1UnitM
+  add X0, X28, X26, LSL #3  // ADD <Xd|SP>, <Xn|SP>, X<m>, LSL #<amount>  \\ Arithmetic, LSL shift, shift <= 4  \\ 1 1  1  4.0 V1UnitI
+  add WSP, WSP, #3765  // ADD <Wd|WSP>, <Wn|WSP>, #<imm>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  add WSP, WSP, #3547, LSL #12  // ADD <Wd|WSP>, <Wn|WSP>, #<imm>, <shift>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  add X7, X30, #803  // ADD <Xd|SP>, <Xn|SP>, #<imm>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  add X7, X2, #319, LSL #12  // ADD <Xd|SP>, <Xn|SP>, #<imm>, <shift>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  add Z13.D, Z13.D, #245  // ADD <Zdn>.<T>, <Zdn>.<T>, #<imm>  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  add Z16.D, Z16.D, #233, LSL #8  // ADD <Zdn>.<T>, <Zdn>.<T>, #<imm>, <shift>  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  add W3, W2, W21, LSL #3  // ADD <Wd>, <Wn>, <Wm>, LSL #<wamountl>  \\ Arithmetic, LSL shift by immed, shift <= 4, unconditional, no flagset   \\ 1 1  1  4.0 V1UnitI
+  add W6, W21, W17, LSL #15  // ADD <Wd>, <Wn>, <Wm>, LSL #<wamounth>  \\ Arithmetic, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional  \\ 1 2  2  2.0 V1UnitM
+  add W28, W30, W19, ASR #30  // ADD <Wd>, <Wn>, <Wm>, <shift> #<wamount>  \\ Arithmetic, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional  \\ 1 2  2  2.0 V1UnitM
+  add X8, X3, X28, LSL #3  // ADD <Xd>, <Xn>, <Xm>, LSL #<amountl>  \\ Arithmetic, LSL shift, shift <= 4  \\ 1 1  1  4.0 V1UnitI
+  add X12, X13, X0, LSL #44  // ADD <Xd>, <Xn>, <Xm>, LSL #<amounth>  \\ Arithmetic, LSR/ASR/ROR shift or LSL shift > 4  \\ 1 2  2  2.0 V1UnitM
+  add X5, X20, X28, LSR #16  // ADD <Xd>, <Xn>, <Xm>, <shift> #<amount>  \\ Arithmetic, LSR/ASR/ROR shift or LSL shift > 4  \\ 1 2  2  2.0 V1UnitM
+  add D0, D23, D21  // ADD <V><d>, <V><n>, <V><m>  \\ ASIMD arith, basic  \\ 1 2  2  4.0 V1UnitV
+  add V19.4S, V24.4S, V15.4S  // ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>  \\ ASIMD arith, basic  \\ 1 2  2  4.0 V1UnitV
+  add Z29.D, P5/M, Z29.D, Z29.D  // ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  add Z10.H, Z22.H, Z13.H  // ADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  addhn V26.4H, V5.4S, V9.4S  // ADDHN <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>  \\ ASIMD arith, complex  \\ 1 2  2  4.0 V1UnitV
+  addhn2 V1.16B, V19.8H, V6.8H  // ADDHN2 <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>  \\ ASIMD arith, complex  \\ 1 2  2  4.0 V1UnitV
+  addp D1, V14.2D  // ADDP <V><d>, <Vn>.<T>  \\ ASIMD arith, pair-wise  \\ 1 2  2  4.0 V1UnitV
+  addp V7.2S, V1.2S, V2.2S  // ADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>  \\ ASIMD arith, pair-wise  \\ 1 2  2  4.0 V1UnitV
+  addpl X27, X6, #-6  // ADDPL <Xd|SP>, <Xn|SP>, #<imm>  \\ Predicate counting scalar  \\ 1 2  2  1.0 V1UnitM0
+  adds W17, WSP, W25  // ADDS <Wd>, <Wn|WSP>, <Wm>  \\ ALU, basic, unconditional, flagset  \\ 1 2  2  2.00 V1UnitI,V1UnitFlg
+  adds W6, WSP, W15, UXTH   // ADDS <Wd>, <Wn|WSP>, <Wm>, <wextend>   \\ ALU, basic, unconditional, flagset  \\ 1 2  2  2.00 V1UnitI,V1UnitFlg
+  adds W22, WSP, W30, UXTB #2  // ADDS <Wd>, <Wn|WSP>, <Wm>, <wextend> #<amount>  \\ ALU, basic, unconditional, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds W12, WSP, W29, LSL #4  // ADDS <Wd>, <Wn|WSP>, <Wm>, LSL #<amount>  \\ Arithmetic, LSL shift by immed, shift <= 4, unconditional, flagset   \\ 1 2  2  2.00 V1UnitI,V1UnitFlg
+  adds X14, X0, X10  // ADDS <Xd>, <Xn|SP>, X<m>  \\ ALU, basic, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds X13, X23, W8, UXTB  // ADDS <Xd>, <Xn|SP>, <R><m>, <extend>  \\ ALU, basic, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds X4, X26, W28, UXTB #1  // ADDS <Xd>, <Xn|SP>, <R><m>, <extend> #<amount>  \\ ALU, flagset, extend and shift  \\ 1 1  1  3.00 V1UnitFlg, V1UnitI
+  adds X10, X3, X29, LSL #2  // ADDS <Xd>, <Xn|SP>, X<m>, LSL #<amount>  \\ Arithmetic, flagset, LSL shift, shift <= 4  \\ 1 1   1   3.00 V1UnitI,V1UnitFlg
+  adds W23, WSP, #502  // ADDS <Wd>, <Wn|WSP>, #<imm>  \\ ALU, basic, unconditional, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds W2, WSP, #2980, LSL #12  // ADDS <Wd>, <Wn|WSP>, #<imm>, <shift>  \\ Arithmetic, flagset, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional  \\ 1 1  1  3.00 V1UnitFlg, V1UnitI
+  adds X12, X4, #1345  // ADDS <Xd>, <Xn|SP>, #<imm>  \\ ALU, basic, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds X25, X18, #3037, LSL #12  // ADDS <Xd>, <Xn|SP>, #<imm>, <shift>  \\ Arithmetic, flagset, LSR/ASR/ROR shift or LSL shift > 4  \\ 1 1  1  3.00 V1UnitFlg, V1UnitI
+  adds W12, W13, W26  // ADDS <Wd>, <Wn>, <Wm>  \\ ALU, basic, unconditional, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds W0, W23, W20, LSL #0  // ADDS <Wd>, <Wn>, <Wm>, LSL #<wamountl>  \\ Arithmetic, LSL shift by immed, shift <= 4, unconditional, flagset   \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds W13, W16, W12, LSL #28  // ADDS <Wd>, <Wn>, <Wm>, LSL #<wamounth>  \\ Arithmetic, flagset, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional  \\ 1 2  2  2.00 V1UnitM,V1UnitFlg
+  adds W20, W19, W16, ASR #0  // ADDS <Wd>, <Wn>, <Wm>, <shift> #<wamount>  \\ Arithmetic, flagset, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional  \\ 1 2  2  2.00 V1UnitM,V1UnitFlg
+  adds X23, X12, X4  // ADDS <Xd>, <Xn>, <Xm>  \\ ALU, basic, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  adds X0, X13, X4, LSL #2  // ADDS <Xd>, <Xn>, <Xm>, LSL #<amountl>  \\ Arithmetic, flagset, LSL shift, shift <= 4  \\ 1 1   1   3.00 V1UnitI,V1UnitFlg
+  adds X4, X7, X6, LSL #31  // ADDS <Xd>, <Xn>, <Xm>, LSL #<amounth>  \\ Arithmetic, flagset, LSR/ASR/ROR shift or LSL shift > 4  \\ 1 2  2  2.00 V1UnitM,V1UnitFlg
+  adds X9, X8, X9, ASR #41  // ADDS <Xd>, <Xn>, <Xm>, <shift> #<amount>  \\ Arithmetic, flagset, LSR/ASR/ROR shift or LSL shift > 4  \\ 1 2  2  2.00 V1UnitM,V1UnitFlg
+  addv B0, V28.8B  // ADDV B<d>, <Vn>.8B  \\ ASIMD arith, reduce, 8B/8H  \\ 2 4  4  2.00 V1UnitV13
+  addv B1, V26.16B  // ADDV B<d>, <Vn>.16B  \\ ASIMD arith, reduce, 16B  \\ 2 4  4  1.00 V1UnitV13[2]
+  addv H18, V13.4H  // ADDV H<d>, <Vn>.4H  \\ ASIMD arith, reduce, 4H/4S  \\ 1 2  2  2.0 V1UnitV13
+  addv H29, V17.8H  // ADDV H<d>, <Vn>.8H  \\ ASIMD arith, reduce, 8B/8H  \\ 2 4  4  2.00 V1UnitV13
+  addv S22, V18.4S  // ADDV S<d>, <Vn>.4S  \\ ASIMD arith, reduce, 4H/4S  \\ 1 2  2  2.0 V1UnitV13
+  addvl X1, X27, #-8  // ADDVL <Xd|SP>, <Xn|SP>, #<imm>  \\ Predicate counting scalar  \\ 1 2  2  1.0 V1UnitM0
+  adr X3, test  // ADR <Xd>, <label>  \\ Address generation  \\ 1 1  1  4.0 V1UnitI
+  adr Z26.D, [Z1.D, Z8.D]  // ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>]  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  adr Z22.S, [Z28.S, Z8.S, LSL #2]  // ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>, <mod> #<amount>]  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  adr Z11.D, [Z2.D, Z29.D, SXTW ]  // ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW ]  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  adr Z3.D, [Z9.D, Z9.D, SXTW #2]  // ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW #<amount>]  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  adr Z6.D, [Z7.D, Z13.D, UXTW ]  // ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW ]  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  adr Z4.D, [Z24.D, Z22.D, UXTW #1]  // ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW #<amount>]  \\ Arithmetic, basic  \\ 1 2  2  2.0 V1UnitV01
+  adrp X0, test  // ADRP <Xd>, <label>  \\ Address generation  \\ 1 1  1  4.0 V1UnitI
+  and WSP, W16, #0xe00  // AND <Wd|WSP>, <Wn>, #<imms>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  and X2, X22, #0x1e00  // AND <Xd|SP>, <Xn>, #<imm>  \\ ALU, basic  \\ 1 1  1  4.0 V1UnitI
+  and Z1.B, Z1.B, #0x70  // AND <Zdn>.B, <Zdn>.B, #<constb>  \\ Logical  \\ 1 2  2  2.0 V1UnitV01
+  and Z7.H, Z7.H, #0x60  // AND <Zdn>.H, <Zdn>.H, #<consth>  \\ Logical  \\ 1 2  2  2.0 V1UnitV01
+  and Z7.S, Z7.S, #0x2  // AND <Zdn>.S, <Zdn>.S, #<consts>  \\ Logical  \\ 1 2  2  2.0 V1UnitV01
+  and Z7.D, Z7.D, #0x4  // AND <Zdn>.D, <Zdn>.D, #<constd>  \\ Logical  \\ 1 2  2  2.0 V1UnitV01
+  and P5.B, P1/Z, P6.B, P4.B  // AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B  \\ Predicate logical  \\ 1 1  1  1.0 V1UnitM0
+  and W11, W14, W24  // AND <Wd>, <Wn>, <Wm>  \\ Logical, shift, no flagset  \\ 1 1  1  4.0 V1UnitI
+  and W2, W21, W22, LSR #25  // AND <Wd>, <Wn>, <Wm>, <shift> #<wamount>  \\ Logical, shift, no flagset  \\ 1 1  1  4.0 V1UnitI
+  and X1, X20, X29  // AND <Xd>, <Xn>, <Xm>  \\ Logical, shift, no flagset  \\ 1 1  1  4.0 V1UnitI
+  and X8, X11, X22, ASR #56  // AND <Xd>, <Xn>, <Xm>, <shift> #<amount>  \\ Logical, shift, no flagset  \\ 1 1  1  4.0 V1UnitI
+  and V29.8B, V26.8B, V26.8B  // AND <Vd>.<T>, <Vn>.<T>, <Vm>.<T>  \\ ASIMD logical  \\ 1 2  2  4.0 V1UnitV
+  and Z17.D, P6/M, Z17.D, Z12.D  // AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>  \\ Logical  \\ 1 2  2  2.0 V1UnitV01
+  and Z9.D, Z5.D, Z17.D  // AND <Zd>.D, <Zn>.D, <Zm>.D  \\ Logical  \\ 1 2  2  2.0 V1UnitV01
+  ands W14, W8, #0x70  // ANDS <Wd>, <Wn>, #<imms>  \\ ALU, basic, unconditional, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  ands X4, X10, #0x60  // ANDS <Xd>, <Xn>, #<immd>  \\ ALU, basic, flagset  \\ 1 1  1  3.00 V1UnitI,V1UnitFlg
+  ands W29, W28, W12  // ANDS <Wd>, <Wn>, <Wm>  \\ ALU, basic, unconditional, flagset  \\ 1 2  2  2.00 V1UnitI,V1UnitFlg
+  ands W7, W13, W23, ASR #3  // ANDS <Wd>, <Wn>, <Wm>, <shift> #<wamount>  \\ Logical, shift by immed, flagset, unconditional  \\ 1 2  2  2.00 V1UnitM,V1UnitFlg
+  ands X21, X9, X6  // ANDS <Xd>, <Xn>, <Xm>  \\ ALU, basic, flagset  \\ 1 2  2  2.00 V1UnitI,V1UnitFlg
+  ands X10, X27, X7, ASR #20  // ANDS <Xd>, <Xn>, <Xm>, <shift> #<amount>  \\ Logical, shift, flagset  \\ 1 2  2  2.00 V1UnitM,V1UnitFlg
+  ands P5.B, P1/Z, P2.B, P7.B  // ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B  \\ Predicate logical, flag setting  \\ 2 2  2  0.50 V1UnitM0[2]
+  andv H7, P6, Z31.H  // ANDV <V><d>, <Pg>, <Zn>.<T>  \\ Reduction, logical   \\ 4 12  12  0.50 V1UnitV01[4]
+  asr W30, W14, #5  // ASR <Wd>, <Wn>, #<shifts>  \\ Move, shift by immed, no flagset  \\ 1 1  1  4.0 V1UnitI
+  asr X12, X21, #28  // ASR <Xd>, <Xn>, #<shiftd>  \\ Move, shift by immed, no flagset  \\ 1 1  1  4.0 V1UnitI
+  asr Z7.B, P5/M, Z7.B, #3  // ASR <Zdn>.B, <Pg>/M, <Zdn>.B, #<constb>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z6.H, P6/M, Z6.H, #5  // ASR <Zdn>.H, <Pg>/M, <Zdn>.H, #<consth>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z28.S, P0/M, Z28.S, #11  // ASR <Zdn>.S, <Pg>/M, <Zdn>.S, #<consts>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z26.D, P5/M, Z26.D, #24  // ASR <Zdn>.D, <Pg>/M, <Zdn>.D, #<constd>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z10.B, Z14.B, #3  // ASR <Zd>.B, <Zn>.B, #<constb>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z23.H, Z18.H, #6  // ASR <Zd>.H, <Zn>.H, #<consth>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z29.S, Z11.S, #6  // ASR <Zd>.S, <Zn>.S, #<consts>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z20.D, Z26.D, #29  // ASR <Zd>.D, <Zn>.D, #<constd>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr W3, W0, W20  // ASR <Wd>, <Wn>, <Wm>  \\ Move, shift by register, no flagset, unconditional  \\ 1 1  1  4.0 V1UnitI
+  asr X7, X5, X21  // ASR <Xd>, <Xn>, <Xm>  \\ Move, shift by register, no flagset, unconditional  \\ 1 1  1  4.0 V1UnitI
+  asr Z3.S, P0/M, Z3.S, Z10.S  // ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z9.S, P2/M, Z9.S, Z8.D  // ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asr Z26.S, Z21.S, Z21.D  // ASR <Zd>.<T>, <Zn>.<T>, <Zm>.D  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asrd Z6.B, P4/M, Z6.B, #2  // ASRD <Zdn>.B, <Pg>/M, <Zdn>.B, #<constb>  \\ Arithmetic, shift right for divide  \\ 1 4  4  1.0 V1UnitV1
+  asrd Z19.H, P3/M, Z19.H, #6  // ASRD <Zdn>.H, <Pg>/M, <Zdn>.H, #<consth>  \\ Arithmetic, shift right for divide  \\ 1 4  4  1.0 V1UnitV1
+  asrd Z16.S, P3/M, Z16.S, #2  // ASRD <Zdn>.S, <Pg>/M, <Zdn>.S, #<consts>  \\ Arithmetic, shift right for divide  \\ 1 4  4  1.0 V1UnitV1
+  asrd Z9.D, P6/M, Z9.D, #12  // ASRD <Zdn>.D, <Pg>/M, <Zdn>.D, #<constd>  \\ Arithmetic, shift right for divide  \\ 1 4  4  1.0 V1UnitV1
+  asrr Z0.B, P0/M, Z0.B, Z19.B  // ASRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>  \\ Arithmetic, shift  \\ 1 2  2  1.0 V1UnitV1
+  asrv W24, W28, W13  // ASRV <Wd>, <Wn>, <Wm>  \\ Variable shift  \\ 1 1  1  4.0 V1UnitI
+  asrv X3, X21, X24  // ASRV <Xd>, <Xn>, <Xm>  \\ Variable shift  \\ 1 1  1  4.0 V1UnitI
+  at s12e1r, X28  // AT <at_op>, <Xt>  \\ No description \\ No scheduling info
+  b test  // B <label>  \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.eq test // B.eq <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.none test // B.none <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.ne test // B.ne <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.any test // B.any <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.cs test // B.cs <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.hs test // B.hs <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.nlast test // B.nlast <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.cc test // B.cc <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.lo test // B.lo <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.last test // B.last <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.mi test // B.mi <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.first test // B.first <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.pl test // B.pl <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.nfrst test // B.nfrst <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.vs test // B.vs <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.vc test // B.vc <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.hi test // B.hi <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.pmore test // B.pmore <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.ls test // B.ls <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.plast test // B.plast <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.ge test // B.ge <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.tcont test // B.tcont <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.lt test // B.lt <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.tstop test // B.tstop <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.gt test // B.gt <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.le test // B.le <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.al test // B.al <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  b.nv test // B.nv <label> \\ Branch, immed  \\ 1 1  1  2.0 V1UnitB
+  bfcvt H6, S20  // BFCVT <Hd>, <Sn>  \\ Scalar convert, F32 to BF16  \\ 1 3  3  2.0 V1UnitV02
+  bfcvt Z16.H, P6/M,...
[truncated]

``````````
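
In the `MCSchedule.cpp` hunk, the new `getForwardingDelayCycles(STI, SCDesc)` overload returns 0 when there are no ReadAdvance entries or when a write latency is negative. Otherwise it picks the write with the largest latency (the first one on ties) and returns the `Cycles` of the ReadAdvance entry whose `WriteResourceID` matches it. The Python model below mirrors that selection logic under simplifying assumptions: plain dataclasses stand in for `MCWriteLatencyEntry` and `MCReadAdvanceEntry`, and a `LookupError` replaces the `llvm_unreachable` call. It is an illustrative sketch, not LLVM code.

```python
from dataclasses import dataclass


@dataclass
class WriteLatencyEntry:  # illustrative stand-in for llvm::MCWriteLatencyEntry
    cycles: int
    write_resource_id: int


@dataclass
class ReadAdvanceEntry:  # illustrative stand-in for llvm::MCReadAdvanceEntry
    use_idx: int
    write_resource_id: int
    cycles: int


def forwarding_delay_cycles(writes, read_advances):
    """Model of the selection logic in the new getForwardingDelayCycles overload."""
    if not read_advances:
        return 0  # no ReadAdvance entries: no bypass
    latency = 0
    write_resource_id = 0
    for w in writes:
        if w.cycles < 0:
            return 0  # invalid latency: treated as no bypass
        if w.cycles > latency:  # strictly greater, so the first maximum wins
            latency = w.cycles
            write_resource_id = w.write_resource_id
    for ra in read_advances:
        if ra.write_resource_id == write_resource_id:
            return ra.cycles
    # The C++ code calls llvm_unreachable() at this point.
    raise LookupError("WriteResourceID not found in ReadAdvance entries")
```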

</details>


https://github.com/llvm/llvm-project/pull/126703