[llvm] [MCA] New option to report scheduling information: -scheduling-info (PR #126703)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 11 01:43:09 PST 2025
llvmbot wrote:
@llvm/pr-subscribers-llvm-binary-utilities
Author: Julien Villette (jvillette38)
<details>
<summary>Changes</summary>
This is a new way to update scheduling information in LLVM. I have used it to update the scheduling information for the AArch64 Neoverse V1 microarchitecture (follow-up patches will depend on this pull request).
This pull request contains 2 commits:
A) `llvm-mca -scheduling-info` option
B) `update_mca_test_checks.py` new options: `--check-sched-info` and `--update-sched-info`.
A) `llvm-mca -scheduling-info` disables the default llvm-mca report (InstructionInfoView) and outputs information in the following format:
```
<uOps> | <Latency> | <Bypass Latency> | <Throughput> | <Resources> | <LLVM Opcode> | <Assembly input: instruction + comment>
```
Example from new llvm-mca test `AArch64/Neoverse/V1-scheduling-info.s`:
Input:
```
abs v25.2s, v25.2s // ABS <Vd>.<T>, <Vn>.<T> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV
```
Output:
```
1 | 2 | 2 | 4.00 | V1UnitSVE01, V1UnitV | ABSv2i32 | abs v25.2s, v25.2s // ABS <Vd>.<T>, <Vn>.<T> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV
```
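To illustrate how the pipe-separated output could be consumed by a script, here is a minimal Python sketch that splits one output line into its seven fields. It is not code from this patch: `SchedInfoLine` and `parse_sched_info_line` are hypothetical names, and the sketch assumes that only the trailing assembly field may itself contain `|` (as in `<Wd|WSP>`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchedInfoLine:
    """One line of `llvm-mca -scheduling-info` output (hypothetical container)."""
    uops: int
    latency: int
    bypass_latency: int
    throughput: float
    resources: List[str]
    opcode: str
    asm: str


def parse_sched_info_line(line: str) -> SchedInfoLine:
    # Split on the first six '|' only: the last field is the assembly input
    # plus its comment, which may contain '|' itself (e.g. <Wd|WSP>).
    fields = [f.strip() for f in line.split("|", 6)]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    uops, latency, bypass, throughput, resources, opcode, asm = fields
    return SchedInfoLine(
        uops=int(uops),
        latency=int(latency),
        bypass_latency=int(bypass),
        throughput=float(throughput),
        # Resources are comma-separated; drop surrounding whitespace.
        resources=[r.strip() for r in resources.split(",") if r.strip()],
        opcode=opcode,
        asm=asm,
    )
```

Applied to the output line above, this would yield `ABSv2i32` as the opcode and `V1UnitSVE01` and `V1UnitV` as the two resource entries.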
So if we can extract the scheduling information for each instruction variant from the microarchitecture documentation, we can write a test in this form and compare the `llvm-mca -scheduling-info` output with the reference values in the comments. If they differ, check the documentation, then either correct the comment or fix the LLVM target description so that the llvm-mca output changes.
The LLVM opcode is reported to make changes to the target description easier.
B) `update_mca_test_checks.py --check-sched-info` compares the `llvm-mca` output with the information in the comments. If it finds differences, it reports them and exits with an error code. The developer can then fix the comments or the LLVM target description, or run `update_mca_test_checks.py --update-sched-info` to update the comments automatically and review the changes with git.
Convention for comments used by new update_mca_test_checks.py options:
- C or C++ style comment: '/* */' and '//'
- Fields:
```
<asm instruction> <// or /*> <instruction format> \\ <micro architecture reference> \\ <uOps> <Latency> <Bypass latency> <Throughput> <Resources separated with commas>
```
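The comment convention above could be parsed and checked roughly as follows. This is a sketch under stated assumptions, not the implementation in `update_mca_test_checks.py`: the names `ReferenceInfo`, `parse_reference_comment` and `mismatched_fields` are hypothetical, and comparing resources as unordered sets is an illustrative choice that the script may not share.

```python
import re
from typing import List, NamedTuple, Optional


class ReferenceInfo(NamedTuple):
    """Reference values taken from an instruction comment (hypothetical)."""
    instruction_format: str
    reference: str
    uops: int
    latency: int
    bypass_latency: int
    throughput: float
    resources: List[str]


# Comment fields are separated by two literal backslashes.
FIELD_SEP = "\\\\"


def parse_reference_comment(asm_line: str) -> Optional[ReferenceInfo]:
    # Accept a '//' comment running to end of line, or a '/* ... */' comment.
    m = re.search(r"//(.*)$|/\*(.*?)\*/", asm_line)
    if m is None:
        return None
    comment = m.group(1) if m.group(1) is not None else m.group(2)
    parts = [p.strip() for p in comment.split(FIELD_SEP)]
    if len(parts) != 3:
        return None
    fmt, ref, sched = parts
    # Four numeric tokens, then the comma-separated resource list.
    tokens = sched.split(None, 4)
    if len(tokens) < 4:
        return None  # e.g. "No scheduling info"
    try:
        uops, latency, bypass = (int(t) for t in tokens[:3])
        throughput = float(tokens[3])
    except ValueError:
        return None
    resources = tokens[4] if len(tokens) == 5 else ""
    return ReferenceInfo(
        fmt, ref, uops, latency, bypass, throughput,
        [r.strip() for r in resources.split(",") if r.strip()],
    )


def mismatched_fields(ref: ReferenceInfo, uops: int, latency: int,
                      bypass_latency: int, throughput: float,
                      resources: List[str]) -> List[str]:
    # Return the names of the fields where the reported values differ from
    # the reference. Resources are compared as sets (an assumption).
    reported = {
        "uops": uops,
        "latency": latency,
        "bypass_latency": bypass_latency,
        "throughput": throughput,
        "resources": set(resources),
    }
    expected = ref._asdict()
    expected["resources"] = set(ref.resources)
    return [name for name, value in reported.items() if expected[name] != value]
```

In this sketch a line whose comment does not carry the five scheduling fields (for example one ending in `No scheduling info`) yields `None`, so a caller can skip it rather than report a mismatch.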
@mshockwave and @Rin18 may be interested.
---
Patch is 1.49 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126703.diff
10 Files Affected:
- (modified) llvm/docs/CommandGuide/llvm-mca.rst (+14)
- (modified) llvm/include/llvm/MC/MCSchedule.h (+4)
- (modified) llvm/lib/MC/MCSchedule.cpp (+37)
- (added) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s (+7588)
- (modified) llvm/tools/llvm-mca/CMakeLists.txt (+1)
- (modified) llvm/tools/llvm-mca/Views/InstructionInfoView.h (+1)
- (added) llvm/tools/llvm-mca/Views/SchedulingInfoView.cpp (+210)
- (added) llvm/tools/llvm-mca/Views/SchedulingInfoView.h (+96)
- (modified) llvm/tools/llvm-mca/llvm-mca.cpp (+31-11)
- (modified) llvm/utils/update_mca_test_checks.py (+168)
``````````diff
diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index f610ea2f2168269..1c5275ce000b111 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -170,6 +170,20 @@ option specifies "``-``", then the output will also be sent to standard output.
Enable extra scheduler statistics. This view collects and analyzes instruction
issue events. This view is disabled by default.
+.. option:: -scheduling-info
+
+ Enable scheduling info view. This view reports scheduling information defined
+ in LLVM target description in the form:
+ uOps | Latency | Bypass Latency | Throughput | LLVM OpcodeName | Resources
+ units | assembly instruction and its comment (// or /* */) if defined.
+ It allows to compare scheduling info with architecture documents and fix them
+ in target description by fixing InstrRW for the reported LLVM opcode.
+ Scheduling information can be defined in the same order in each instruction
+ comments to check easily reported and reference scheduling information.
+ Suggested information in comment:
+ // <architecture instruction form> \\ <scheduling documentation title> \\
+ <uOps>, <Latency>, <Bypass Latency>, <Throughput>, <Resources units>
+
.. option:: -retire-stats
Enable extra retire control unit statistics. This view is disabled by default.
diff --git a/llvm/include/llvm/MC/MCSchedule.h b/llvm/include/llvm/MC/MCSchedule.h
index fe731d086f70ae3..dcbc5369120a39b 100644
--- a/llvm/include/llvm/MC/MCSchedule.h
+++ b/llvm/include/llvm/MC/MCSchedule.h
@@ -402,6 +402,10 @@ struct MCSchedModel {
static unsigned getForwardingDelayCycles(ArrayRef<MCReadAdvanceEntry> Entries,
unsigned WriteResourceIdx = 0);
+ /// Returns the maximum forwarding delay for maximum write latency.
+ static unsigned getForwardingDelayCycles(const MCSubtargetInfo &STI,
+ const MCSchedClassDesc &SCDesc);
+
/// Returns the default initialized model.
static const MCSchedModel Default;
};
diff --git a/llvm/lib/MC/MCSchedule.cpp b/llvm/lib/MC/MCSchedule.cpp
index ed243cecabb7638..4ef6acf78714fa7 100644
--- a/llvm/lib/MC/MCSchedule.cpp
+++ b/llvm/lib/MC/MCSchedule.cpp
@@ -174,3 +174,40 @@ MCSchedModel::getForwardingDelayCycles(ArrayRef<MCReadAdvanceEntry> Entries,
return std::abs(DelayCycles);
}
+
+unsigned
+MCSchedModel::getForwardingDelayCycles(const MCSubtargetInfo &STI,
+ const MCSchedClassDesc &SCDesc) {
+
+ ArrayRef<MCReadAdvanceEntry> Entries = STI.getReadAdvanceEntries(SCDesc);
+ if (Entries.empty())
+ return 0;
+
+ unsigned Latency = 0;
+ unsigned maxLatency = 0;
+ unsigned WriteResourceID = 0;
+ unsigned DefEnd = SCDesc.NumWriteLatencyEntries;
+
+ for (unsigned DefIdx = 0; DefIdx != DefEnd; ++DefIdx) {
+ // Lookup the definition's write latency in SubtargetInfo.
+ const MCWriteLatencyEntry *WLEntry =
+ STI.getWriteLatencyEntry(&SCDesc, DefIdx);
+ // Early exit if we found an invalid latency.
+ // Consider no bypass
+ if (WLEntry->Cycles < 0)
+ return 0;
+ maxLatency = std::max(Latency, static_cast<unsigned>(WLEntry->Cycles));
+ if (maxLatency > Latency) {
+ WriteResourceID = WLEntry->WriteResourceID;
+ }
+ Latency = maxLatency;
+ }
+
+ for (const MCReadAdvanceEntry &E : Entries) {
+ if (E.WriteResourceID == WriteResourceID) {
+ return E.Cycles;
+ }
+ }
+
+ llvm_unreachable("WriteResourceID not found in MCReadAdvanceEntry entries");
+}
diff --git a/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s
new file mode 100644
index 000000000000000..c421166f22ea45e
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-scheduling-info.s
@@ -0,0 +1,7588 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=neoverse-v1 -scheduling-info < %s | FileCheck %s
+
+ .text
+ .file "V1-scheduling-info.s"
+ .globl test
+ .p2align 4
+ .type test,@function
+test:
+ .cfi_startproc
+ abs D15, D11 /* ABS <V><d>, <V><n> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV */
+ abs V25.2S, V25.2S // ABS <Vd>.<T>, <Vn>.<T> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV
+ abs Z26.B, P6/M, Z27.B // ABS <Zd>.<T>, <Pg>/M, <Zn>.<T> \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ adc W13, W6, W4 // ADC <Wd>, <Wn>, <Wm> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ adc X8, X12, X10 // ADC <Xd>, <Xn>, <Xm> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ adcs W29, W7, W30 // ADCS <Wd>, <Wn>, <Wm> \\ ALU, basic, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adcs X11, X3, X5 // ADCS <Xd>, <Xn>, <Xm> \\ ALU, basic, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ add WSP, WSP, W10 // ADD <Wd|WSP>, <Wn|WSP>, <Wm> \\ ALU, basic, unconditional, no flagset \\ 1 2 2 2.00 V1UnitI
+ add WSP, WSP, W2, UXTB // ADD <Wd|WSP>, <Wn|WSP>, <Wm>, <wextend> \\ ALU, basic, unconditional, no flagset \\ 1 2 2 2.00 V1UnitI
+ add WSP, WSP, W13, UXTH #4 // ADD <Wd|WSP>, <Wn|WSP>, <Wm>, <wextend> #<amount> \\ ALU, basic, unconditional, no flagset \\ 1 2 2 2.00 V1UnitI
+ add WSP, WSP, W13, LSL #4 // ADD <Wd|WSP>, <Wn|WSP>, <Wm>, LSL #<amount> \\ Arithmetic, LSL shift, shift <= 4 \\ 1 2 2 2.00 V1UnitI
+ add X22, X2, X27 // ADD <Xd|SP>, <Xn|SP>, X<m> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ add X25, X9, W25, UXTB // ADD <Xd|SP>, <Xn|SP>, <R><m>, <extend> \\ ALU, basic \\ 1 2 2 2.00 V1UnitI
+ add X4, X28, W3, UXTB #3 // ADD <Xd|SP>, <Xn|SP>, <R><m>, <extend> #<amount> \\ ALU, extend and shift \\ 1 2 2 2.0 V1UnitM
+ add X0, X28, X26, LSL #3 // ADD <Xd|SP>, <Xn|SP>, X<m>, LSL #<amount> \\ Arithmetic, LSL shift, shift <= 4 \\ 1 1 1 4.0 V1UnitI
+ add WSP, WSP, #3765 // ADD <Wd|WSP>, <Wn|WSP>, #<imm> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ add WSP, WSP, #3547, LSL #12 // ADD <Wd|WSP>, <Wn|WSP>, #<imm>, <shift> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ add X7, X30, #803 // ADD <Xd|SP>, <Xn|SP>, #<imm> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ add X7, X2, #319, LSL #12 // ADD <Xd|SP>, <Xn|SP>, #<imm>, <shift> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ add Z13.D, Z13.D, #245 // ADD <Zdn>.<T>, <Zdn>.<T>, #<imm> \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ add Z16.D, Z16.D, #233, LSL #8 // ADD <Zdn>.<T>, <Zdn>.<T>, #<imm>, <shift> \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ add W3, W2, W21, LSL #3 // ADD <Wd>, <Wn>, <Wm>, LSL #<wamountl> \\ Arithmetic, LSL shift by immed, shift <= 4, unconditional, no flagset \\ 1 1 1 4.0 V1UnitI
+ add W6, W21, W17, LSL #15 // ADD <Wd>, <Wn>, <Wm>, LSL #<wamounth> \\ Arithmetic, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional \\ 1 2 2 2.0 V1UnitM
+ add W28, W30, W19, ASR #30 // ADD <Wd>, <Wn>, <Wm>, <shift> #<wamount> \\ Arithmetic, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional \\ 1 2 2 2.0 V1UnitM
+ add X8, X3, X28, LSL #3 // ADD <Xd>, <Xn>, <Xm>, LSL #<amountl> \\ Arithmetic, LSL shift, shift <= 4 \\ 1 1 1 4.0 V1UnitI
+ add X12, X13, X0, LSL #44 // ADD <Xd>, <Xn>, <Xm>, LSL #<amounth> \\ Arithmetic, LSR/ASR/ROR shift or LSL shift > 4 \\ 1 2 2 2.0 V1UnitM
+ add X5, X20, X28, LSR #16 // ADD <Xd>, <Xn>, <Xm>, <shift> #<amount> \\ Arithmetic, LSR/ASR/ROR shift or LSL shift > 4 \\ 1 2 2 2.0 V1UnitM
+ add D0, D23, D21 // ADD <V><d>, <V><n>, <V><m> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV
+ add V19.4S, V24.4S, V15.4S // ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV
+ add Z29.D, P5/M, Z29.D, Z29.D // ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ add Z10.H, Z22.H, Z13.H // ADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T> \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ addhn V26.4H, V5.4S, V9.4S // ADDHN <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta> \\ ASIMD arith, complex \\ 1 2 2 4.0 V1UnitV
+ addhn2 V1.16B, V19.8H, V6.8H // ADDHN2 <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta> \\ ASIMD arith, complex \\ 1 2 2 4.0 V1UnitV
+ addp D1, V14.2D // ADDP <V><d>, <Vn>.<T> \\ ASIMD arith, pair-wise \\ 1 2 2 4.0 V1UnitV
+ addp V7.2S, V1.2S, V2.2S // ADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> \\ ASIMD arith, pair-wise \\ 1 2 2 4.0 V1UnitV
+ addpl X27, X6, #-6 // ADDPL <Xd|SP>, <Xn|SP>, #<imm> \\ Predicate counting scalar \\ 1 2 2 1.0 V1UnitM0
+ adds W17, WSP, W25 // ADDS <Wd>, <Wn|WSP>, <Wm> \\ ALU, basic, unconditional, flagset \\ 1 2 2 2.00 V1UnitI,V1UnitFlg
+ adds W6, WSP, W15, UXTH // ADDS <Wd>, <Wn|WSP>, <Wm>, <wextend> \\ ALU, basic, unconditional, flagset \\ 1 2 2 2.00 V1UnitI,V1UnitFlg
+ adds W22, WSP, W30, UXTB #2 // ADDS <Wd>, <Wn|WSP>, <Wm>, <wextend> #<amount> \\ ALU, basic, unconditional, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds W12, WSP, W29, LSL #4 // ADDS <Wd>, <Wn|WSP>, <Wm>, LSL #<amount> \\ Arithmetic, LSL shift by immed, shift <= 4, unconditional, flagset \\ 1 2 2 2.00 V1UnitI,V1UnitFlg
+ adds X14, X0, X10 // ADDS <Xd>, <Xn|SP>, X<m> \\ ALU, basic, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds X13, X23, W8, UXTB // ADDS <Xd>, <Xn|SP>, <R><m>, <extend> \\ ALU, basic, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds X4, X26, W28, UXTB #1 // ADDS <Xd>, <Xn|SP>, <R><m>, <extend> #<amount> \\ ALU, flagset, extend and shift \\ 1 1 1 3.00 V1UnitFlg, V1UnitI
+ adds X10, X3, X29, LSL #2 // ADDS <Xd>, <Xn|SP>, X<m>, LSL #<amount> \\ Arithmetic, flagset, LSL shift, shift <= 4 \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds W23, WSP, #502 // ADDS <Wd>, <Wn|WSP>, #<imm> \\ ALU, basic, unconditional, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds W2, WSP, #2980, LSL #12 // ADDS <Wd>, <Wn|WSP>, #<imm>, <shift> \\ Arithmetic, flagset, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional \\ 1 1 1 3.00 V1UnitFlg, V1UnitI
+ adds X12, X4, #1345 // ADDS <Xd>, <Xn|SP>, #<imm> \\ ALU, basic, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds X25, X18, #3037, LSL #12 // ADDS <Xd>, <Xn|SP>, #<imm>, <shift> \\ Arithmetic, flagset, LSR/ASR/ROR shift or LSL shift > 4 \\ 1 1 1 3.00 V1UnitFlg, V1UnitI
+ adds W12, W13, W26 // ADDS <Wd>, <Wn>, <Wm> \\ ALU, basic, unconditional, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds W0, W23, W20, LSL #0 // ADDS <Wd>, <Wn>, <Wm>, LSL #<wamountl> \\ Arithmetic, LSL shift by immed, shift <= 4, unconditional, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds W13, W16, W12, LSL #28 // ADDS <Wd>, <Wn>, <Wm>, LSL #<wamounth> \\ Arithmetic, flagset, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional \\ 1 2 2 2.00 V1UnitM,V1UnitFlg
+ adds W20, W19, W16, ASR #0 // ADDS <Wd>, <Wn>, <Wm>, <shift> #<wamount> \\ Arithmetic, flagset, LSR/ASR/ROR shift by immed or LSL shift by immed > 4, unconditional \\ 1 2 2 2.00 V1UnitM,V1UnitFlg
+ adds X23, X12, X4 // ADDS <Xd>, <Xn>, <Xm> \\ ALU, basic, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds X0, X13, X4, LSL #2 // ADDS <Xd>, <Xn>, <Xm>, LSL #<amountl> \\ Arithmetic, flagset, LSL shift, shift <= 4 \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ adds X4, X7, X6, LSL #31 // ADDS <Xd>, <Xn>, <Xm>, LSL #<amounth> \\ Arithmetic, flagset, LSR/ASR/ROR shift or LSL shift > 4 \\ 1 2 2 2.00 V1UnitM,V1UnitFlg
+ adds X9, X8, X9, ASR #41 // ADDS <Xd>, <Xn>, <Xm>, <shift> #<amount> \\ Arithmetic, flagset, LSR/ASR/ROR shift or LSL shift > 4 \\ 1 2 2 2.00 V1UnitM,V1UnitFlg
+ addv B0, V28.8B // ADDV B<d>, <Vn>.8B \\ ASIMD arith, reduce, 8B/8H \\ 2 4 4 2.00 V1UnitV13
+ addv B1, V26.16B // ADDV B<d>, <Vn>.16B \\ ASIMD arith, reduce, 16B \\ 2 4 4 1.00 V1UnitV13[2]
+ addv H18, V13.4H // ADDV H<d>, <Vn>.4H \\ ASIMD arith, reduce, 4H/4S \\ 1 2 2 2.0 V1UnitV13
+ addv H29, V17.8H // ADDV H<d>, <Vn>.8H \\ ASIMD arith, reduce, 8B/8H \\ 2 4 4 2.00 V1UnitV13
+ addv S22, V18.4S // ADDV S<d>, <Vn>.4S \\ ASIMD arith, reduce, 4H/4S \\ 1 2 2 2.0 V1UnitV13
+ addvl X1, X27, #-8 // ADDVL <Xd|SP>, <Xn|SP>, #<imm> \\ Predicate counting scalar \\ 1 2 2 1.0 V1UnitM0
+ adr X3, test // ADR <Xd>, <label> \\ Address generation \\ 1 1 1 4.0 V1UnitI
+ adr Z26.D, [Z1.D, Z8.D] // ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>] \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ adr Z22.S, [Z28.S, Z8.S, LSL #2] // ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>, <mod> #<amount>] \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ adr Z11.D, [Z2.D, Z29.D, SXTW ] // ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW ] \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ adr Z3.D, [Z9.D, Z9.D, SXTW #2] // ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW #<amount>] \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ adr Z6.D, [Z7.D, Z13.D, UXTW ] // ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW ] \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ adr Z4.D, [Z24.D, Z22.D, UXTW #1] // ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW #<amount>] \\ Arithmetic, basic \\ 1 2 2 2.0 V1UnitV01
+ adrp X0, test // ADRP <Xd>, <label> \\ Address generation \\ 1 1 1 4.0 V1UnitI
+ and WSP, W16, #0xe00 // AND <Wd|WSP>, <Wn>, #<imms> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ and X2, X22, #0x1e00 // AND <Xd|SP>, <Xn>, #<imm> \\ ALU, basic \\ 1 1 1 4.0 V1UnitI
+ and Z1.B, Z1.B, #0x70 // AND <Zdn>.B, <Zdn>.B, #<constb> \\ Logical \\ 1 2 2 2.0 V1UnitV01
+ and Z7.H, Z7.H, #0x60 // AND <Zdn>.H, <Zdn>.H, #<consth> \\ Logical \\ 1 2 2 2.0 V1UnitV01
+ and Z7.S, Z7.S, #0x2 // AND <Zdn>.S, <Zdn>.S, #<consts> \\ Logical \\ 1 2 2 2.0 V1UnitV01
+ and Z7.D, Z7.D, #0x4 // AND <Zdn>.D, <Zdn>.D, #<constd> \\ Logical \\ 1 2 2 2.0 V1UnitV01
+ and P5.B, P1/Z, P6.B, P4.B // AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B \\ Predicate logical \\ 1 1 1 1.0 V1UnitM0
+ and W11, W14, W24 // AND <Wd>, <Wn>, <Wm> \\ Logical, shift, no flagset \\ 1 1 1 4.0 V1UnitI
+ and W2, W21, W22, LSR #25 // AND <Wd>, <Wn>, <Wm>, <shift> #<wamount> \\ Logical, shift, no flagset \\ 1 1 1 4.0 V1UnitI
+ and X1, X20, X29 // AND <Xd>, <Xn>, <Xm> \\ Logical, shift, no flagset \\ 1 1 1 4.0 V1UnitI
+ and X8, X11, X22, ASR #56 // AND <Xd>, <Xn>, <Xm>, <shift> #<amount> \\ Logical, shift, no flagset \\ 1 1 1 4.0 V1UnitI
+ and V29.8B, V26.8B, V26.8B // AND <Vd>.<T>, <Vn>.<T>, <Vm>.<T> \\ ASIMD logical \\ 1 2 2 4.0 V1UnitV
+ and Z17.D, P6/M, Z17.D, Z12.D // AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> \\ Logical \\ 1 2 2 2.0 V1UnitV01
+ and Z9.D, Z5.D, Z17.D // AND <Zd>.D, <Zn>.D, <Zm>.D \\ Logical \\ 1 2 2 2.0 V1UnitV01
+ ands W14, W8, #0x70 // ANDS <Wd>, <Wn>, #<imms> \\ ALU, basic, unconditional, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ ands X4, X10, #0x60 // ANDS <Xd>, <Xn>, #<immd> \\ ALU, basic, flagset \\ 1 1 1 3.00 V1UnitI,V1UnitFlg
+ ands W29, W28, W12 // ANDS <Wd>, <Wn>, <Wm> \\ ALU, basic, unconditional, flagset \\ 1 2 2 2.00 V1UnitI,V1UnitFlg
+ ands W7, W13, W23, ASR #3 // ANDS <Wd>, <Wn>, <Wm>, <shift> #<wamount> \\ Logical, shift by immed, flagset, unconditional \\ 1 2 2 2.00 V1UnitM,V1UnitFlg
+ ands X21, X9, X6 // ANDS <Xd>, <Xn>, <Xm> \\ ALU, basic, flagset \\ 1 2 2 2.00 V1UnitI,V1UnitFlg
+ ands X10, X27, X7, ASR #20 // ANDS <Xd>, <Xn>, <Xm>, <shift> #<amount> \\ Logical, shift, flagset \\ 1 2 2 2.00 V1UnitM,V1UnitFlg
+ ands P5.B, P1/Z, P2.B, P7.B // ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B \\ Predicate logical, flag setting \\ 2 2 2 0.50 V1UnitM0[2]
+ andv H7, P6, Z31.H // ANDV <V><d>, <Pg>, <Zn>.<T> \\ Reduction, logical \\ 4 12 12 0.50 V1UnitV01[4]
+ asr W30, W14, #5 // ASR <Wd>, <Wn>, #<shifts> \\ Move, shift by immed, no flagset \\ 1 1 1 4.0 V1UnitI
+ asr X12, X21, #28 // ASR <Xd>, <Xn>, #<shiftd> \\ Move, shift by immed, no flagset \\ 1 1 1 4.0 V1UnitI
+ asr Z7.B, P5/M, Z7.B, #3 // ASR <Zdn>.B, <Pg>/M, <Zdn>.B, #<constb> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z6.H, P6/M, Z6.H, #5 // ASR <Zdn>.H, <Pg>/M, <Zdn>.H, #<consth> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z28.S, P0/M, Z28.S, #11 // ASR <Zdn>.S, <Pg>/M, <Zdn>.S, #<consts> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z26.D, P5/M, Z26.D, #24 // ASR <Zdn>.D, <Pg>/M, <Zdn>.D, #<constd> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z10.B, Z14.B, #3 // ASR <Zd>.B, <Zn>.B, #<constb> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z23.H, Z18.H, #6 // ASR <Zd>.H, <Zn>.H, #<consth> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z29.S, Z11.S, #6 // ASR <Zd>.S, <Zn>.S, #<consts> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z20.D, Z26.D, #29 // ASR <Zd>.D, <Zn>.D, #<constd> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr W3, W0, W20 // ASR <Wd>, <Wn>, <Wm> \\ Move, shift by register, no flagset, unconditional \\ 1 1 1 4.0 V1UnitI
+ asr X7, X5, X21 // ASR <Xd>, <Xn>, <Xm> \\ Move, shift by register, no flagset, unconditional \\ 1 1 1 4.0 V1UnitI
+ asr Z3.S, P0/M, Z3.S, Z10.S // ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z9.S, P2/M, Z9.S, Z8.D // ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asr Z26.S, Z21.S, Z21.D // ASR <Zd>.<T>, <Zn>.<T>, <Zm>.D \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asrd Z6.B, P4/M, Z6.B, #2 // ASRD <Zdn>.B, <Pg>/M, <Zdn>.B, #<constb> \\ Arithmetic, shift right for divide \\ 1 4 4 1.0 V1UnitV1
+ asrd Z19.H, P3/M, Z19.H, #6 // ASRD <Zdn>.H, <Pg>/M, <Zdn>.H, #<consth> \\ Arithmetic, shift right for divide \\ 1 4 4 1.0 V1UnitV1
+ asrd Z16.S, P3/M, Z16.S, #2 // ASRD <Zdn>.S, <Pg>/M, <Zdn>.S, #<consts> \\ Arithmetic, shift right for divide \\ 1 4 4 1.0 V1UnitV1
+ asrd Z9.D, P6/M, Z9.D, #12 // ASRD <Zdn>.D, <Pg>/M, <Zdn>.D, #<constd> \\ Arithmetic, shift right for divide \\ 1 4 4 1.0 V1UnitV1
+ asrr Z0.B, P0/M, Z0.B, Z19.B // ASRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> \\ Arithmetic, shift \\ 1 2 2 1.0 V1UnitV1
+ asrv W24, W28, W13 // ASRV <Wd>, <Wn>, <Wm> \\ Variable shift \\ 1 1 1 4.0 V1UnitI
+ asrv X3, X21, X24 // ASRV <Xd>, <Xn>, <Xm> \\ Variable shift \\ 1 1 1 4.0 V1UnitI
+ at s12e1r, X28 // AT <at_op>, <Xt> \\ No description \\ No scheduling info
+ b test // B <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.eq test // B.eq <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.none test // B.none <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.ne test // B.ne <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.any test // B.any <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.cs test // B.cs <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.hs test // B.hs <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.nlast test // B.nlast <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.cc test // B.cc <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.lo test // B.lo <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.last test // B.last <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.mi test // B.mi <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.first test // B.first <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.pl test // B.pl <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.nfrst test // B.nfrst <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.vs test // B.vs <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.vc test // B.vc <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.hi test // B.hi <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.pmore test // B.pmore <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.ls test // B.ls <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.plast test // B.plast <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.ge test // B.ge <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.tcont test // B.tcont <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.lt test // B.lt <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.tstop test // B.tstop <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.gt test // B.gt <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.le test // B.le <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.al test // B.al <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ b.nv test // B.nv <label> \\ Branch, immed \\ 1 1 1 2.0 V1UnitB
+ bfcvt H6, S20 // BFCVT <Hd>, <Sn> \\ Scalar convert, F32 to BF16 \\ 1 3 3 2.0 V1UnitV02
+ bfcvt Z16.H, P6/M,...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/126703