[lldb] [llvm] [lldb] Support disassembling RISC-V proprietary instructions (PR #145793)

Thu Jul 3 12:53:00 PDT 2025

https://github.com/tedwoodward updated https://github.com/llvm/llvm-project/pull/145793

From 1a7ee4297bb8e6b3fa08818e05cf245a2c768c2b Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedwood at quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 1/2] Support disassembling RISC-V proprietary insns

RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example Python script that adds
a command, "fdis", which disassembles and then pipes the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
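The llvm-objdump workflow described above comes down to handing the disassembly text to a filter process on stdin and reading the rewritten listing from stdout. Here is a minimal Python sketch of that step. It assumes only that the filter reads stdin and writes stdout. `run_filter` and the stand-in filter are hypothetical and are not part of this patch.

```python
import subprocess
import sys
from typing import Sequence


def run_filter(disassembly: str, filter_cmd: Sequence[str]) -> str:
    """Pipe disassembly text through an external filter program (hypothetical helper).

    The filter reads the original listing on stdin and writes the rewritten
    listing on stdout, like piping llvm-objdump output into crustfilt.
    Raises RuntimeError if the filter exits with a non-zero status.
    """
    proc = subprocess.run(
        list(filter_cmd), input=disassembly, capture_output=True, text=True
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"filter exited with status {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout


# Stand-in filter (hypothetical): a one-line Python program that upper-cases
# its input. A real filter such as crustfilt would instead add disassembly
# for opcodes that LLVM does not know.
fake_filter = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read().upper())"]

listing = "[0x18] <+24>: b8f2                     <unknown>\n"
print(run_filter(listing, fake_filter), end="")
```

Using `subprocess.run` with `input=` and `capture_output=True` avoids manual pipe handling, and it is also how the example `fdis` script invokes its filter.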

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
  valid disassembly, and returns the size through an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble only if the disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.
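The llvm-objdump-style RISC-V byte grouping in the list above (the new `DumpRISCV`) can be restated as a short Python sketch. This is an illustration of the loop in the Opcode.cpp hunk below, not code from the patch, and `format_riscv_bytes` is a hypothetical name. The instruction bytes are little-endian in memory, so each group is printed high byte first. The column padding that `DumpRISCV` also applies is left out.

```python
def format_riscv_bytes(data: bytes) -> str:
    """Group RISC-V instruction bytes the way DumpRISCV / llvm-objdump do.

    - length % 4 == 0: 32-bit words (one for a 32-bit insn, two for 64-bit)
    - else length % 2 == 0: 16-bit parcels (one for 16-bit, three for 48-bit)
    - else: fall back to individual bytes
    Each group is little-endian in memory, so it is printed high byte first.
    """
    if len(data) % 4 == 0:
        step = 4
    elif len(data) % 2 == 0:
        step = 2
    else:
        step = 1
    # Reverse each group so the most significant byte comes first.
    groups = [data[i:i + step][::-1].hex() for i in range(0, len(data), step)]
    return " ".join(groups)


# Instruction bytes in memory order, taken from the riscv32 test in this PR.
print(format_riscv_bytes(bytes.fromhex("0111")))              # 1101
print(format_riscv_bytes(bytes.fromhex("232aa4fe")))          # fea42a23
print(format_riscv_bytes(bytes.fromhex("3f00400920002000")))  # 0940003f 00200020
print(format_riscv_bytes(bytes.fromhex("1f0200000010")))      # 021f 0000 1000
```

The C++ loop re-tests `m_data.inst.length` on every iteration. Because the length never changes, computing the step once, as done here, gives the same grouping.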

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/examples/python/filter_disasm.py         | 87 +++++++++++++++++++
 lldb/include/lldb/Core/Opcode.h               |  1 +
 lldb/source/Core/Disassembler.cpp             | 14 ++-
 lldb/source/Core/Opcode.cpp                   | 38 ++++++++
 .../Disassembler/LLVMC/DisassemblerLLVMC.cpp  | 39 +++++----
 lldb/source/Utility/ArchSpec.cpp              |  4 +-
 6 files changed, 160 insertions(+), 23 deletions(-)
 create mode 100644 lldb/examples/python/filter_disasm.py

diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
new file mode 100644
index 0000000000000..adb3455209055
--- /dev/null
+++ b/lldb/examples/python/filter_disasm.py
@@ -0,0 +1,87 @@
+"""
+Defines a command, fdis, that does filtered disassembly. The command does the
+lldb disassemble command with -b and any other arguments passed in, and
+pipes that through a provided filter program.
+
+The intention is to support disassembly of RISC-V proprietary instructions.
+This is handled with llvm-objdump by piping the output of llvm-objdump through
+a filter program. This script is intended to mimic that workflow.
+"""
+
+import lldb
+import subprocess
+
+filter_program = "crustfilt"
+
+def __lldb_init_module(debugger, dict):
+    debugger.HandleCommand(
+        'command script add -f filter_disasm.fdis fdis')
+    print("Disassembly filter command (fdis) loaded")
+    print("Filter program set to %s" % filter_program)
+
+
+def fdis(debugger, args, result, dict):
+    """
+  Call the built in disassembler, then pass its output to a filter program
+  to add in disassembly for hidden opcodes.
+  Except for get and set, use the fdis command like the disassemble command.
+  By default, the filter program is crustfilt, from
+  https://github.com/quic/crustfilt . This can be changed by changing
+  the global variable filter_program.
+
+  Usage:
+    fdis [[get] [set <program>] [<disassembly options>]]
+
+    Choose one of the following:
+        get
+            Gets the current filter program
+
+        set <program>
+            Sets the current filter program. This can be an executable, which
+            will be found on PATH, or an absolute path.
+
+        <disassembly options>
+            If the first argument is not get or set, the args will be passed
+            to the disassemble command as is.
+
+    """
+
+    global filter_program
+    args_list = args.split()
+    result.Clear()
+
+    if len(args_list) == 1 and args_list[0] == 'get':
+        result.PutCString(filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    if len(args_list) == 2 and args_list[0] == 'set':
+        filter_program = args_list[1]
+        result.PutCString("Filter program set to %s" % filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    res = lldb.SBCommandReturnObject()
+    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, res)
+    if (len(res.GetError()) > 0):
+        result.SetError(res.GetError())
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+    output = res.GetOutput()
+
+    try:
+        proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
+    except (subprocess.SubprocessError, OSError) as e:
+        result.PutCString("Error occurred. Original disassembly:\n\n" + output)
+        result.SetError(str(e))
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+
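+    # Always show the filtered output. A non-zero exit status is a failure,
+    # reported with whatever the filter wrote to stderr.
+    result.PutCString(proc.stdout)
+    if proc.returncode != 0:
+        result.SetError(proc.stderr or "filter exited with status %d" % proc.returncode)
+        result.SetStatus(lldb.eReturnStatusFailed)
+    else:
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)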
diff --git a/lldb/include/lldb/Core/Opcode.h b/lldb/include/lldb/Core/Opcode.h
index f72f2687b54fe..88ef17093d3f3 100644
--- a/lldb/include/lldb/Core/Opcode.h
+++ b/lldb/include/lldb/Core/Opcode.h
@@ -200,6 +200,7 @@ class Opcode {
   }
 
   int Dump(Stream *s, uint32_t min_byte_width);
+  int DumpRISCV(Stream *s, uint32_t min_byte_width);
 
   const void *GetOpcodeBytes() const {
     return ((m_type == Opcode::eTypeBytes) ? m_data.inst.bytes : nullptr);
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index 833e327579a29..f95e446448036 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -658,8 +658,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
       // the byte dump to be able to always show 15 bytes (3 chars each) plus a
       // space
       if (max_opcode_byte_size > 0)
-        m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
-      else
+        // make RISC-V opcode dump look like llvm-objdump
+        if (exe_ctx &&
+            exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
+          m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
+        else
+          m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+       else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
@@ -685,10 +690,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
     }
   }
   const size_t opcode_pos = ss.GetSizeOfLastLine();
-  const std::string &opcode_name =
+  std::string &opcode_name =
       show_color ? m_markup_opcode_name : m_opcode_name;
   const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;
 
+  if (opcode_name.empty())
+    opcode_name = "<unknown>";
+
   // The default opcode size of 7 characters is plenty for most architectures
   // but some like arm can pull out the occasional vqrshrun.s16.  We won't get
   // consistent column spacing in these cases, unfortunately. Also note that we
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index 3e30d98975d8a..dbcd18cc0d8d2 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -78,6 +78,44 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
   return eByteOrderInvalid;
 }
 
+// make RISC-V byte dumps look like llvm-objdump, instead of just dumping bytes
+int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
+  const uint32_t previous_bytes = s->GetWrittenBytes();
+  // if m_type is not bytes, call Dump
+  if (m_type != Opcode::eTypeBytes)
+    return Dump(s, min_byte_width);
+
+  // from RISCVPrettyPrinter in llvm-objdump.cpp
+  // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
+  // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
+  // else fall back and print bytes
+  for (uint32_t i = 0; i < m_data.inst.length;) {
+    if (i > 0)
+      s->PutChar(' ');
+    if (!(m_data.inst.length % 4)) {
+      s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
+                                        m_data.inst.bytes[i + 2],
+                                        m_data.inst.bytes[i + 1],
+                                        m_data.inst.bytes[i + 0]);
+      i += 4;
+    } else if (!(m_data.inst.length % 2)) {
+      s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
+                              m_data.inst.bytes[i + 0]);
+      i += 2;
+    } else {
+      s->Printf("%2.2x", m_data.inst.bytes[i]);
+      ++i;
+    }
+  }
+
+  uint32_t bytes_written_so_far = s->GetWrittenBytes() - previous_bytes;
+  // Add spaces to make sure bytes display comes out even in case opcodes aren't
+  // all the same size.
+  if (bytes_written_so_far < min_byte_width)
+    s->Printf("%*s", min_byte_width - bytes_written_so_far, "");
+  return s->GetWrittenBytes() - previous_bytes;
+}
+
 uint32_t Opcode::GetData(DataExtractor &data) const {
   uint32_t byte_size = GetByteSize();
   uint8_t swap_buf[8];
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index ed6047f8f4ef3..eeb6020abd73a 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -61,6 +61,8 @@ class DisassemblerLLVMC::MCDisasmInstance {
 
   uint64_t GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
                      lldb::addr_t pc, llvm::MCInst &mc_inst) const;
+  bool GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
+                 lldb::addr_t pc, llvm::MCInst &mc_inst, size_t &size) const;
   void PrintMCInst(llvm::MCInst &mc_inst, lldb::addr_t pc,
                    std::string &inst_string, std::string &comments_string);
   void SetStyle(bool use_hex_immed, HexImmediateStyle hex_style);
@@ -524,11 +526,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
           const addr_t pc = m_address.GetFileAddress();
           llvm::MCInst inst;
 
-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          if (inst_size == 0)
-            m_opcode.Clear();
-          else {
+          size_t inst_size = 0;
+          m_is_valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+          m_opcode.Clear();
+          if (inst_size != 0) {
             m_opcode.SetOpcodeBytes(opcode_data, inst_size);
             m_is_valid = true;
           }
@@ -604,10 +606,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
         const uint8_t *opcode_data = data.GetDataStart();
         const size_t opcode_data_len = data.GetByteSize();
         llvm::MCInst inst;
-        size_t inst_size =
-            mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-
-        if (inst_size > 0) {
+        size_t inst_size = 0;
+        bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
+                                             inst, inst_size);
+ 
+        if (valid && inst_size > 0) {
           mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
 
           const bool saved_use_color = mc_disasm_ptr->GetUseColor();
@@ -1206,9 +1209,10 @@ class InstructionLLVMC : public lldb_private::Instruction {
     const uint8_t *opcode_data = data.GetDataStart();
     const size_t opcode_data_len = data.GetByteSize();
     llvm::MCInst inst;
-    const size_t inst_size =
-        mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-    if (inst_size == 0)
+    size_t inst_size = 0;
+    const bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+    if (!valid)
       return;
 
     m_has_visited_instruction = true;
@@ -1337,19 +1341,18 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
          m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
 }
 
-uint64_t DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
     const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
-    llvm::MCInst &mc_inst) const {
+    llvm::MCInst &mc_inst, size_t &size) const {
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;
 
-  uint64_t new_inst_size;
-  status = m_disasm_up->getInstruction(mc_inst, new_inst_size, data, pc,
+  status = m_disasm_up->getInstruction(mc_inst, size, data, pc,
                                        llvm::nulls());
   if (status == llvm::MCDisassembler::Success)
-    return new_inst_size;
+    return true;
   else
-    return 0;
+    return false;
 }
 
 void DisassemblerLLVMC::MCDisasmInstance::PrintMCInst(
diff --git a/lldb/source/Utility/ArchSpec.cpp b/lldb/source/Utility/ArchSpec.cpp
index 70b9800f4dade..7c71aaae6bcf2 100644
--- a/lldb/source/Utility/ArchSpec.cpp
+++ b/lldb/source/Utility/ArchSpec.cpp
@@ -228,9 +228,9 @@ static const CoreDefinition g_core_definitions[] = {
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::hexagon,
      ArchSpec::eCore_hexagon_hexagonv5, "hexagonv5"},
 
-    {eByteOrderLittle, 4, 2, 4, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
+    {eByteOrderLittle, 4, 2, 8, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
      "riscv32"},
-    {eByteOrderLittle, 8, 2, 4, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
+    {eByteOrderLittle, 8, 2, 8, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
      "riscv64"},
 
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::loongarch32,

From eee204836005d45adc5ffdd41fe4d61db585006a Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedwood at quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 2/2] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that architecture, print no
useful opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size from
"can't disassemble". If the LLVM disassembler knows the size but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.
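A decoder can know an instruction's size without being able to decode it. On RISC-V, the length follows from the low bits of the first 16-bit parcel (the standard length encoding in the RISC-V unprivileged specification), regardless of which extension defines the instruction. The sketch below is a hypothetical Python model of a GetMCInst-style interface that reports `(valid, size)` separately. `KNOWN` stands in for the encodings the disassembler's tables describe and is invented for illustration. The sketch does not claim to mirror LLVM's decoder internals.

```python
def riscv_insn_length(parcel: int) -> int:
    """Instruction length in bytes, from the first 16-bit parcel.

    Uses the RISC-V base length encoding; returns 0 for the reserved
    >64-bit encodings, which this sketch does not handle.
    """
    if (parcel & 0b11) != 0b11:
        return 2  # low bits != 11: 16-bit compressed instruction
    if (parcel & 0b11100) != 0b11100:
        return 4  # bits [4:2] != 111: 32-bit instruction
    if (parcel & 0b111111) == 0b011111:
        return 6  # low bits 011111: 48-bit instruction
    if (parcel & 0b1111111) == 0b0111111:
        return 8  # low bits 0111111: 64-bit instruction
    return 0


# Hypothetical stand-in for the encodings the disassembler knows; anything
# else is "invalid" even though its size can still be determined.
KNOWN = {0x1101: "addi sp, sp, -0x20", 0xFEA42A23: "sw a0, -0xc(s0)"}


def get_mc_inst(data: bytes):
    """Model of the new contract: return (valid, size) separately."""
    if len(data) < 2:
        return False, 0
    size = riscv_insn_length(int.from_bytes(data[:2], "little"))
    if size == 0 or size > len(data):
        return False, 0
    word = int.from_bytes(data[:size], "little")
    return word in KNOWN, size


print(get_mc_inst(bytes.fromhex("0111")))              # (True, 2)
print(get_mc_inst(bytes.fromhex("f2b8")))              # (False, 2)
print(get_mc_inst(bytes.fromhex("3f00400920002000")))  # (False, 8)
```

Under the old contract, a single size of 0 meant "invalid", so the last two cases would have lost their sizes. Here the caller can still advance by the correct number of bytes and print the opcode.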

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example Python script that adds
a command, "fdis", which disassembles and then pipes the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
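The argument handling of the "fdis" command reduces to a small dispatch on the first word. The following pure-Python sketch separates that dispatch from the LLDB API so it can be exercised without a debugger. `parse_fdis_args` and its return tuples are hypothetical and only mirror the example script's get/set/pass-through logic.

```python
def parse_fdis_args(args: str, current_filter: str):
    """Decide what an fdis invocation should do (hypothetical helper).

    Returns one of:
      ("get", program)  - report the current filter program
      ("set", program)  - replace the filter program
      ("run", command)  - run this lldb command, then filter its output
    """
    words = args.split()  # split() tolerates repeated spaces
    if words == ["get"]:
        return "get", current_filter
    if len(words) == 2 and words[0] == "set":
        return "set", words[1]
    # Everything else goes to `disassemble`, always with -b, because the
    # filter needs the opcode bytes to decode instructions LLVM doesn't know.
    return "run", "disassemble -b " + args


print(parse_fdis_args("get", "crustfilt"))      # ('get', 'crustfilt')
print(parse_fdis_args("-n main", "crustfilt"))  # ('run', 'disassemble -b -n main')
```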

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
  valid disassembly, and returns the size through an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble only if the disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.
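Two of the output changes listed above interact in each output line: opcode bytes are always printed with -b, and "<unknown>" replaces an empty mnemonic. A small, hypothetical Python helper can model this. It ignores the address prefix and the mnemonic column padding that the real `Instruction::Dump` applies. Only the byte-column width formula `max_opcode_byte_size * 3 + 1` is taken from the Disassembler.cpp hunk.

```python
def format_dump_line(opcode_bytes: str, mnemonic: str,
                     max_opcode_byte_size: int) -> str:
    """Simplified model of the bytes + mnemonic part of a `disassemble -b` line.

    The byte column is padded to max_opcode_byte_size * 3 + 1 characters, so
    mnemonics line up even when instruction sizes differ. An empty mnemonic
    means the instruction could not be disassembled; show "<unknown>".
    """
    max_byte_width = max_opcode_byte_size * 3 + 1
    return opcode_bytes.ljust(max_byte_width) + (mnemonic or "<unknown>")


# riscv32/riscv64 now report a maximum opcode size of 8 bytes (width 25).
for opcode_bytes, mnemonic in [("1101", "addi   sp, sp, -0x20"),
                               ("0940003f 00200020", "")]:
    print(format_dump_line(opcode_bytes, mnemonic, 8))
```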

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/examples/python/filter_disasm.py         |  4 +--
 lldb/source/Core/Disassembler.cpp             |  7 +++--
 lldb/source/Core/Opcode.cpp                   |  8 ++---
 lldb/test/Shell/Commands/Inputs/dis_filt.sh   |  5 ++++
 .../command-disassemble-riscv32-bytes.s       | 30 +++++++++++++++++++
 .../Commands/command-disassemble-x86-bytes.s  | 28 +++++++++++++++++
 llvm/docs/ReleaseNotes.md                     |  3 ++
 7 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100755 lldb/test/Shell/Commands/Inputs/dis_filt.sh
 create mode 100644 lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
 create mode 100644 lldb/test/Shell/Commands/command-disassemble-x86-bytes.s

diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
index adb3455209055..d0ce609a99dd7 100644
--- a/lldb/examples/python/filter_disasm.py
+++ b/lldb/examples/python/filter_disasm.py
@@ -20,7 +20,7 @@ def __lldb_init_module(debugger, dict):
     print("Filter program set to %s" % filter_program)
 
 
-def fdis(debugger, args, result, dict):
+def fdis(debugger, args, exe_ctx, result, dict):
     """
   Call the built in disassembler, then pass its output to a filter program
   to add in disassembly for hidden opcodes.
@@ -62,7 +62,7 @@ def fdis(debugger, args, result, dict):
         return
 
     res = lldb.SBCommandReturnObject()
-    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, res)
+    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, exe_ctx, res)
     if (len(res.GetError()) > 0):
         result.SetError(res.GetError())
         result.SetStatus(lldb.eReturnStatusFailed)
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index f95e446448036..5ee3fc628478e 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -653,6 +653,7 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
   }
 
   if (show_bytes) {
+    auto max_byte_width = max_opcode_byte_size * 3 + 1;
     if (m_opcode.GetType() == Opcode::eTypeBytes) {
       // x86_64 and i386 are the only ones that use bytes right now so pad out
       // the byte dump to be able to always show 15 bytes (3 chars each) plus a
@@ -661,16 +662,16 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
         // make RISC-V opcode dump look like llvm-objdump
         if (exe_ctx &&
             exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
-          m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
+          m_opcode.DumpRISCV(&ss, max_byte_width);
         else
-          m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+          m_opcode.Dump(&ss, max_byte_width);
        else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
       // (10 spaces) plus two for padding...
       if (max_opcode_byte_size > 0)
-        m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+        m_opcode.Dump(&ss, max_byte_width);
       else
         m_opcode.Dump(&ss, 12);
     }
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index dbcd18cc0d8d2..17b4f2d30e6c4 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -85,23 +85,23 @@ int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
   if (m_type != Opcode::eTypeBytes)
     return Dump(s, min_byte_width);
 
-  // from RISCVPrettyPrinter in llvm-objdump.cpp
-  // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
-  // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
-  // else fall back and print bytes
+  // Logic taken from from RISCVPrettyPrinter in llvm-objdump.cpp
   for (uint32_t i = 0; i < m_data.inst.length;) {
     if (i > 0)
       s->PutChar(' ');
+    // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
     if (!(m_data.inst.length % 4)) {
       s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
                                         m_data.inst.bytes[i + 2],
                                         m_data.inst.bytes[i + 1],
                                         m_data.inst.bytes[i + 0]);
       i += 4;
+    // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
     } else if (!(m_data.inst.length % 2)) {
       s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
                               m_data.inst.bytes[i + 0]);
       i += 2;
+    // else fall back and print bytes
     } else {
       s->Printf("%2.2x", m_data.inst.bytes[i]);
       ++i;
diff --git a/lldb/test/Shell/Commands/Inputs/dis_filt.sh b/lldb/test/Shell/Commands/Inputs/dis_filt.sh
new file mode 100755
index 0000000000000..5fb4e9386461f
--- /dev/null
+++ b/lldb/test/Shell/Commands/Inputs/dis_filt.sh
@@ -0,0 +1,5 @@
+#! /bin/sh
+
+echo "Fake filter start"
+cat
+echo "Fake filter end"
diff --git a/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
new file mode 100644
index 0000000000000..28848b6f458f6
--- /dev/null
+++ b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+
+# This test verifies that disassemble -b prints out the correct bytes and
+# format for standard and unknown riscv instructions of various sizes,
+# and that unknown instructions show opcodes and disassemble as "<unknown>".
+# It also tests that the fdis command from examples/python/filter_disasm.py
+# pipes the disassembly output through a simple filter program correctly.
+
+
+# RUN: llvm-mc -filetype=obj -mattr=+c --triple=riscv32-unknown-unknown %s -o %t
+# RUN: %lldb -b %t -o "command script import %S/../../../examples/python/filter_disasm.py" -o "fdis set %S/Inputs/dis_filt.sh" -o "fdis -n main" | FileCheck %s
+
+main:
+    addi   sp, sp, -0x20               # 16 bit standard instruction
+    sw     a0, -0xc(s0)                # 32 bit standard instruction
+    .insn 8, 0x2000200940003F;         # 64 bit custom instruction
+    .insn 6, 0x021F | 0x00001000 << 32 # 48 bit xqci.e.li rd=8 imm=0x1000
+    .insn 4, 0x84F940B                 # 32 bit xqci.insbi  
+    .insn 2, 0xB8F2                    # 16 bit cm.push
+
+# CHECK: Disassembly filter command (fdis) loaded
+# CHECK: Fake filter start
+# CHECK: [0x0] <+0>:   1101                     addi   sp, sp, -0x20 
+# CHECK: [0x2] <+2>:   fea42a23                 sw     a0, -0xc(s0)
+# CHECK: [0x6] <+6>:   0940003f 00200020        <unknown>
+# CHECK: [0xe] <+14>:  021f 0000 1000           <unknown>
+# CHECK: [0x14] <+20>: 084f940b                 <unknown>
+# CHECK: [0x18] <+24>: b8f2                     <unknown>
+# CHECK: Fake filter end
+
diff --git a/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
new file mode 100644
index 0000000000000..c2e98a60316e2
--- /dev/null
+++ b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
@@ -0,0 +1,28 @@
+# REQUIRES: x86
+
+# This test verifies that disassemble -b prints out the correct bytes and
+# format for x86_64 instructions of various sizes, and that an unknown
+# instruction shows the opcode and disassembles as "<unknown>"
+
+# RUN: llvm-mc -filetype=obj --triple=x86_64-unknown-unknown %s -o %t
+# RUN: %lldb -b %t -o "disassemble -b -n main" | FileCheck %s
+
+main:                                   # @main
+	subq   $0x18, %rsp
+	movl   $0x0, 0x14(%rsp)
+	movq   %rdx, 0x8(%rsp)
+	movl   %ecx, 0x4(%rsp)
+	movl   (%rsp), %eax
+        addq   $0x18, %rsp
+	retq
+        .byte  0x6 
+
+# CHECK: [0x0] <+0>:   48 83 ec 18              subq   $0x18, %rsp
+# CHECK: [0x4] <+4>:   c7 44 24 14 00 00 00 00  movl   $0x0, 0x14(%rsp)
+# CHECK: [0xc] <+12>:  48 89 54 24 08           movq   %rdx, 0x8(%rsp)
+# CHECK: [0x11] <+17>: 89 4c 24 04              movl   %ecx, 0x4(%rsp)
+# CHECK: [0x15] <+21>: 8b 04 24                 movl   (%rsp), %eax
+# CHECK: [0x18] <+24>: 48 83 c4 18              addq   $0x18, %rsp
+# CHECK: [0x1c] <+28>: c3                       retq
+# CHECK: [0x1d] <+29>: 06                       <unknown>
+
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 73ae2ee599640..672db712bd798 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -304,6 +304,9 @@ Changes to LLDB
     stop reason = SIGSEGV: sent by tkill system call (sender pid=649752, uid=2667987)
   ```
 * ELF Cores can now have their siginfo structures inspected using `thread siginfo`.
+* Changed invalid disassembly to print `<unknown>` instead of being blank.
+* Changed the format of opcode bytes to match llvm-objdump when disassembling
+  RISC-V with the `-b` option.
 
 ### Changes to lldb-dap