[llvm] eb6da94 - [lldb] Improve disassembly of unknown instructions (#145793)

via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 14 19:50:25 PDT 2025


Author: tedwoodward
Date: 2025-07-14T21:50:22-05:00
New Revision: eb6da944af31dd684be3ab2f93f453a3837a72c6

URL: https://github.com/llvm/llvm-project/commit/eb6da944af31dd684be3ab2f93f453a3837a72c6
DIFF: https://github.com/llvm/llvm-project/commit/eb6da944af31dd684be3ab2f93f453a3837a72c6.diff

LOG: [lldb] Improve disassembly of unknown instructions (#145793)

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB ignores the instruction size, assumes the
instruction is the minimum instruction size for the architecture, prints
no useful opcode, and prints nothing for the instruction.

This patch changes this behavior to decouple the instruction size from
"can't disassemble". If the LLVM disassembler knows the size but can't
disassemble the instruction, LLDB will use that size, print the opcode,
and print "<unknown>" for the instruction. This is much more useful to
both users and scripts.
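
To illustrate the decoupling, here is a minimal Python sketch (not LLDB
code) in which decoding reports validity and size separately, like the
reworked GetMCInst. It assumes the size comes from the RISC-V length
encoding in the low bits of the first byte, which is why the size can be
known even for an instruction the decoder doesn't recognize. The KNOWN
table is a hypothetical stand-in for what the LLVM disassembler can
decode, and bytes are printed raw rather than in LLDB's grouped format.

```python
from typing import Dict, List, Tuple


def riscv_insn_length(first_byte: int) -> int:
    """Instruction length in bytes from the RISC-V length encoding in the
    low bits of the first byte; 0 for lengths beyond 64 bits."""
    if (first_byte & 0b11) != 0b11:
        return 2
    if ((first_byte >> 2) & 0b111) != 0b111:
        return 4
    if (first_byte & 0b111111) == 0b011111:
        return 6
    if (first_byte & 0b1111111) == 0b0111111:
        return 8
    return 0


# Hypothetical table standing in for what the LLVM disassembler can decode.
KNOWN: Dict[bytes, str] = {
    bytes([0x41, 0x11]): "addi   sp, sp, -0x10",
    bytes([0x23, 0x2A, 0xA4, 0xFE]): "sw     a0, -0xc(s0)",
}


def decode(buf: bytes) -> Tuple[bool, int]:
    """Return (valid, size) as two separate results."""
    size = riscv_insn_length(buf[0])
    return buf[:size] in KNOWN, size


def disassemble(data: bytes, min_size: int = 2) -> List[str]:
    lines = []
    offset = 0
    while offset < len(data):
        valid, size = decode(data[offset:])
        if size == 0:
            # The size is unknown too: fall back to the minimum size.
            size = min_size
        chunk = data[offset:offset + size]
        # Use the known size even when decoding failed.
        text = KNOWN[chunk] if valid else "<unknown>"
        lines.append(f"{chunk.hex(' ')}  {text}")
        offset += size
    return lines


for line in disassemble(bytes([0x41, 0x11, 0x3F, 0x00, 0x40, 0x09,
                               0x20, 0x00, 0x20, 0x00])):
    print(line)
# 41 11  addi   sp, sp, -0x10
# 3f 00 40 09 20 00 20 00  <unknown>
```

Under the old behavior, the unknown 64-bit instruction would instead have
been consumed in minimum-size pieces with nothing printed for it.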

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions whose instructions are not
described in LLVM's TableGen (.td) files, so the disassembler can't
disassemble them. Internal users want to be able to disassemble these
instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch makes LLDB's disassembly look more
like llvm-objdump's, and includes an example Python script that adds a
command, "fdis", that disassembles and then pipes the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
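
A filter in this workflow only needs to read the disassembly on stdin and
rewrite the "<unknown>" lines it recognizes. The following minimal sketch
works on that assumption. It is not crustfilt's implementation, and the
CUSTOM_OPCODES table is hypothetical; its single entry matches the fake
64-bit instruction used in the tests below.

```python
from typing import Dict, Iterable, Iterator, TextIO

# Hypothetical mapping from opcode-byte strings, as printed by
# `disassemble -b`, to mnemonics for instructions LLVM doesn't know.
CUSTOM_OPCODES: Dict[str, str] = {
    "0940003f 00200020": "Fake64",
}


def filter_lines(lines: Iterable[str],
                 table: Dict[str, str]) -> Iterator[str]:
    """Yield each line, replacing "<unknown>" when its opcode is known."""
    for line in lines:
        if "<unknown>" in line:
            for opcode, mnemonic in table.items():
                if opcode in line:
                    line = line.replace("<unknown>", mnemonic)
                    break
        yield line


def main(stdin: TextIO, stdout: TextIO) -> None:
    # Lines keep their trailing newlines, so write them unchanged.
    for line in filter_lines(stdin, CUSTOM_OPCODES):
        stdout.write(line)
```

To use this as a filter program, import sys, call `main(sys.stdin,
sys.stdout)` from an `if __name__ == "__main__":` guard, make the script
executable, and select it with `fdis set <path>`.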

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool
  indicating valid disassembly, and returns the size through an out
  parameter. The size is used even if the disassembly is invalid; the
  instruction is only disassembled if the disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously, the opcode was not printed if the instruction couldn't be
  disassembled.

- Print out RISC-V opcodes the way llvm-objdump does.
  Code for the new opcode type eType16_32Tuples is by Jason Molenda.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update the maximum riscv32 and riscv64 instruction size to 8 bytes.

- Add example "fdis" command script.

- Add a disassembly byte test for x86 with known and unknown instructions.
- Add a disassembly byte test for riscv32 with known and unknown
  instructions, with and without filtering.
- Add a test from Jason Molenda to the RISC-V disassembly unit tests.
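
The RISC-V opcode format can be summarized with a short Python sketch that
mirrors the new eType16_32Tuples case in Opcode::Dump. Little-endian bytes
are grouped into 32-bit words when the length is a multiple of 4, and into
16-bit halfwords otherwise. The function name format_16_32_tuples is
hypothetical, and the sketch assumes little-endian input of even length,
as RISC-V instructions always have.

```python
def format_16_32_tuples(data: bytes) -> str:
    """Format little-endian instruction bytes as space-separated 32-bit
    words if the length is a multiple of 4, else as 16-bit halfwords."""
    step = 4 if len(data) % 4 == 0 else 2
    groups = []
    for i in range(0, len(data), step):
        # Reverse each group so its most significant byte prints first.
        groups.append(data[i:i + step][::-1].hex())
    return " ".join(groups)


print(format_16_32_tuples(bytes([0x3F, 0x00, 0x40, 0x09,
                                 0x20, 0x00, 0x20, 0x00])))
# 0940003f 00200020
print(format_16_32_tuples(bytes([0x1F, 0x02, 0x00, 0x00, 0x00, 0x10])))
# 021f 0000 1000
```

These are the same strings the new TestOpcodeBytePrinter unit test expects
for the fake 64-bit and 48-bit instructions.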

Added: 
    lldb/examples/python/filter_disasm.py
    lldb/test/Shell/Commands/Inputs/dis_filt.py
    lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
    lldb/test/Shell/Commands/command-disassemble-x86-bytes.s

Modified: 
    lldb/include/lldb/Core/Opcode.h
    lldb/source/Core/Disassembler.cpp
    lldb/source/Core/Opcode.cpp
    lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
    lldb/source/Utility/ArchSpec.cpp
    lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
    llvm/docs/ReleaseNotes.md

Removed: 
    


################################################################################
diff  --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
new file mode 100644
index 0000000000000..de99d4031a7fd
--- /dev/null
+++ b/lldb/examples/python/filter_disasm.py
@@ -0,0 +1,84 @@
+"""
+Defines a command, fdis, that does filtered disassembly. The command does the
+lldb disassemble command with -b and any other arguments passed in, and
+pipes that through a provided filter program.
+
+The intention is to support disassembly of RISC-V proprietary instructions.
+This is handled with llvm-objdump by piping the output of llvm-objdump through
+a filter program. This script is intended to mimic that workflow.
+"""
+
+import lldb
+import subprocess
+
+filter_program = "crustfilt"
+
+
+def __lldb_init_module(debugger, dict):
+    debugger.HandleCommand("command script add -f filter_disasm.fdis fdis")
+    print("Disassembly filter command (fdis) loaded")
+    print("Filter program set to %s" % filter_program)
+
+
+def fdis(debugger, args, exe_ctx, result, dict):
+    """
+  Call the built in disassembler, then pass its output to a filter program
+  to add in disassembly for hidden opcodes.
+  Except for get and set, use the fdis command like the disassemble command.
+  By default, the filter program is crustfilt, from
+  https://github.com/quic/crustfilt . This can be changed by changing
+  the global variable filter_program.
+
+  Usage:
+    fdis [[get] [set <program>] [<disassembly options>]]
+
+    Choose one of the following:
+        get
+            Gets the current filter program
+
+        set <program>
+            Sets the current filter program. This can be an executable, which
+            will be found on PATH, or an absolute path.
+
+        <disassembly options>
+            If the first argument is not get or set, the args will be passed
+            to the disassemble command as is.
+
+    """
+
+    global filter_program
+    args_list = args.split(" ")
+    result.Clear()
+
+    if len(args_list) == 1 and args_list[0] == "get":
+        result.PutCString(filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    if len(args_list) == 2 and args_list[0] == "set":
+        filter_program = args_list[1]
+        result.PutCString("Filter program set to %s" % filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    res = lldb.SBCommandReturnObject()
+    debugger.GetCommandInterpreter().HandleCommand("disassemble -b " + args, exe_ctx, res)
+    if len(res.GetError()) > 0:
+        result.SetError(res.GetError())
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+    output = res.GetOutput()
+
+    try:
+        proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
+    except (subprocess.SubprocessError, OSError) as e:
+        result.PutCString("Error occurred. Original disassembly:\n\n" + output)
+        result.SetError(str(e))
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+
+    if proc.returncode:
+        result.PutCString("warning: {} returned non-zero value {}".format(filter_program, proc.returncode))
+
+    result.PutCString(proc.stdout)
+    result.SetStatus(lldb.eReturnStatusSuccessFinishResult)

diff  --git a/lldb/include/lldb/Core/Opcode.h b/lldb/include/lldb/Core/Opcode.h
index f72f2687b54fe..91af15c62e6ab 100644
--- a/lldb/include/lldb/Core/Opcode.h
+++ b/lldb/include/lldb/Core/Opcode.h
@@ -32,7 +32,10 @@ class Opcode {
     eTypeInvalid,
     eType8,
     eType16,
-    eType16_2, // a 32-bit Thumb instruction, made up of two words
+    eType16_2,        // a 32-bit Thumb instruction, made up of two words
+    eType16_32Tuples, // RISC-V that can have 2, 4, 6, 8 etc byte long
+                      // instructions which will be printed in combinations of
+                      // 16 & 32-bit words.
     eType32,
     eType64,
     eTypeBytes
@@ -60,9 +63,9 @@ class Opcode {
     m_data.inst64 = inst;
   }
 
-  Opcode(uint8_t *bytes, size_t length)
-      : m_byte_order(lldb::eByteOrderInvalid) {
-    SetOpcodeBytes(bytes, length);
+  Opcode(uint8_t *bytes, size_t length, Opcode::Type type,
+         lldb::ByteOrder order) {
+    DoSetOpcodeBytes(bytes, length, type, order);
   }
 
   void Clear() {
@@ -82,6 +85,8 @@ class Opcode {
       break;
     case Opcode::eType16_2:
       break;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType32:
       break;
     case Opcode::eType64:
@@ -103,6 +108,8 @@ class Opcode {
                              : m_data.inst16;
     case Opcode::eType16_2:
       break;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType32:
       break;
     case Opcode::eType64:
@@ -122,6 +129,8 @@ class Opcode {
     case Opcode::eType16:
       return GetEndianSwap() ? llvm::byteswap<uint16_t>(m_data.inst16)
                              : m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return GetEndianSwap() ? llvm::byteswap<uint32_t>(m_data.inst32)
@@ -143,6 +152,8 @@ class Opcode {
     case Opcode::eType16:
       return GetEndianSwap() ? llvm::byteswap<uint16_t>(m_data.inst16)
                              : m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return GetEndianSwap() ? llvm::byteswap<uint32_t>(m_data.inst32)
@@ -186,20 +197,30 @@ class Opcode {
     m_byte_order = order;
   }
 
+  void SetOpcode16_32TupleBytes(const void *bytes, size_t length,
+                                lldb::ByteOrder order) {
+    DoSetOpcodeBytes(bytes, length, eType16_32Tuples, order);
+  }
+
   void SetOpcodeBytes(const void *bytes, size_t length) {
+    DoSetOpcodeBytes(bytes, length, eTypeBytes, lldb::eByteOrderInvalid);
+  }
+
+  void DoSetOpcodeBytes(const void *bytes, size_t length, Opcode::Type type,
+                        lldb::ByteOrder order) {
     if (bytes != nullptr && length > 0) {
-      m_type = eTypeBytes;
+      m_type = type;
       m_data.inst.length = length;
       assert(length < sizeof(m_data.inst.bytes));
       memcpy(m_data.inst.bytes, bytes, length);
-      m_byte_order = lldb::eByteOrderInvalid;
+      m_byte_order = order;
     } else {
       m_type = eTypeInvalid;
       m_data.inst.length = 0;
     }
   }
 
-  int Dump(Stream *s, uint32_t min_byte_width);
+  int Dump(Stream *s, uint32_t min_byte_width) const;
 
   const void *GetOpcodeBytes() const {
     return ((m_type == Opcode::eTypeBytes) ? m_data.inst.bytes : nullptr);
@@ -213,6 +234,8 @@ class Opcode {
       return sizeof(m_data.inst8);
     case Opcode::eType16:
       return sizeof(m_data.inst16);
+    case Opcode::eType16_32Tuples:
+      return m_data.inst.length;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return sizeof(m_data.inst32);
@@ -238,6 +261,8 @@ class Opcode {
       return &m_data.inst8;
     case Opcode::eType16:
       return &m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      return m_data.inst.bytes;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return &m_data.inst32;

diff  --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index 833e327579a29..925de2a5c836c 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -685,10 +685,12 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
     }
   }
   const size_t opcode_pos = ss.GetSizeOfLastLine();
-  const std::string &opcode_name =
-      show_color ? m_markup_opcode_name : m_opcode_name;
+  std::string &opcode_name = show_color ? m_markup_opcode_name : m_opcode_name;
   const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;
 
+  if (opcode_name.empty())
+    opcode_name = "<unknown>";
+
   // The default opcode size of 7 characters is plenty for most architectures
   // but some like arm can pull out the occasional vqrshrun.s16.  We won't get
   // consistent column spacing in these cases, unfortunately. Also note that we

diff  --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index 3e30d98975d8a..6c9ced9c11230 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -21,7 +21,7 @@
 using namespace lldb;
 using namespace lldb_private;
 
-int Opcode::Dump(Stream *s, uint32_t min_byte_width) {
+int Opcode::Dump(Stream *s, uint32_t min_byte_width) const {
   const uint32_t previous_bytes = s->GetWrittenBytes();
   switch (m_type) {
   case Opcode::eTypeInvalid:
@@ -38,6 +38,27 @@ int Opcode::Dump(Stream *s, uint32_t min_byte_width) {
     s->Printf("0x%8.8x", m_data.inst32);
     break;
 
+  case Opcode::eType16_32Tuples: {
+    const bool format_as_words = (m_data.inst.length % 4) == 0;
+    uint32_t i = 0;
+    while (i < m_data.inst.length) {
+      if (i > 0)
+        s->PutChar(' ');
+      if (format_as_words) {
+        // Format as words; print 1 or more UInt32 values.
+        s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
+                  m_data.inst.bytes[i + 2], m_data.inst.bytes[i + 1],
+                  m_data.inst.bytes[i + 0]);
+        i += 4;
+      } else {
+        // Format as halfwords; print 1 or more UInt16 values.
+        s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
+                  m_data.inst.bytes[i + 0]);
+        i += 2;
+      }
+    }
+  } break;
+
   case Opcode::eType64:
     s->Printf("0x%16.16" PRIx64, m_data.inst64);
     break;
@@ -69,6 +90,7 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
   case Opcode::eType8:
   case Opcode::eType16:
   case Opcode::eType16_2:
+  case Opcode::eType16_32Tuples:
   case Opcode::eType32:
   case Opcode::eType64:
     return endian::InlHostByteOrder();
@@ -113,6 +135,9 @@ uint32_t Opcode::GetData(DataExtractor &data) const {
         swap_buf[3] = m_data.inst.bytes[2];
         buf = swap_buf;
         break;
+      case Opcode::eType16_32Tuples:
+        buf = GetOpcodeDataBytes();
+        break;
       case Opcode::eType32:
         *(uint32_t *)swap_buf = llvm::byteswap<uint32_t>(m_data.inst32);
         buf = swap_buf;

diff  --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index 644084ba8d57a..564b787594f71 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -61,6 +61,8 @@ class DisassemblerLLVMC::MCDisasmInstance {
 
   uint64_t GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
                      lldb::addr_t pc, llvm::MCInst &mc_inst) const;
+  bool GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
+                 lldb::addr_t pc, llvm::MCInst &mc_inst, size_t &size) const;
   void PrintMCInst(llvm::MCInst &mc_inst, lldb::addr_t pc,
                    std::string &inst_string, std::string &comments_string);
   void SetStyle(bool use_hex_immed, HexImmediateStyle hex_style);
@@ -486,8 +488,13 @@ class InstructionLLVMC : public lldb_private::Instruction {
           break;
 
         default:
-          m_opcode.SetOpcodeBytes(data.PeekData(data_offset, min_op_byte_size),
-                                  min_op_byte_size);
+          if (arch.GetTriple().isRISCV())
+            m_opcode.SetOpcode16_32TupleBytes(
+                data.PeekData(data_offset, min_op_byte_size), min_op_byte_size,
+                byte_order);
+          else
+            m_opcode.SetOpcodeBytes(
+                data.PeekData(data_offset, min_op_byte_size), min_op_byte_size);
           got_op = true;
           break;
         }
@@ -524,13 +531,16 @@ class InstructionLLVMC : public lldb_private::Instruction {
           const addr_t pc = m_address.GetFileAddress();
           llvm::MCInst inst;
 
-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          if (inst_size == 0)
-            m_opcode.Clear();
-          else {
-            m_opcode.SetOpcodeBytes(opcode_data, inst_size);
-            m_is_valid = true;
+          size_t inst_size = 0;
+          m_is_valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+          m_opcode.Clear();
+          if (inst_size != 0) {
+            if (arch.GetTriple().isRISCV())
+              m_opcode.SetOpcode16_32TupleBytes(opcode_data, inst_size,
+                                                byte_order);
+            else
+              m_opcode.SetOpcodeBytes(opcode_data, inst_size);
           }
         }
       }
@@ -604,10 +614,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
         const uint8_t *opcode_data = data.GetDataStart();
         const size_t opcode_data_len = data.GetByteSize();
         llvm::MCInst inst;
-        size_t inst_size =
-            mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
+        size_t inst_size = 0;
+        bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
+                                              inst, inst_size);
 
-        if (inst_size > 0) {
+        if (valid && inst_size > 0) {
           mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
 
           const bool saved_use_color = mc_disasm_ptr->GetUseColor();
@@ -1206,9 +1217,10 @@ class InstructionLLVMC : public lldb_private::Instruction {
     const uint8_t *opcode_data = data.GetDataStart();
     const size_t opcode_data_len = data.GetByteSize();
     llvm::MCInst inst;
-    const size_t inst_size =
-        mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-    if (inst_size == 0)
+    size_t inst_size = 0;
+    const bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+    if (!valid)
       return;
 
     m_has_visited_instruction = true;
@@ -1337,19 +1349,19 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
          m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
 }
 
-uint64_t DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
-    const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
-    llvm::MCInst &mc_inst) const {
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(const uint8_t *opcode_data,
+                                                    size_t opcode_data_len,
+                                                    lldb::addr_t pc,
+                                                    llvm::MCInst &mc_inst,
+                                                    size_t &size) const {
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;
 
-  uint64_t new_inst_size;
-  status = m_disasm_up->getInstruction(mc_inst, new_inst_size, data, pc,
-                                       llvm::nulls());
+  status = m_disasm_up->getInstruction(mc_inst, size, data, pc, llvm::nulls());
   if (status == llvm::MCDisassembler::Success)
-    return new_inst_size;
+    return true;
   else
-    return 0;
+    return false;
 }
 
 void DisassemblerLLVMC::MCDisasmInstance::PrintMCInst(

diff  --git a/lldb/source/Utility/ArchSpec.cpp b/lldb/source/Utility/ArchSpec.cpp
index 70b9800f4dade..7c71aaae6bcf2 100644
--- a/lldb/source/Utility/ArchSpec.cpp
+++ b/lldb/source/Utility/ArchSpec.cpp
@@ -228,9 +228,9 @@ static const CoreDefinition g_core_definitions[] = {
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::hexagon,
      ArchSpec::eCore_hexagon_hexagonv5, "hexagonv5"},
 
-    {eByteOrderLittle, 4, 2, 4, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
+    {eByteOrderLittle, 4, 2, 8, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
      "riscv32"},
-    {eByteOrderLittle, 8, 2, 4, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
+    {eByteOrderLittle, 8, 2, 8, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
      "riscv64"},
 
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::loongarch32,

diff  --git a/lldb/test/Shell/Commands/Inputs/dis_filt.py b/lldb/test/Shell/Commands/Inputs/dis_filt.py
new file mode 100755
index 0000000000000..bac5a36be2f3c
--- /dev/null
+++ b/lldb/test/Shell/Commands/Inputs/dis_filt.py
@@ -0,0 +1,8 @@
+#! /usr/bin/env python3
+
+import sys
+
+for line in sys.stdin:
+    if "0940003f 00200020" in line and "<unknown>" in line:
+        line = line.replace("<unknown>", "Fake64")
+    print(line, end="")

diff  --git a/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
new file mode 100644
index 0000000000000..01b9ba261d660
--- /dev/null
+++ b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
@@ -0,0 +1,36 @@
+# REQUIRES: riscv
+
+# This test verifies that disassemble -b prints out the correct bytes and
+# format for standard and unknown riscv instructions of various sizes,
+# and that unknown instructions show opcodes and disassemble as "<unknown>".
+# It also tests that the fdis command from examples/python/filter_disasm.py
+# pipes the disassembly output through a simple filter program correctly.
+
+
+# RUN: llvm-mc -filetype=obj -mattr=+c --triple=riscv32-unknown-unknown %s -o %t
+# RUN: %lldb -b %t "-o" "disassemble -b -n main" | FileCheck %s
+# RUN: %lldb -b %t -o "command script import %S/../../../examples/python/filter_disasm.py" -o "fdis set %S/Inputs/dis_filt.py" -o "fdis -n main" | FileCheck --check-prefix=FILTER %s
+
+main:
+    addi   sp, sp, -0x20               # 16 bit standard instruction
+    sw     a0, -0xc(s0)                # 32 bit standard instruction
+    .insn 8, 0x2000200940003F;         # 64 bit custom instruction
+    .insn 6, 0x021F | 0x00001000 << 32 # 48 bit xqci.e.li rd=8 imm=0x1000
+    .insn 4, 0x84F940B                 # 32 bit xqci.insbi  
+    .insn 2, 0xB8F2                    # 16 bit cm.push
+
+# CHECK:      [0x0] <+0>:   1101                     addi   sp, sp, -0x20 
+# CHECK-NEXT: [0x2] <+2>:   fea42a23                 sw     a0, -0xc(s0)
+# CHECK-NEXT: [0x6] <+6>:   0940003f 00200020        <unknown>
+# CHECK-NEXT: [0xe] <+14>:  021f 0000 1000           <unknown>
+# CHECK-NEXT: [0x14] <+20>: 084f940b                 <unknown>
+# CHECK-NEXT: [0x18] <+24>: b8f2                     <unknown>
+
+# FILTER: Disassembly filter command (fdis) loaded
+# FILTER:      [0x0] <+0>:   1101                     addi   sp, sp, -0x20 
+# FILTER-NEXT: [0x2] <+2>:   fea42a23                 sw     a0, -0xc(s0)
+# FILTER-NEXT: [0x6] <+6>:   0940003f 00200020        Fake64
+# FILTER-NEXT: [0xe] <+14>:  021f 0000 1000           <unknown>
+# FILTER-NEXT: [0x14] <+20>: 084f940b                 <unknown>
+# FILTER-NEXT: [0x18] <+24>: b8f2                     <unknown>
+

diff  --git a/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
new file mode 100644
index 0000000000000..fae08d09a0832
--- /dev/null
+++ b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
@@ -0,0 +1,28 @@
+# REQUIRES: x86
+
+# This test verifies that disassemble -b prints out the correct bytes and
+# format for x86_64 instructions of various sizes, and that an unknown
+# instruction shows the opcode and disassembles as "<unknown>"
+
+# RUN: llvm-mc -filetype=obj --triple=x86_64-unknown-unknown %s -o %t
+# RUN: %lldb -b %t -o "disassemble -b -n main" | FileCheck %s
+
+main:                                   # @main
+	subq   $0x18, %rsp
+	movl   $0x0, 0x14(%rsp)
+	movq   %rdx, 0x8(%rsp)
+	movl   %ecx, 0x4(%rsp)
+	movl   (%rsp), %eax
+        addq   $0x18, %rsp
+	retq
+        .byte  0x6 
+
+# CHECK: [0x0] <+0>:   48 83 ec 18              subq   $0x18, %rsp
+# CHECK-NEXT: [0x4] <+4>:   c7 44 24 14 00 00 00 00  movl   $0x0, 0x14(%rsp)
+# CHECK-NEXT: [0xc] <+12>:  48 89 54 24 08           movq   %rdx, 0x8(%rsp)
+# CHECK-NEXT: [0x11] <+17>: 89 4c 24 04              movl   %ecx, 0x4(%rsp)
+# CHECK-NEXT: [0x15] <+21>: 8b 04 24                 movl   (%rsp), %eax
+# CHECK-NEXT: [0x18] <+24>: 48 83 c4 18              addq   $0x18, %rsp
+# CHECK-NEXT: [0x1c] <+28>: c3                       retq
+# CHECK-NEXT: [0x1d] <+29>: 06                       <unknown>
+

diff  --git a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
index 8ec5d62a99ac5..64177a2fac490 100644
--- a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
+++ b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
@@ -14,6 +14,7 @@
 #include "lldb/Core/Disassembler.h"
 #include "lldb/Target/ExecutionContext.h"
 #include "lldb/Utility/ArchSpec.h"
+#include "lldb/Utility/StreamString.h"
 
 #include "Plugins/Disassembler/LLVMC/DisassemblerLLVMC.h"
 
@@ -60,12 +61,6 @@ TEST_F(TestMCDisasmInstanceRISCV, TestRISCV32Instruction) {
       arch, nullptr, nullptr, nullptr, nullptr, start_addr, &data, sizeof(data),
       num_of_instructions, false);
 
-  // If we failed to get a disassembler, we can assume it is because
-  // the llvm we linked against was not built with the riscv target,
-  // and we should skip these tests without marking anything as failing.
-  if (!disass_sp)
-    return;
-
   const InstructionList inst_list(disass_sp->GetInstructionList());
   EXPECT_EQ(num_of_instructions, inst_list.GetSize());
 
@@ -90,3 +85,58 @@ TEST_F(TestMCDisasmInstanceRISCV, TestRISCV32Instruction) {
   EXPECT_FALSE(inst_sp->IsCall());
   EXPECT_TRUE(inst_sp->DoesBranch());
 }
+
+TEST_F(TestMCDisasmInstanceRISCV, TestOpcodeBytePrinter) {
+  ArchSpec arch("riscv32-*-linux");
+
+  const unsigned num_of_instructions = 7;
+  // clang-format off
+  uint8_t data[] = {
+      0x41, 0x11,             // addi   sp, sp, -0x10
+      0x06, 0xc6,             // sw     ra, 0xc(sp)
+      0x23, 0x2a, 0xa4, 0xfe, // sw     a0, -0xc(s0)
+      0x23, 0x28, 0xa4, 0xfe, // sw     a0, -0x10(s0)
+      0x22, 0x44,             // lw     s0, 0x8(sp)
+
+      0x3f, 0x00, 0x40, 0x09, // Fake 64-bit instruction
+      0x20, 0x00, 0x20, 0x00,
+
+      0x1f, 0x02,             // 48 bit xqci.e.li rd=8 imm=0x1000
+      0x00, 0x00, 
+      0x00, 0x10,
+  };
+  // clang-format on
+
+  // clang-format off
+  const char *expected_outputs[] = {
+    "1141",
+    "c606",
+    "fea42a23",
+    "fea42823",
+    "4422",
+    "0940003f 00200020",
+    "021f 0000 1000"
+  };
+  // clang-format on
+  const unsigned num_of_expected_outputs =
+      sizeof(expected_outputs) / sizeof(char *);
+
+  EXPECT_EQ(num_of_instructions, num_of_expected_outputs);
+
+  DisassemblerSP disass_sp;
+  Address start_addr(0x100);
+  disass_sp = Disassembler::DisassembleBytes(
+      arch, nullptr, nullptr, nullptr, nullptr, start_addr, &data, sizeof(data),
+      num_of_instructions, false);
+
+  const InstructionList inst_list(disass_sp->GetInstructionList());
+  EXPECT_EQ(num_of_instructions, inst_list.GetSize());
+
+  for (size_t i = 0; i < num_of_instructions; i++) {
+    InstructionSP inst_sp;
+    StreamString s;
+    inst_sp = inst_list.GetInstructionAtIndex(i);
+    inst_sp->GetOpcode().Dump(&s, 1);
+    ASSERT_STREQ(s.GetString().str().c_str(), expected_outputs[i]);
+  }
+}

diff  --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 43ed85db0315c..63a2f5ef9423b 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -307,6 +307,10 @@ Changes to LLDB
     stop reason = SIGSEGV: sent by tkill system call (sender pid=649752, uid=2667987)
   ```
 * ELF Cores can now have their siginfo structures inspected using `thread siginfo`.
+* Disassembly of unknown instructions now produces "<unknown>" instead of
+  nothing at all
+* Changed the format of opcode bytes to match llvm-objdump when disassembling
+  RISC-V code with `disassemble`'s `--byte` option.
 
 ### Changes to lldb-dap
 


        

