[LLVMdev] X86 Disassembler
Chris Lattner
clattner at apple.com
Sat Aug 22 16:31:28 PDT 2009
On Aug 19, 2009, at 4:39 PM, Sean Callanan wrote:
> thanks for your comments. I'll respond to them individually. I've
> attached a new revision of the patch that addresses them. Patch
> built and tested against SVN 79487, with the additional attached fix
> that fixes an Intel table bug.
Thanks Sean, comments below. Are you sure you attached the updated
patch? I still see a lot of std::endl's etc.
>> 3. In include/llvm/Support/MemoryObject.h, you changed all
>> "uintptr_t" to "uint64_t". Is there a good reason for this? If not,
>> please change them back.'
>
> Yes, there is. If you're disassembling in a memory space that's
> larger than yours (for example, on a computer where LLVM is built 32-
> bit but the source of the bytes is a 64-bit address space), then I
> don't want to be limited by the local process's notion of uintptr_t.
Makes perfect sense, please add a comment in the MemoryObject doxygen
comment that explains this. You can split this change out of your
megapatch and commit it separately.
>> 13. You use "std::ostream" a lot. Would it be appropriate to use
>> "raw_ostream" instead?
>
> To make sure that the table definition file looks pretty, I use
> <iomanip>, and none of those functions work with raw_ostreams. I
> actually only create three std::ostringstreams – during table
> generation – so for now I'm going to leave that the way it is.
I actually find iomanip to be pretty ugly for simple things. llvm/
Support/Format.h lets you do things like:
OS << "mynumber: " << format("%4.5f", 1234.412) << '\n';
with raw_ostreams.
Using "printf style" format strings is a lot more pretty than:
+#define BYTE_FLAGS std::hex << std::setw(2) << std::setfill('0') <<
std::right
:)
Not a critical issue, but please put a space after control flow
keywords like if/while.
while(current - address < size && current < limit) {
if(readByte(current, &buf[(current - address)]))
Please stick to plain ascii letters in source files:
+ * The raison d'être for this header file is to provide those
definitions that
One meta comment: it is generally preferable to split out patches into
small pieces. Please propose the APIs without the X86 implementation
for the next round, it should be much easier to get that piece in than
doing the whole patch at once.
Some micro level coding stuff:
+#include <llvm/MC/MCInst.h>
+#include <llvm/Support/MemoryObject.h>
+#include <llvm/Support/raw_ostream.h>
Please use "" instead of <>. You can also just forward declare all
these classes instead of #including their headers.
+#include "assert.h"
Please use the c++ version of c headers when possible: <cassert>.
+ MCDisassembler(MemoryObject& region,
+ raw_ostream& vStream) {
Please use "foo_t &f" instead of "foo_t& f".
+ virtual ~MCDisassembler() {
+ }
Make sure each class with a virtual method has at least one method
defined out of line:
http://llvm.org/docs/CodingStandards.html#ll_virtual_anch
Some bigger stuff:
include/llvm/MC/MCDisassembler.h seems to have four copies of itself
in the same file.
In terms of API design:
+ /// Constructor - Performs initial setup for the disassembler.
This may
+ /// do the work of disassembly, or may
(especially on
+ /// fixed-instruction-length CPUs) set things up
for lazy
+ /// disassembly.
+ ///
+ /// @param region - The memory object to use as a source for
machine code.
+ /// @param vStream - The stream to print warnings and diagnostic
messages on.
+ MCDisassembler(MemoryObject& region,
+ raw_ostream& vStream) {
+ }
Forcing diagnostics to get rendered to a stream is somewhat
unfortunate. Different clients may want different information. Would
it be possible to use error codes, or make each action (like
getInstruction) fill in an error code struct with possible failure info?
The MCDisassembler API is also somewhat strange to me. Right now it
is designed to analyze a memory region when constructed and contain
them. It seems that there is a more useful low level API which would
look something like this:
class MCDisassembler {
public:
virtual ~MCDisassembler();
/// DisassembleInstruction - Disassemble the first instruction in
the specified region, printing the disassembled instruction to the
specified raw_ostream, and returning the size of the instruction in
bytes. On error, this returns zero and fills in ErrorInfo with a
human readable description of the error.
virtual unsigned DisassembleInstruction(MemoryObject ®ion,
raw_ostream &OS, std::string &ErrorInfo) = 0;
}
Having this sort of stateless API means that higher level clients
(which are stateful) can be built on top of them without adding
overhead to clients who don't care about the state.
+++ include/llvm/Support/Indenter.h (revision 0)
@@ -0,0 +1,72 @@
+//===- Indenter.h - Output indenter -----------------------------*- C+
+ -*-===//
I'm not a big fan of manipulators like this (as David Greene can tell
you ;-). I just added a new raw_ostream::indent method, so now
instead of stuff like this:
+void DisassemblerTables::emitContextDecision(
+ std::ostream& o2,
+ Indenter& i2,
...
+ o2 << i2.indent() << "struct ContextDecision " << name << " = {" <<
std::endl;
+ i2.push();
+ o2 << i2.indent() << "{ /* opcodeDecisions */" << std::endl;
+ i2.push();
You can use:
+void DisassemblerTables::emitContextDecision(
+ raw_ostream &o2,
+ unsigned &Indent2,
+ o2.indent(Indent2++) << "struct ContextDecision " << name << " =
{\n";
+ o2.indent(Indent2++) << "{ /* opcodeDecisions */\n";
Which reads better, is more efficient, and scales to more than 256
levels of indentation.
+#include <fcntl.h> // for fcntl, open
+#include <stdio.h> // for perror
+#include <sys/types.h> // for read
+#include <sys/uio.h> // for read
+#include <unistd.h> // for read
unistd etc won't work on windows. Please use the llvm/Support/
MemoryBuffer.h API for reading files. The LineReader is easier to
implement with MemoryBuffer because you just scan for \n and \r in the
memory range.
+std::string stringFromMCInst_x86(const MCInst& insn,
+ const Target* target,
+ std::string& triple) {
Please mark this static so that you can shrink the anon namespace.
+ X86ATTAsmPrinter* asmPrinter =
+ dynamic_cast<X86ATTAsmPrinter*>(functionPass);
Please don't use dynamic_cast: we're trying to eliminate RTTI a C cast
should be fine.
+int HexDisassembler::disassemble(const Target* target,
+ LineReader& reader) {
...
+
+ if(targetName == "x86") {
+ disasm = new X86Disassembler::X86_32Disassembler(obj, logstream);
+ triple = "x86-apple-darwin";
+ }
+ else if(targetName == "x86-64") {
+ disasm = new X86Disassembler::X86_64Disassembler(obj, logstream);
+ triple = "x86_64-apple-darwin";
+ }
+ else {
Instead of doing something like this, the X86 disassembler should
register itself as part of the "Target" API the same way that
asmprinters, MCAsmInfo, targets, etc do. The X86 specific
disassembler should go in lib/Target/X86/Disassembler. Take a look at
how the Target/X86/AsmParser stuff registers itself for a good example.
--- tools/llvm-mc/HexDisassembler.h (revision 0)
+++ tools/llvm-mc/HexDisassembler.h (revision 0)
+#include <fstream>
+#include <string>
the fstream #include is not needed. It might make sense to change the
disassemble APIs to take a MemoryBuffer, and have the llvm-mc.cpp
driver just handle all the file reading stuff.
I haven't reviewed the X86 specific changes yet, it would be good to
get the high level design sorted out first, I'll review it in the next
patch. Please commit the MemoryObject.h changes (int64_t + comments)
whenever you feel like it, and propose the basic API changes without
the X86 specific part next.
Thanks for working on this Sean, it's great work!
-Chris
More information about the llvm-dev
mailing list