[LLVMdev] X86 Disassembler
Bill Wendling
isanbard at gmail.com
Tue Aug 18 00:57:27 PDT 2009
Hi Sean,
> the attached diff implements a table-driven disassembler for the X86
> architecture (16-, 32-, and 64-bit incarnations), integrated into
> the MC framework. The disassembler is table-driven, using a custom
> TableGen backend to generate hierarchical tables optimized for fast
> decode. The disassembler consumes MemoryObjects and produces arrays
> of MCInsts, adhering to the abstract base class MCDisassembler (llvm/
> MC/MCDisassembler.h).
>
> The disassembler is documented in detail in
> –
> - lib/Target/X86/X86Disassembler.h (disassembler runtime)
> - utils/TableGen/X86DisassemblerEmitter.h (table emitter)
> –
> as well as in the individual files, functions, and classes.
>
> I implemented a use case in tools/llvm-mc/HexDisassembler.cpp, which
> implements an interactive disassembler for sequences of bytes
> entered in hex on standard input. Example output:
> –
> localhost:llvm spyffe$ ./Debug/bin/llvm-mc -disassemble
> 74 22 <-- user input
> …
> 1 instructions:
> 74 22 je 34
> –
> The disassembler is currently fully implemented, and testing is
> ongoing. I still need to handle LEA correctly, and of course the
> instruction tables (lib/Target/X86/X86InstrInfo.td and friends)
> always need expanding. However, because this is a non-invasive
> patch implementing largely independent functionality but its size
> makes maintaining it outside LLVM is a heavy burden, I am submitting
> it for consideration now.
>
> The patch is tested against SVN revision 79306. Please let me know
> if there are any problems, and tell me what I can fix to make this
> worthy of inclusion. Thanks for your time.
>
Great work! It looks really cool.
Here are some initial comments. Most of them are generic/stylistic
comments, but some are more important. Also, I'm not trying to be
picky. :-)
0. Watch out for tabs!
1. Includes like this "#include <llvm/MC/MCInst.h>" should be in
quotes, not angle brackets.
2. In the "class Indenter", the "push" and "pop" methods check for
over/under runs before proceeding. Should these be "asserts" instead?
3. In include/llvm/Support/MemoryObject.h, you changed all "uintptr_t"
to "uint64_t". Is there a good reason for this? If not, please change
them back.
4. If you have a class that's completely contained within a .cpp file,
go ahead and define it in an anonymous namespace and mark it
"VISIBILITY_HIDDEN". Like this:
namespace {
class VISIBILITY_HIDDEN StringMemoryObject : public MemoryObject {
...
5. In "readBytes" in StringMemoryObject, you're returning a "-1", but
the method's return type is "uint64_t". Do you *really* want to return
0xfffffffffffU, or is "-1" a sentinel value of some sort? If the
latter, perhaps changing the return type to "int64_t" instead. If the
former, then changing the return to "-1U" to make it explicit.
6. The use of "#include <iostream>" is strictly forbidden. Please
remove it wherever you find it (HexDisassembler.h for one).
7. Look for opportunities to use the LLVM small container classes. For
instance, the "fOperands" ivar in RecognizableInsn.
8. While we're on that, please use "Instruction" or "Instr" instead of
"Insn" for consistency's sake.
9. You have types called "operandType_t", etc. We normally don't use
the "_t" suffix for types. Perhaps renaming them.
10. In the "static inline bool outranks(instructionContext_t upper,
instructionContext_t lower)", it would be good to assert that upper
and lower are strictly less than IC_max.
11. In several places, you put 'default: assert(0 && "Badness happens
here!");" at the end of a switch statement. When assertions are turned
off, I believe that this could emit warnings. It would be better to
put it as the first case of the "switch". And maybe use the new
"llvm_unreachable()" macro as well.
12. You use "std::endl" a lot. This is discouraged by the coding
standard:
http://llvm.org/docs/CodingStandards.html#ll_avoidendl
13. You use "std::ostream" a lot. Would it be appropriate to use
"raw_ostream" instead?
14. In DisassemblerTables::setTableFields, you emit things to
"std::cerr". Use "errs()" instead. Or, better yet,
"llvm_report_error". :-)
15. Prefer pre-increment/decrement for variables:
http://llvm.org/docs/CodingStandards.html#micro_preincrement
16. In X86RecognizableInsn.cpp, you use the large macro:
HANDLE_OPERAND. Could you make it an inlined function instead? It
could take a pointer to a member function for the method it wants to
call. Debugging large macros is impossible. :)
17. I'm confused by the C files in lib/Target/X86. Why aren't they C++
files?
Okay. That's all I found for now. :-) I haven't looked at the
algorithm too much, though it seems fairly straight-forward. I'm sure
that since you're continually testing it, that it's of high quality.
When you do get around to submission, we will want all of the tests
you've been running to be added to the test/ directory.
Thanks!
-bw
More information about the llvm-dev
mailing list