[LLVMdev] X86 Disassembler

Bill Wendling isanbard at gmail.com
Tue Aug 18 00:57:27 PDT 2009


Hi Sean,

> the attached diff implements a table-driven disassembler for the X86  
> architecture (16-, 32-, and 64-bit incarnations), integrated into  
> the MC framework.  The disassembler is table-driven, using a custom  
> TableGen backend to generate hierarchical tables optimized for fast  
> decode.  The disassembler consumes MemoryObjects and produces arrays  
> of MCInsts, adhering to the abstract base class MCDisassembler (llvm/ 
> MC/MCDisassembler.h).
>
> The disassembler is documented in detail in
>> - lib/Target/X86/X86Disassembler.h (disassembler runtime)
> - utils/TableGen/X86DisassemblerEmitter.h (table emitter)
>> as well as in the individual files, functions, and classes.
>
> I implemented a use case in tools/llvm-mc/HexDisassembler.cpp, which  
> implements an interactive disassembler for sequences of bytes  
> entered in hex on standard input.  Example output:
>> localhost:llvm spyffe$ ./Debug/bin/llvm-mc -disassemble
> 74 22 <-- user input
>> 1 instructions:
> 74 22 	je	34
>> The disassembler is currently fully implemented, and testing is  
> ongoing.  I still need to handle LEA correctly, and of course the  
> instruction tables (lib/Target/X86/X86InstrInfo.td and friends)  
> always need expanding.  However, because this is a non-invasive  
> patch implementing largely independent functionality but its size  
> makes maintaining it outside LLVM is a heavy burden, I am submitting  
> it for consideration now.
>
> The patch is tested against SVN revision 79306.  Please let me know  
> if there are any problems, and tell me what I can fix to make this  
> worthy of inclusion.  Thanks for your time.
>
Great work! It looks really cool.

Here are some initial comments. Most of them are generic/stylistic  
comments, but some are more important. Also, I'm not trying to be  
picky. :-)

0. Watch out for tabs!
1. Includes like this "#include <llvm/MC/MCInst.h>" should be in  
quotes, not angle brackets.
2. In the "class Indenter", the "push" and "pop" methods check for  
over/under runs before proceeding. Should these be "asserts" instead?
3. In include/llvm/Support/MemoryObject.h, you changed all "uintptr_t"  
to "uint64_t". Is there a good reason for this? If not, please change  
them back.
4. If you have a class that's completely contained within a .cpp file,  
go ahead and define it in an anonymous namespace and mark it  
"VISIBILITY_HIDDEN". Like this:

namespace {
class VISIBILITY_HIDDEN StringMemoryObject : public MemoryObject {
...

5. In "readBytes" in StringMemoryObject, you're returning a "-1", but  
the method's return type is "uint64_t". Do you *really* want to return  
0xfffffffffffU, or is "-1" a sentinel value of some sort? If the  
latter, perhaps changing the return type to "int64_t" instead. If the  
former, then changing the return to "-1U" to make it explicit.

6. The use of "#include <iostream>" is strictly forbidden. Please  
remove it wherever you find it (HexDisassembler.h for one).

7. Look for opportunities to use the LLVM small container classes. For  
instance, the "fOperands" ivar in RecognizableInsn.

8. While we're on that, please use "Instruction" or "Instr" instead of  
"Insn" for consistency's sake.

9. You have types called "operandType_t", etc. We normally don't use  
the "_t" suffix for types. Perhaps renaming them.

10. In the "static inline bool outranks(instructionContext_t upper,  
instructionContext_t lower)", it would be good to assert that upper  
and lower are strictly less than IC_max.

11. In several places, you put 'default: assert(0 && "Badness happens  
here!");" at the end of a switch statement. When assertions are turned  
off, I believe that this could emit warnings. It would be better to  
put it as the first case of the "switch". And maybe use the new  
"llvm_unreachable()" macro as well.

12. You use "std::endl" a lot. This is discouraged by the coding  
standard:

	http://llvm.org/docs/CodingStandards.html#ll_avoidendl

13. You use "std::ostream" a lot. Would it be appropriate to use  
"raw_ostream" instead?

14. In DisassemblerTables::setTableFields, you emit things to  
"std::cerr". Use "errs()" instead. Or, better yet,  
"llvm_report_error". :-)

15. Prefer pre-increment/decrement for variables:

	http://llvm.org/docs/CodingStandards.html#micro_preincrement

16. In X86RecognizableInsn.cpp, you use the large macro:  
HANDLE_OPERAND. Could you make it an inlined function instead? It  
could take a pointer to a member function for the method it wants to  
call. Debugging large macros is impossible. :)

17. I'm confused by the C files in lib/Target/X86. Why aren't they C++  
files?

Okay. That's all I found for now. :-) I haven't looked at the  
algorithm too much, though it seems fairly straight-forward. I'm sure  
that since you're continually testing it, that it's of high quality.  
When you do get around to submission, we will want all of the tests  
you've been running to be added to the test/ directory.

Thanks!
-bw





More information about the llvm-dev mailing list