[LLVMdev] X86 Disassembler

Tue Aug 25 01:41:06 PDT 2009

On Aug 19, 2009, at 4:39 PM, Sean Callanan wrote:

> Bill,
>
> thanks for your comments.  I'll respond to them individually.  I've  
> attached a new revision of the patch that addresses them.  Patch  
> built and tested against SVN 79487, with the additional attached fix  
> that fixes an Intel table bug.
>
Hi Sean,

Thanks for the fixes!

>> 1. Includes like this "#include <llvm/MC/MCInst.h>" should be in  
>> quotes, not angle brackets.
>
> Fixed.
>
I still saw a few more sneaking in there. Also, don't put system  
headers (like assert.h, cstdio, etc.) in quotes. I just meant the LLVM  
headers. :-)

>> 2. In the "class Indenter", the "push" and "pop" methods check for  
>> over/under runs before proceeding. Should these be "asserts" instead?
>
> Yeah, I made them asserts as you suggested.  I was thinking about  
> maybe making them saturate, but honestly if we run out of space we  
> should simply break and allow the developer to make the Indenter  
> bigger.  Maybe we should have another argument to the constructor  
> specifying how large the buffer should be...?
>
Either that or make it a SmallVector with initial size "256". I  
*think* that all of the elements in that vector are guaranteed to be  
contiguous. (Might want to check.) But that way you won't have to  
worry about the array growing too big and needing more space.

>> 3. In include/llvm/Support/MemoryObject.h, you changed all  
>> "uintptr_t" to "uint64_t". Is there a good reason for this? If not,  
>> please change them back.'
>
> Yes, there is.  If you're disassembling in a memory space that's  
> larger than yours (for example, on a computer where LLVM is built 32- 
> bit but the source of the bytes is a 64-bit address space), then I  
> don't want to be limited by the local process's notion of uintptr_t.
>
I see. As Chris said, that makes sense. :) I was just worried that  
bits would fall on the floor if you convert a pointer value to a fixed  
bit-width size. Of course, when we go to 128-bit processors, we'll  
have to change this. :-D

>> 4. If you have a class that's completely contained within a .cpp  
>> file, go ahead and define it in an anonymous namespace and mark it  
>> "VISIBILITY_HIDDEN". Like this:
>>
>> namespace {
>> class VISIBILITY_HIDDEN StringMemoryObject : public MemoryObject {
>> ...
>
> I used VISIBILITY_HIDDEN for both StringMemoryObject and  
> X86DisassemblerEmitter::X86DEBackend.
> I wrapped StringMemoryObject in an anonymous namespace.
> I didn't wrap X86DEBackend because it's already nested.
>
Thanks.

>> 5. In "readBytes" in StringMemoryObject, you're returning a "-1",  
>> but the method's return type is "uint64_t". Do you *really* want to  
>> return 0xfffffffffffU, or is "-1" a sentinel value of some sort? If  
>> the latter, perhaps changing the return type to "int64_t" instead.  
>> If the former, then changing the return to "-1U" to make it explicit.
>
> -1 is a sentinel value.  I changed readBytes to return an int and  
> accept a uint64_t* that it fills in if it's non-NULL.
>
Thanks. I'm not a huge fan of sentinel values if they can be avoided.

>> 7. Look for opportunities to use the LLVM small container classes.  
>> For instance, the "fOperands" ivar in RecognizableInsn.
>
> I looked at my uses of std::vector but they wouldn't really benefit  
> from the good performance of SmallVector for small values of N.
>
> - fInstructions in X86Disassembler.h is not a good candidate for  
> this, because the number of instructions could potentially be quite  
> large.
> - numberedInstructions in X86DisassemblerEmitter.cpp is not a good  
> candidate because there are upwards of a thousand instructions in  
> the x86 table.
> - fInstructionSpecifiers in X86DisassemblerTables.h is not a good  
> candidate either... the number is (again) very large.
> - fOperands is not a good candidate because is never actually  
> created.  It is always set to point to  
> CodeGenInstruction::OperandList.
>
Thanks for checking!

>> 11. In several places, you put 'default: assert(0 && "Badness  
>> happens here!");" at the end of a switch statement. When assertions  
>> are turned off, I believe that this could emit warnings. It would  
>> be better to put it as the first case of the "switch". And maybe  
>> use the new "llvm_unreachable()" macro as well.
>
> All right, I moved them and used llvm_unreachable().  In the C code,  
> where I can't safely #include "llvm/Support/ErrorHandling.h," I  
> declared a macro that does the same thing.
>
Great!

>> 15. Prefer pre-increment/decrement for variables:
>>
>> 	http://llvm.org/docs/CodingStandards.html#micro_preincrement
>
> Learned a new one there.  Thanks!
>
No prob. :-)

>> 16. In X86RecognizableInsn.cpp, you use the large macro:  
>> HANDLE_OPERAND. Could you make it an inlined function instead? It  
>> could take a pointer to a member function for the method it wants  
>> to call. Debugging large macros is impossible. :)
>
> I factored the code out into a function.  There are still macros to  
> keep things pretty, but they just wrap the function.
>
Thanks!

>> 17. I'm confused by the C files in lib/Target/X86. Why aren't they C 
>> ++ files?
>
> The reason is that I am aware of clients that need disassembly but  
> can't link in C++ code, so for those clients I've separated out a C  
> core.  They won't get the benefits of the MCInst, but that's their  
> problem :)
>
Ah! Okay...I wonder if it would be good to place them in the llvm-c  
bindings somehow? I honestly don't know much about those bindings, so  
this might not be feasible. But it would be good to keep icky C code  
out of pristine C++ code (tongue *firmly* planted in cheek).

-bw