[LLVMdev] X86 disassembler is quite broken on handling REX

Craig Topper craig.topper at gmail.com
Tue Dec 23 22:59:20 PST 2014


Wouldn't changing

    case TYPE_MM:                                         \
      if (index > 7)                                      \
        *valid = 0;                                       \
      return prefix##_MM0 + index;


to

    case TYPE_MM:                                         \
      return prefix##_MM0 + (index & 0x7);


fix the issue for both REX.B and REX.R?

On Tue, Dec 23, 2014 at 10:54 PM, Jun Koi <junkoi2004 at gmail.com> wrote:
>
>
>
> On Wed, Dec 24, 2014 at 2:43 PM, Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> I believe this particular error is caused by the check below. It seems easy
>> enough to just drop the bit. Do you have other non-MMX examples?
>>
>>     case TYPE_MM:                                         \
>>       if (index > 7)                                      \
>>         *valid = 0;                                       \
>>       return prefix##_MM0 + index;
>>
>
> Yes, exactly this place. But the question is: how do we know when to drop
> REX.B?
>
>
> I don't know of any non-MMX examples. It seems only MMX-related instructions
> have this issue.
>
> thanks,
> Jun
>
>
>
>
>>
>> On Tue, Dec 23, 2014 at 10:17 PM, Jun Koi <junkoi2004 at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I think the current X86 disassembler is quite broken and fails badly when
>>> handling REX prefixes in x86_64 code.
>>>
>>> Below are some examples:
>>>
>>> $ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
>>>     .text
>>>     por    %mm3, %mm0
>>>
>>> $ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
>>>     .text
>>>     por    %mm3, %mm0
>>>
>>> $ echo "0x41,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
>>>     .text
>>> <stdin>:1:1: warning: invalid instruction encoding
>>> 0x41,0x0f,0xeb,0xc3
>>> ^
>>>
>>>
>>> The last example should also decode to "por %mm3, %mm0", but the
>>> disassembler rejects the encoding.
>>>
>>> The cause is this line in X86DisassemblerDecoder.cpp:
>>>
>>>     rm  |= bFromREX(insn->rexPrefix) << 3;
>>>
>>> This takes REX.B into account, but for "por" (0F EB) REX.B should be
>>> ignored.
>>>
>>> Quite a lot of other instructions are decoded with REX taken into account
>>> like this, although according to the manual REX should be ignored for them.
>>>
>>> I don't see any clean solution for this issue without significant changes
>>> to the way we decode ModRM and without providing more information in the
>>> .td files.
>>>
>>> Any ideas?
>>>
>>> thanks.
>>> Jun
>>>
>>
>>
>> --
>> ~Craig
>>
>
>

-- 
~Craig