[LLVMdev] X86 disassembler is quite broken on handling REX

Jun Koi junkoi2004 at gmail.com
Tue Dec 23 23:01:53 PST 2014


On Wed, Dec 24, 2014 at 2:59 PM, Craig Topper <craig.topper at gmail.com>
wrote:

> Wouldn't changing
>
>     case TYPE_MM:                                         \
>       if (index > 7)                                      \
>         *valid = 0;                                       \
>       return prefix##_MM0 + index;
>
>
> to
>
>     case TYPE_MM:                                         \
>       return prefix##_MM0 + (index & 0x7);
>
>
> fix the issue for both rex.b and rex.r?
>

this sounds OK, but then the (index > 7) check is gone. is there any case
where dropping it can be an issue?
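
to make the question concrete, here is a minimal standalone sketch of the two variants applied to the failing bytes (0x41,0x0f,0xeb,0xc3). it is not the real decoder code: the enum and the helper names (rmIndex, fixupMMStrict, fixupMMMasked) are made up for illustration, and only the rm computation and the TYPE_MM case quoted in this thread are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the decoder's register enum (MM block only).
enum Reg : unsigned { MM0 = 0, MM1, MM2, MM3, MM4, MM5, MM6, MM7 };

// A REX prefix has the bit layout 0100WRXB, so REX.B is bit 0.
inline unsigned bFromREX(uint8_t rex) { return rex & 0x1; }

// ModRM.rm (low 3 bits) extended by REX.B, as in the quoted line
//   rm |= bFromREX(insn->rexPrefix) << 3;
inline unsigned rmIndex(uint8_t modrm, uint8_t rex) {
  unsigned rm = modrm & 0x7;
  rm |= bFromREX(rex) << 3;
  return rm;
}

// Models the current TYPE_MM case: an index above 7 marks the operand invalid.
inline unsigned fixupMMStrict(unsigned index, bool *valid) {
  assert(index < 16 && "3 ModRM bits plus one REX bit");
  if (index > 7)
    *valid = false;
  return MM0 + index;
}

// Models the proposed TYPE_MM case: the REX-supplied bit is masked off,
// so indices 8..15 alias MM0..MM7 and 'valid' is never cleared here.
inline unsigned fixupMMMasked(unsigned index, bool *valid) {
  assert(index < 16 && "3 ModRM bits plus one REX bit");
  (void)valid;
  return MM0 + (index & 0x7);
}
```

with ModRM = 0xc3 and REX = 0x41, rmIndex gives 11. the strict version clears 'valid' (the "invalid instruction encoding" warning), while the masked version returns MM3, the same as with REX = 0x40. so in this sketch the only behaviour that disappears is the rejection of indices 8..15 for MM operands; my question is whether anything relies on that rejection.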

thanks,
Jun




>
> On Tue, Dec 23, 2014 at 10:54 PM, Jun Koi <junkoi2004 at gmail.com> wrote:
>>
>>
>>
>> On Wed, Dec 24, 2014 at 2:43 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> I believe this particular error is caused by this. That seems easy
>>> enough to just drop the bit. Do you have other non-mmx examples?
>>>
>>>     case TYPE_MM:                                         \
>>>       if (index > 7)                                      \
>>>         *valid = 0;                                       \
>>>       return prefix##_MM0 + index;
>>>
>>
>> yes, exactly this place. but the question is: how do we know when to drop
>> the REX.B?
>>
>>
>> i don't know of any non-MMX examples. it seems only MMX-related
>> instructions have this issue.
>>
>> thanks,
>> Jun
>>
>>
>>
>>
>>>
>>> On Tue, Dec 23, 2014 at 10:17 PM, Jun Koi <junkoi2004 at gmail.com> wrote:
>>>>
>>>> hi,
>>>>
>>>> i think the current X86 disassembler is quite broken and fails badly on
>>>> handling REX for x86_64 code.
>>>>
>>>> below are some examples:
>>>>
>>>> $ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
>>>>     .text
>>>>     por    %mm3, %mm0
>>>>
>>>> $ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
>>>>     .text
>>>>     por    %mm3, %mm0
>>>>
>>>> $ echo "0x41,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
>>>>     .text
>>>> <stdin>:1:1: warning: invalid instruction encoding
>>>> 0x41,0x0f,0xeb,0xc3
>>>> ^
>>>>
>>>>
>>>> the last example should also return "por %mm3, %mm0", but it fails to
>>>> understand the input.
>>>>
>>>> the reason lies in this line in X86DisassemblerDecoder.cpp:
>>>>
>>>>     rm  |= bFromREX(insn->rexPrefix) << 3;
>>>>
>>>> we can see that we take into account REX.B, but for "por" (0F EB), this
>>>> should be ignored.
>>>>
>>>> there are quite a lot of other instructions taking into account REX
>>>> like this, while according to the manual, REX should be ignored.
>>>>
>>>> i don't see any clean solution for this issue without some significant
>>>> changes to the way we decode ModRM & providing more information to .td
>>>> files.
>>>>
>>>> any idea?
>>>>
>>>> thanks.
>>>> Jun
>>>>
>>>
>>>
>>> --
>>> ~Craig
>>>
>>
>>
>
> --
> ~Craig
>