[LLVMdev] REX prefix is not handled properly for X86_64?
ahmed.bougacha at gmail.com
Thu Dec 11 11:12:17 PST 2014
FWIW, I think LLVM's right in rejecting this. Per SDM 2.2.1, "Only one
REX prefix is allowed per instruction."
Here, 0x43 and 0x40 are both REX prefixes, so the encoding contradicts the manual.
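
To make the "both are REX prefixes" point concrete, here is a minimal sketch of how a REX byte can be recognized in 64-bit mode: any byte in the range 0x40..0x4F, with the low four bits being the W, R, X and B flags. The helper names (is_rex, rex_fields, count_leading_rex) are made up for illustration and are not LLVM functions; the sketch also ignores legacy prefixes, which a real decoder has to handle.

```python
def is_rex(byte: int) -> bool:
    """In 64-bit mode, a byte whose high nibble is 0100 is a REX prefix."""
    return (byte & 0xF0) == 0x40


def rex_fields(byte: int) -> dict:
    """Split a REX byte into its W, R, X and B flag bits."""
    if not is_rex(byte):
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,
        "R": (byte >> 2) & 1,
        "X": (byte >> 1) & 1,
        "B": byte & 1,
    }


def count_leading_rex(code: bytes) -> int:
    """Count consecutive REX bytes at the start of the byte sequence."""
    count = 0
    for byte in code:
        if not is_rex(byte):
            break
        count += 1
    return count


if __name__ == "__main__":
    code = bytes([0x43, 0x40, 0x04, 0x75])
    print(count_leading_rex(code))        # two REX bytes before the opcode
    print(rex_fields(0x43), rex_fields(0x40))
```

With the bytes from this thread, the count is two, which is exactly the case SDM 2.2.1 rules out.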
However, trunk llvm-mc is still able to disassemble the add, I guess
because it skips bytes it cannot decode and carries on:
    <stdin>:1:1: warning: invalid instruction encoding
    0x43 0x40 0x04 0x75
            addb $117, %al ## encoding: [0x04,0x75]
                           ## <MCInst #107 ADD8i8
                           ## <MCOperand Imm:117>>
It would be trivial to change the disassembler to accept redundant REX
prefixes (see attached patch; turning that into a loop would accept more
than 2, but that would be even worse). You would then have to decide
which one to use: the first, or the last. Currently, only the last REX
prefix is applied to the instruction that follows; all the earlier ones
are discarded as invalid encodings.
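
The first-versus-last choice can be sketched as a small prefix scanner. This is only an illustration of the decision, not the attached patch and not LLVM code: select_rex and its policy argument are hypothetical names, and the scanner only looks at REX bytes at the very start of the input.

```python
from typing import Optional, Tuple


def is_rex(byte: int) -> bool:
    """In 64-bit mode, bytes 0x40..0x4F are REX prefixes."""
    return (byte & 0xF0) == 0x40


def select_rex(code: bytes, policy: str = "last") -> Tuple[Optional[int], int]:
    """Scan leading REX bytes and pick one according to `policy`.

    Returns (chosen_rex_or_None, offset_of_first_non_rex_byte).
    `policy` is "first" or "last"; both are hypothetical options used
    here only to show that the two choices can differ.
    """
    if policy not in ("first", "last"):
        raise ValueError("policy must be 'first' or 'last'")
    chosen = None
    offset = 0
    while offset < len(code) and is_rex(code[offset]):
        if chosen is None or policy == "last":
            chosen = code[offset]
        offset += 1
    return chosen, offset


if __name__ == "__main__":
    code = bytes([0x43, 0x40, 0x04, 0x75])
    print(select_rex(code, "first"))  # picks 0x43, opcode at offset 2
    print(select_rex(code, "last"))   # picks 0x40, opcode at offset 2
```

For "43 40 04 75" the two policies pick different bytes (0x43 versus 0x40), although for this particular instruction neither changes the result, since ADD AL, imm8 does not use any of the REX bits.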
Now, if LLVM rejected a single useless REX prefix (e.g. "40 04 75"),
that would be a problem, but that case works fine without any change.
So, to recap: to avoid the problem, I think you should change the way
you use the LLVM Disassembler API. When it's unable to disassemble at a
given byte, skip that byte and try again at the next one. That's what
most linear disassemblers do, and it would correctly ignore the first REX
prefix.
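
The skip-and-retry loop can be sketched as follows. To keep the example self-contained, toy_decode is a hypothetical stand-in for the real disassembler: it understands only an optional single REX prefix followed by ADD AL, imm8 (opcode 0x04), and it rejects a REX that is directly followed by another REX. With the LLVM C API the corresponding call would be LLVMDisasmInstruction, which returns 0 when it cannot decode an instruction; the loop structure stays the same.

```python
from typing import Callable, List, Optional, Tuple

Decoded = Optional[Tuple[str, int]]  # (text, size in bytes) or None on failure


def toy_decode(code: bytes, offset: int) -> Decoded:
    """Hypothetical strict decoder: [one REX] + ADD AL, imm8, nothing else."""
    pos = offset
    if pos < len(code) and (code[pos] & 0xF0) == 0x40:
        pos += 1
        # A second REX right after the first is treated as invalid.
        if pos < len(code) and (code[pos] & 0xF0) == 0x40:
            return None
    if pos + 1 < len(code) and code[pos] == 0x04:
        return f"add al, {code[pos + 1]:#x}", pos + 2 - offset
    return None


def linear_sweep(
    code: bytes, decode: Callable[[bytes, int], Decoded] = toy_decode
) -> List[Tuple[int, Optional[str]]]:
    """Decode front to back; on failure, skip one byte and try again."""
    results = []
    offset = 0
    while offset < len(code):
        decoded = decode(code, offset)
        if decoded is None:
            results.append((offset, None))  # undecodable byte, skipped
            offset += 1
        else:
            text, size = decoded
            results.append((offset, text))
            offset += size
    return results


if __name__ == "__main__":
    for offset, text in linear_sweep(bytes([0x43, 0x40, 0x04, 0x75])):
        print(offset, text if text is not None else "(invalid byte, skipped)")
```

On "43 40 04 75" the sweep reports the byte at offset 0 as invalid and then decodes "40 04 75" at offset 1 as the add, which matches what llvm-mc prints above.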
On Thu, Dec 11, 2014 at 1:27 AM, Jun Koi <junkoi2004 at gmail.com> wrote:
> Intel's Xed can interpret "43 40 04 75" as "add al, 0x75", but LLVM's X86
> disassembler considers this invalid code. I guess the reason is that LLVM
> fails to recognize the REX prefix in this case.
> Is this correct?
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 595 bytes
Desc: not available