[LLVMdev] [PATCH] fix outs/ins of MOV16mr instruction (X86)

Gordon Keiser gkeiser at arxan.com
Wed Mar 25 05:00:05 PDT 2015

<side rant>
AT&T syntax on x86:  Ruining the eyes of everyone who read the Intel manuals and wrote in the standard syntax since...  I'm too old to remember the year.   I know LLVM has to be compatible with lots of GNU tools still, but I feel like now that clang has integrated AS it might do the world a favor and default to Intel syntax on input / output (especially since the internal assembler is mature now), while still accepting AT&T for compatibility reasons.  I'm not saying ditch it altogether, but when you have to read an Intel manual to learn the instruction format then reverse the order of everything, consult a GAS manual, and insert bizarre tokens all over the place...    I know it's Unix legacy and all, and there is probably more code than I've seen in my lifetime, but at the same time we aren't using AT&T syntax to write inline assembly for ARM, we're using ARM syntax.  
</side rant>

The segment register thing is odd as a concern on modern x86_64 as I understand it.  I see this patch seems to be targeting small-width moves in 64 bit mode.   AFAIK all segment registers are zeroed... with GS / FS being special of course, I believe already modeled as such...  and protection info pulled from the GDT.  But *should* they be modelled?   

<sidetracking again>
I bring this up because I'm wondering if LLVM models real mode correctly where segment registers and memory spaces are actually a concern (I haven't been paying attention to x86 backends much, more ARM and PPC).  I assume override prefixes can handle most of this, but there were lots of implicit segments for some instructions and I'm not sure on the state of modelling that.   

I saw some recent work towards getting real mode x86 working and would like to contribute if for nothing other than nostalgia's sake.   Being at ring 0 all the time and having full system access was awful from any security standpoint, but allowed for some fun and weird programming as well.   MS style inline assembly is close enough to Borland's old format that we could get old Turbo-C code compiling without a huge amount of effort beyond what has already been done, although the MS still needs a little work and I have a bug or two to file against it.

Sorry for derailing this thread, if anybody is interested in discussing off-list or starting a new thread on this topic feel free to email.  

Gordon Keiser
Software Development Engineer
Arxan Technologies

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Tim Northover
Sent: Tuesday, March 24, 2015 10:28 PM
To: Jun Koi
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] [PATCH] fix outs/ins of MOV16mr instruction (X86)

>> Why?  i16mem here stands for the pointer, not the actual memory.  A 
>> store doesn't define a pointer, so why would it be in "outs"?
> Then why does this "i16mem:$dst" belongs to "ins"? Is that wrong, correct?

Think about a typical instruction:

    movw %ax, 1234(%rax, 4, %rbx)

The components making up the "i16mem" operand are "1234", %rax, 4 and %rbx (+ a hidden segment register). None of those are defined/written by the instruction (but they are read). So they need to be ins.

What's written is some nebulous memory location specified by those operands. This isn't modelled by LLVM's basic instructions at all (except possibly via mayLoad and mayStore).

The existing code is correct.


LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu

More information about the llvm-dev mailing list