[LLVMdev] 16-bit x86 status update

Tue Jan 14 12:00:14 PST 2014

Absolutely fantastic work, David. Thank you!

On Jan 14, 2014, at 4:35 AM, David Woodhouse <dwmw2 at infradead.org> wrote:

> Here's a brief update on the current state of 16-bit x86 support...
> 
> The assembler has support for the .code16 directive and can happily
> output 16-bit code. In pending patches¹ I have also added an
> i386-*-*-code16 triple, fixed the disassembler to support 16-bit mode
> (which was previously present, but could not be invoked and was fairly
> broken), and added a '-m16' option to clang.
> 
> The main caveats to bear in mind for 16-bit code which was previously
> built with gcc/gas are:
> 
> • We do not support the explicit 'data32' and 'addr32' prefixes in asm.
> 
>   The data32 prefix is almost never needed. If you use the correct
>   suffix on an instruction (retl vs. retw, for example), then you
>   should never need to use 'data32'.
> 
>   The addr32 prefix *is* needed by GNU binutils, because *even* when
>   given an explicit address which is greater than 64KiB, it'll just
>   give you a warning about truncation, and emit the instruction with
>   a 16-bit addressing mode and the wrong address. LLVM doesn't do that,
>   and is far happier to just use 32-bit addressing whenever it *might*
>   need to. This means that we never really need an explicit addr32
>   prefix to use 32-bit addressing in 16-bit mode, but also that our
>   code tends to be larger.
> 
> • We do not support '.code16gcc'. This is a hack which emits code in
>   16-bit mode but parses the input as if it's in 32-bit mode. So
>   instructions which are ambiguous about their operand size will take
>   their 32-bit form — a plain 'ret' will cause it to emit 'retl', etc.
>   We *could* support this mode, but it would be moderately non-trivial.
>   It would require the code emitter and the asm parser to maintain
>   separate ideas of the mode. The fix for PR18303 makes that somewhat
>   simpler, but still not entirely trivial. Alternatively we could add
>   yet another mode bit for the *parser*, but I don't like that much.
> 
> • GCC allows the compilation of C code to 16-bit mode by using
>   asm(".code16gcc") and also adding other flags such as
>   -fno-unit-at-a-time to ensure that the .code16gcc really *is* the
>   first thing the assembler sees. We don't support that horridness,
>   and don't need it since clang can support '-m16'. We have also filed
>   http://gcc.gnu.org/PR59672 to request the same in GCC.
> 
> I have been able to build the 16-bit startup code of the Linux kernel
> with .code16 and 'clang -m16', and it works fine. I had to fix PR18303,
> for which David Peixotto is working on a better fix, and I had to work
> around PR3997 — which some people seem to be denying is a bug in the
> first place, and claiming (wrongly) that GCC doesn't get it right
> either. But both of those are pre-existing bugs, and Not My Fault™.
> 
> At this point, I'm not aware of any issues specifically with 16-bit
> mode, other than the above. If anyone else wants to start testing it in
> anger on real code, that would be useful...
> 
> -- 
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse at intel.com                              Intel Corporation
> 
> ¹ http://git.infradead.org/users/dwmw2/llvm.git/summary/80bd3d9f and
>  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201303.html
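
As a rough illustration of the data32/addr32 discussion in the quoted
message, here is a minimal C sketch of the prefix rule involved. It is not
LLVM's code, and every function name in it is hypothetical. It only models
the x86 rule that, in 16-bit mode, the 0x66 prefix selects a 32-bit operand
size and the 0x67 prefix selects 32-bit addressing. The "value_known"
parameter is an assumption standing in for "the assembler might need 32-bit
addressing": when the address is not known to fit in 16 bits, the sketch
conservatively picks 32-bit addressing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: choose an address size for an absolute address.
 * If the value is unknown (e.g. an unresolved symbol), be conservative
 * and use 32-bit addressing; otherwise use 16-bit only if it fits. */
static int addr_bits_for(uint32_t address, int value_known) {
    if (!value_known)
        return 32;
    return address > 0xFFFFu ? 32 : 16;
}

/* Write the size-override prefixes an instruction needs in 16-bit mode.
 * Returns the number of prefix bytes written to out (0, 1 or 2). */
static size_t code16_prefixes(int operand_bits, int addr_bits, uint8_t *out) {
    assert(operand_bits == 16 || operand_bits == 32);
    assert(addr_bits == 16 || addr_bits == 32);
    size_t n = 0;
    if (operand_bits == 32)
        out[n++] = 0x66; /* operand-size override: what 'data32' spells out */
    if (addr_bits == 32)
        out[n++] = 0x67; /* address-size override: what 'addr32' spells out */
    return n;
}

/* Encode a near return in 16-bit mode.
 * operand_bits == 16 models 'retw' (C3);
 * operand_bits == 32 models 'retl' (66 C3). */
static size_t encode_ret_code16(int operand_bits, uint8_t *out) {
    size_t n = code16_prefixes(operand_bits, 16, out);
    out[n++] = 0xC3; /* near RET opcode */
    return n;
}
```

In this model the suffix alone decides whether 0x66 is emitted, which is why
an explicit 'data32' is rarely needed, and the address-size choice is made
from the address itself, which is why an explicit 'addr32' is not needed
either, at the cost of an extra prefix byte and a wider displacement
whenever the address is not known to fit in 16 bits.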