[llvm-commits] ARM ELF disassembly with integrated-as

Jim Grosbach grosbach at apple.com
Thu Nov 29 12:36:40 PST 2012


On Nov 29, 2012, at 12:31 PM, Tim Northover <t.p.northover at gmail.com> wrote:

> Hi Jim,
> 
>> The assembler shouldn't be inferring anything about data regions.
>> It should be following the directives given it (that's why the new
>> .data_region directives are there).
> 
> That's not actually what happens in ELF land (perhaps unfortunately).
> Assemblers use the traditional data inserters to deduce where regions
> begin and end, and there are no explicit data directives (depending on
> how you classify .word etc).

That's extremely troubling. I am pretty strongly opposed to doing that in LLVM unless we absolutely have to. The assembler shouldn't try to be that smart.

I was under the impression that for ELF, the assembler source code would have explicit $a, $t and $d labels that specify the regions. That's not the case?
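
For illustration, here is a minimal sketch of what "explicit labels specify the regions" would mean in practice. It assumes the ARM ELF mapping-symbol convention ($a marks ARM code, $t marks Thumb code, $d marks data, each applying until the next mapping symbol). The function and variable names are hypothetical and are not LLVM code.

```python
# Hypothetical helper, not part of LLVM: turn mapping symbols into regions.
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}

def regions_from_mapping_symbols(symbols, section_size):
    """symbols: iterable of (offset, name); returns [(start, end, kind)]."""
    regions = []
    ordered = sorted(symbols)
    for i, (offset, name) in enumerate(ordered):
        # A suffixed name such as "$d.1" is treated the same as "$d".
        kind = KINDS[name.split(".", 1)[0]]
        # Each region runs until the next mapping symbol or the section end.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else section_size
        if end > offset:
            regions.append((offset, end, kind))
    return regions

# Made-up 16-byte section: ARM code, one data word, then Thumb code.
example = regions_from_mapping_symbols([(0, "$a"), (8, "$d"), (12, "$t")], 16)
```

With labels like these in the source, a disassembler only has to read the symbol table; nothing has to be guessed from the directives used.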

> 
>>> As well as ARMAsmParser.cpp, I think parts of
>>> lib/MC/MCParser/AsmParser.cpp will need to know about the regions
>>> since they handle directives like .byte, .ascii, …
>> 
>> What am I missing? No changes should be necessary to any of these.
>> The directives (or magic $t/$a/$d symbols in your case) control everything.
> 
> The directives may control everything, but when some of those
> directives are handled in AsmParser.cpp...
> 
> I wrote my part assuming the assembler would have to explicitly drive
> the streamer to do that. Greg's patch today suggests that a
> sufficiently intelligent Streamer might be able to do the job instead.
> I think it would be a system completely independent of the DataRegion
> code used by MachO though; I can't quite see how to make the two play
> nicely together.
> 
> I'm also a little concerned about some equivalent of armasm's "DCI"
> cropping up later; DCI inserts a given hex value, to be interpreted as
> an instruction rather than data. If the ARMELFStreamer always took
> EmitValue as data, things could get complicated.

That's an excellent example of why I don't like the assembler trying to be smart about this stuff. It can't ever tell the difference between a .long in the code stream that's a manually encoded instruction and a .long that's a data payload.
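
To make the ambiguity concrete, here is a toy sketch of an assembler that infers regions from the directive used. It is not how any real assembler is implemented; the directive list, the function name, and the sample source are all assumptions made for illustration. The value 0xe1a00000 is the ARM encoding of "mov r0, r0".

```python
# Toy model of directive-based inference; every name here is hypothetical.
DATA_DIRECTIVES = {".byte", ".short", ".word", ".long", ".ascii"}

def infer_mapping_symbols(statements, code_symbol="$a"):
    """Emit (index, symbol) whenever the guessed region kind changes."""
    emitted = []
    current = None
    for index, statement in enumerate(statements):
        head = statement.split()[0]
        # The only evidence available is which directive was used.
        guess = "$d" if head in DATA_DIRECTIVES else code_symbol
        if guess != current:
            emitted.append((index, guess))
            current = guess
    return emitted

source = [
    "add r0, r0, r1",
    ".long 0xe1a00000",   # hand-encoded ARM "mov r0, r0", meant to execute
    "bx lr",
    ".long 0x12345678",   # literal data
]
inferred = infer_mapping_symbols(source)
```

Both .long statements come out marked $d, so the hand-encoded instruction at index 1 would be shown as data by a disassembler. Only an explicit marker in the source can tell the two cases apart.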

-Jim


