[llvm-commits] ARM ELF disassembly with integrated-as
Jim Grosbach
grosbach at apple.com
Thu Nov 29 12:36:40 PST 2012
On Nov 29, 2012, at 12:31 PM, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi Jim,
>
>> The assembler shouldn't be inferring anything about data regions.
>> It should be following the directives given it (that's why the new
>> .data_region directives are there).
>
> That's not actually what happens in ELF land (perhaps unfortunately).
> Assemblers use the traditional data inserters to deduce where regions
> begin and end, and there are no explicit data directives (depending on
> how you classify .word etc).
That's extremely troubling. I am pretty strongly opposed to doing that in LLVM unless we absolutely have to. The assembler shouldn't try to be that smart.
I was under the impression that for ELF, the assembler source code would have explicit $a, $t and $d labels that specify the regions. That's not the case?
>
>>> As well as ARMAsmParser.cpp, I think parts of
>>> lib/MC/MCParser/AsmParser.cpp will need to know about the regions
>>> since they handle directives like .byte, .ascii, …
>>
>> What am I missing? No changes should be necessary to any of these.
>> The directives (or magic $t/$a/$d symbols in your case) control everything.
>
> The directives may control everything, but when some of those
> directives are handled in AsmParser.cpp...
>
> I wrote my part assuming the assembler would have to explicitly drive
> the streamer to do that. Greg's patch today suggests that a
> sufficiently intelligent Streamer might be able to do the job instead.
> I think it would be a system completely independent of the DataRegion
> code used by MachO though; I can't quite see how to make the two play
> nicely together.
>
> I'm also a little concerned about some equivalent of armasm's "DCI"
> cropping up later; DCI inserts a given hex value, to be interpreted as
> an instruction rather than data. If the ARMELFStreamer always treated
> EmitValue as data, things could get complicated.
That's an excellent example of why I don't like the assembler trying to be smart about this stuff. It can't ever tell the difference between a .long in the code stream that's a manually encoded instruction and a .long that's a data payload.
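
To make that ambiguity concrete, here is a toy sketch in Python. It is not LLVM's MC API: the function name, the directive set and the (name, size) input format are all made up for illustration. It models an assembler that infers $a/$d mapping symbols purely from whether each line is a data directive, and shows that a hand-encoded instruction written with .long gets labelled as data.

```python
# Toy model only; every name here is hypothetical and nothing is LLVM API.
DATA_DIRECTIVES = {".byte", ".short", ".word", ".long", ".ascii"}


def infer_mapping_symbols(lines, code_symbol="$a"):
    """Return the (offset, symbol) pairs an inferring assembler would emit.

    `lines` is a list of (mnemonic_or_directive, size_in_bytes) tuples.
    A mapping symbol is emitted whenever the inferred region kind changes.
    """
    symbols = []
    state = None
    offset = 0
    for name, size in lines:
        wanted = "$d" if name in DATA_DIRECTIVES else code_symbol
        if wanted != state:
            symbols.append((offset, wanted))
            state = wanted
        offset += size
    return symbols


# Genuine data: a literal pool placed after the code.
literal_pool = [("ldr", 4), ("bx", 4), (".long", 4)]

# A manually encoded instruction written with .long, followed by more code.
hand_encoded = [("ldr", 4), (".long", 4), ("bx", 4)]

print(infer_mapping_symbols(literal_pool))
print(infer_mapping_symbols(hand_encoded))
```

In the second case the inference puts a $d at offset 4 and a fresh $a at offset 8, so a disassembler would treat the hand-encoded instruction as data. Both inputs use the same directive; only the author's intent differs, and the directive stream doesn't carry that.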
-Jim