[PATCH] D91460: [AsmParser] make .ascii/.asciz/.string support multiple strings

Mon Nov 16 21:18:00 PST 2020

MaskRay added a comment.

In D91460#2398228 <https://reviews.llvm.org/D91460#2398228>, @jrtc27 wrote:

>> In D91460#2395522 <https://reviews.llvm.org/D91460#2395522>, @jrtc27 wrote:
>>
>>> In which case that's confusing and likely to lead to bugs if people make use of the preprocessor in the hopes of concatenating strings but end up with NUL bytes being inserted contrary to what there would be in C. Can we not just fix the assembly to use commas rather than this weird syntax that seems to be a special case for `.asciz`? You can't write `.word 2 2`, only `.word 2, 2`, so why do string directives really need special treatment?
>>
>> I disagree.  That's why `.ascii` is distinct from `.asciz`; if developers do not want `NUL`-terminated C style strings, they should use `.ascii` and not `.asciz`.  The assembler need not match the behavior of the C preprocessor; this patch is about matching the behavior of GNU `as` such that `clang` can be used as a substitute.  Not matching the behavior of GNU `as` precisely here would be a mistake that would hinder the adoption of clang for existing assembler sources.
>
> Well, no, it is confusing for someone who doesn't know about the GNU syntax. If I see `.asciz "foo" "bar"` I'd assume it's equivalent to `.asciz "foobar"` not `.asciz "foo"; .asciz "bar"` and that `.asciz` is being used so there's a NUL on the end of the concatenated string.
>
> FreeBSD has `.asciz MACHINE_ARCH` in one of its arm csu files (for an ELF note). Some architectures like to build up MACHINE_ARCH (defined in a header file) by concatenating multiple strings together for the different components. Currently arm doesn't do this (and only has two variants), but you can imagine other architectures might also want an ELF note and so would give confusing behaviour (you may not find it confusing, but I can guarantee you a large fraction of people would); if you want the C-like behaviour you have to use `.ascii` and add the NUL byte manually, as is done in your example.
>
> But that's all justification for why I don't like GNU as's behaviour and why I don't like adding that feature to LLVM (I would not advocate for _different_ behaviour as that would cause even more problems, only to just not implement it at all). However, your example is a real-world case of something that really does need this feature and cannot easily be rewritten to use commas, so I see this as unavoidable and so yes, LLVM should implement this GNU as feature.

I agree with @jrtc27 that juxtaposing strings after `.asciz` is bug-prone. This is unrelated to GNU as's preprocessing stage, `do_scrub_chars`. The Linux kernel does not need `.asciz` here, so if we cannot give it reasonable behavior, we can simply disallow juxtaposed strings with `.asciz`.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D91460