[flang] [llvm] [lldb] [clang-tools-extra] [libc] [clang] [compiler-rt] [libcxx] ✨ [Sema, Lex, Parse] Preprocessor embed in C and C++ (and Obj-C and Obj-C++ by-proxy) (PR #68620)

Mon Nov 27 14:22:39 PST 2023

AaronBallman wrote:

> I'm somewhat concerned about the default for `-E` being to explode `#embed` into the comma-separated raw integers. Even with moderately-sized embeds, I think it'll generate unusably-bloated output. The human-readability of a big list of integers is not better than embedded base64 -- and actually, seems more of a pain to decode.
> 
> I think the most user-friendly behavior would be:
> 
>     * `-E`: convert `#embed "file"` into `#embed_base64 "base64-file-contents"`. This preserves the property of the output not depending on other files, but doesn't make it hugely-bloated.
> 
>     * `-E -dE`: preserve any `#embed` directive as-is, referring to the external file.
> 
>     * Potentially another `-d?` mode to explode `#embed` into the raw integers (like you had proposed for the default behavior) -- though I'm not sure that's really going to be useful.

I agree with you that the exploded list of integers will potentially add a lot of content to the preprocessed output. However, the behavior of `-E` has always been to produce a fully preprocessed representation of the original source, and your design breaks that. A significant use case for `-E` is debugging surprising preprocessor behavior, so whether the embedded contents are salient depends on *why* you're using `-E`. Perhaps your approach is still reasonable, but I would personally find it to be a surprising change from the usual expectations.

In some situations, I suspect the preprocessed data may not be crucial to explode out, e.g., as a braced initializer for an array. But there are other situations where the base64 data is not useful, e.g., as an initializer for a structure, as a list of arguments to a function, etc. I anticipate use to predominantly be in an initializer list for an array, but we have no actual usage experience with the feature yet. That's why I lean towards "do the most expected thing by default", which would be to emit the actual list of integers even if it's large.
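
To make the three contexts above concrete, here is a minimal sketch in plain C. It assumes a hypothetical three-byte file and uses a macro, `EMBED_HI_TXT`, as a stand-in for what an exploded `#embed` would leave in the `-E` output. The macro, the struct, and the function names are illustrative only and are not part of the PR.

```c
#include <stddef.h>

/* Hypothetical stand-in for the exploded form of `#embed "hi.txt"`,
   assuming the file holds the three bytes 'H', 'i', '!'. */
#define EMBED_HI_TXT 72, 105, 33

/* Illustrative types and helpers; none of these names come from the PR. */
struct header { unsigned char tag, kind, flags; };

int sum3(int a, int b, int c) { return a + b + c; }

/* 1. Braced initializer for an array (the anticipated common case). */
const unsigned char embed_data[] = { EMBED_HI_TXT };

/* 2. Initializer for a structure: each integer initializes one member. */
const struct header embed_hdr = { EMBED_HI_TXT };

/* 3. Argument list for a function call. */
int embed_sum(void) { return sum3(EMBED_HI_TXT); }

/* Number of bytes the array picked up from the integer list. */
size_t embed_data_size(void) { return sizeof embed_data; }
```

The same comma-separated integer list is valid in all three positions, which is the property the "list of integers" default would keep visible in the preprocessed output.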

https://github.com/llvm/llvm-project/pull/68620