[clang] [clang] Inject tokens containing #embed back into token stream (PR #97274)
Jakub Jelínek via cfe-commits
cfe-commits at lists.llvm.org
Tue Jul 16 02:14:27 PDT 2024
jakubjelinek wrote:
@ThePhd @AaronBallman
And even more importantly (checking on godbolt again):
```c
int a = sizeof (
#embed __FILE__ limit (1)
);
```
evaluates to 1 with both clang and clang++ trunk, but to 4 with clang/clang++ trunk when using -save-temps.
I thought there was agreement that, at least for C, the literals have type int; for C++ it is fuzzy and depends on what gets voted in. In any case, integrated preprocessing and -save-temps must not produce different code.
If C++ requires some cast, it would need to state exactly what that cast is, i.e. whether the output is supposed to be
```c
12,143,12,16
```
or
```cpp
static_cast<unsigned char>(12),static_cast<unsigned char>(143),static_cast<unsigned char>(12),static_cast<unsigned char>(16)
```
or
```c
(unsigned char)12,(unsigned char)143,(unsigned char)12,(unsigned char)16
```
or
```c
'\014','\217','\014','\020'
```
or whatever else, and the preprocessor would need to emit that cast on every literal (unless it uses some extension like `#embed "." __gnu__::__base64__("...")`, which the GCC patchset uses for the inner parts of longer sequences).
I think the exact form of the cast can affect how some expressions are parsed, e.g.
```cpp
extern long long p[];
auto foo () {
return
#embed __FILE__ limit (1)
[p];
}
```
will behave one way if the cast is (unsigned char)12, which parses as (unsigned char)(12[p]) because postfix [] binds tighter than a cast, and differently if it is static_cast<unsigned char>(12) or ((unsigned char)12), both of which simply index p.
Most of the other bugs I'm seeing are also about consistency: with -save-temps it works correctly IMHO, while without -save-temps it misbehaves. The behavior has to be the same.
https://github.com/llvm/llvm-project/pull/97274