[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)

Jakub JelĂ­nek via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 19 08:58:52 PDT 2024


================
@@ -0,0 +1,98 @@
+// RUN: %clang_cc1 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,cxx -Wno-c23-extensions
+// RUN: %clang_cc1 -x c -std=c23 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,c
+#embed <media/empty>
+;
+
+void f (unsigned char x) { (void)x;}
+void g () {}
+void h (unsigned char x, int y) {(void)x; (void)y;}
+int i () {
+	return
+#embed <single_byte.txt>
+		;
+}
+
+_Static_assert(
+#embed <single_byte.txt> suffix(,)
+""
+);
+_Static_assert(
+#embed <single_byte.txt>
+, ""
+);
+_Static_assert(sizeof(
+#embed <single_byte.txt>
+) ==
+sizeof(unsigned char)
+, ""
+);
+_Static_assert(sizeof
+#embed <single_byte.txt>
+, ""
+);
+_Static_assert(sizeof(
+#embed <jk.txt> // expected-warning {{left operand of comma operator has no effect}}
+) ==
+sizeof(unsigned char)
+, ""
+);
+
+#ifdef __cplusplus
+template <int First, int Second>
+void j() {
+	static_assert(First == 'j', "");
+	static_assert(Second == 'k', "");
+}
+#endif
+
+void do_stuff() {
+	f(
+#embed <single_byte.txt>
+	);
+	g(
+#embed <media/empty>
+	);
+	h(
+#embed <jk.txt>
+	);
+	int r = i();
+	(void)r;
+#ifdef __cplusplus
+	j<
+#embed <jk.txt>
+	>(
+#embed <media/empty>
+	);
+#endif
+}
+
+// Ensure that we don't accidentally allow you to initialize an unsigned char *
+// from embedded data; the data is modeled as a string literal internally, but
+// is not actually a string literal.
+const unsigned char *ptr =
+#embed <jk.txt> // expected-warning {{left operand of comma operator has no effect}}
+; // c-error at -2 {{incompatible integer to pointer conversion initializing 'const unsigned char *' with an expression of type 'unsigned char'}} \
+     cxx-error at -2 {{cannot initialize a variable of type 'const unsigned char *' with an rvalue of type 'unsigned char'}}
+
+// However, there are some cases where this is fine and should work.
+const unsigned char *null_ptr_1 =
+#embed <media/empty> if_empty(0)
+;
+
+const unsigned char *null_ptr_2 =
+#embed <null_byte.bin>
+;
+
+const unsigned char *null_ptr_3 = {
+#embed <null_byte.bin>
+};
+
+#define FILE_NAME <null_byte.bin>
+#define LIMIT 1
+#define OFFSET 0
+#define EMPTY_SUFFIX suffix()
+
+constexpr unsigned char ch =
+#embed FILE_NAME limit(LIMIT) clang::offset(OFFSET) EMPTY_SUFFIX
+;
+static_assert(ch == 0);
----------------
jakubjelinek wrote:

Unless you want to special case it in way too many spots, I'd think it would be far easier to optimize just the inner part of the integer sequence, i.e. everything except the first and last sequence element (maybe with the exception when the last prefix token is , or first suffix token is ,
Because one can use arbitrary tokens before and after the #embed, it can be
```c
const unsigned char a[] = {
-400 + 4 * 
#embed __FILE__
- 27 };
```
(or with tokens from prefix/suffix) and at least the current patchset mishandles many of such cases.  For the inner part of the sequence you know there is , before it and , after it, which simplifies a lot of things.
The above is handled correctly by GCC and by clang -save-temps, but not by clang without -save-temps.
And there are tons of other cases like that, e.g. even designated initializer [26] = 
before the sequence, etc.

https://github.com/llvm/llvm-project/pull/95802


More information about the cfe-commits mailing list