[llvm] [compiler-rt] [clang] [libcxx] [clang-tools-extra] [lldb] [libc] [flang] ✨ [Sema, Lex, Parse] Preprocessor embed in C and C++ (and Obj-C and Obj-C++ by-proxy) (PR #68620)

Aaron Ballman via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 27 12:38:01 PST 2023


AaronBallman wrote:

> Your reasoning works until we have a crash that relies on `#embed` and/or its contents.

Agreed, that's the "downsides" I mentioned.

> From what I saw triaging old crashes, crash submitters are aware when they work with proprietary code they can't share even a fragment of, and fairly often reduce the crash themselves. I'm not fond of the idea of giving up on every embed-related crash, because there is a risk (which I don't estimate to be high) that a submitter forgot to check their otherwise open code for sensitive information. This doesn't help us iron out bugs in the `#embed` implementation in the long run.

I think folks are conscious of that because 1) it's been a known issue with header files for ages and 2) header files are plain text that you can often read to see whether they're sensitive (e.g., a notice in a comment at the top of the file). I don't think either of those holds true for `#embed`, though. Further, I think leaking source code is, in some ways, a different kind of issue than leaking credentials stored in a binary blob: they're both leaks, but leaked header code isn't usually immediately exploitable by itself.

> One might say that additional back-and-forth with the crash submitter is not a big deal, and it wouldn't be if we didn't have an ever-growing backlog of issues, some dating back more than a decade. Our existing workflow, which allows people to drop attachments on us and forget about them, has proven itself useful in the very long run. So I'd like us to keep it.

I agree that it would be nice to have, but I think we need to think very carefully about the behavior here. I think `#embed` usage in the wild will be quite a bit different from `#include` usage; I'd rather we err on the side of caution until we have a clearer understanding of real-world usage patterns. (Again, I'm flexible here: if we think my concerns aren't realistic ones, then that changes my opinion.)

https://github.com/llvm/llvm-project/pull/68620

