[PATCH] D141215: [clang-repl] Introduce Value to capture expression results

Vassil Vassilev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sun Apr 30 02:11:56 PDT 2023


v.g.vassilev added a comment.

Thanks again for your efforts pushing this patch. It has gone a long way and I believe we are getting there.



================
Comment at: clang/include/clang/Interpreter/Value.h:46
+
+#define REPL_BUILTIN_TYPES                                                     \
+  X(bool, Bool)                                                                \
----------------
aaron.ballman wrote:
> v.g.vassilev wrote:
> > aaron.ballman wrote:
> > > Is this expected to be a complete list of builtin types? e.g., should this have `char8_t` and `void` and `wchar_t`, etc? Should this be including `clang/include/clang/AST/BuiltinTypes.def` instead of manually maintaining the list?
> > This is used for various things including storing the bits into a big-endian agnostic way. For `void` we have a special case in the Value and we cannot define a union of a `void` type. We can include the others that you suggest. All the relevant ones are described in `clang::BuiltinType::getName`.
> > 
> > We cannot use `BuiltinTypes.def` because we want to have a mapping between the type as written in the language (eg, `bool`, `unsigned`, etc) and its underlying type name. That mapping is not available in `BuiltinTypes.def`. Ideally we should extend `BuiltinTypes.def` somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. `_Bool` vs `bool`) and I am not sure this can be resolved by tablegen.
> > 
> > On a broader perspective, the `Value` class is responsible for two things: (a) get a value from the interpreter to compiled code (see test case); (b) set a value from compiled code to the interpreter. The second case is not yet covered (I can open soon another patch). The major use-case is calling at runtime functions and taking input parameters from compiled code.
> > 
> > FWIW, we should probably move all of these entities in a separate namespace. I'd suggest `caas` (compiler-as-a-service) and possibly rename the `Value` to `InterpreterValue` since `Value` is very generic and there are already a couple of classes with that name in llvm and clang. 
> > We can include the others that you suggest. All the relevant ones are described in clang::BuiltinType::getName.
> 
> Okay, so the plan is to handle all the builtin types (`_BitInt`, `_Complex`, various floating point formats, etc)? Will that be part of this patch or in follow-up work? (My intuition is that we should consider it up front because some of the builtin types are interesting -- like `_BitInt` because it's parameterized, which makes it novel compared to the other types.)
> 
> > We cannot use BuiltinTypes.def because we want to have a mapping between the type as written in the language (eg, bool, unsigned, etc) and its underlying type name. That mapping is not available in BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. _Bool vs bool) and I am not sure this can be resolved by tablegen.
> 
> Thanks for the explanation! BuiltinTypes.def works well enough for times when we want to use macros and include the file to generate switch cases and the likes, but you're right that it's not well-suited for this. One thing to consider is whether we should change `BuiltinTypes.def` to be `BuiltinTypes.td` instead and use tablegen to generate the macro/include dance form as well as other output (such as for your needs, that can then consider language options or more complex predicates).
> 
> > FWIW, we should probably move all of these entities in a separate namespace. I'd suggest caas (compiler-as-a-service) and possibly rename the Value to InterpreterValue since Value is very generic and there are already a couple of classes with that name in llvm and clang.
> 
> I'm not in love with the name `caas` because that's not really a common acronym or abbreviation (and it looks like a typo due to `aa`). However, we already have an `interp` namespace in Clang for one of the other interpreters (constant expression evaluation), so that's not available for use. How about `repl` though?
> 
> As for considering changing the name from `Value` because of how many other `Value` types we have already... that's both a reason to rename and reason not to rename. I think I'm fine leaving it as `Value` so long as it's in a novel namespace.
> > We can include the others that you suggest. All the relevant ones are described in clang::BuiltinType::getName.
> 
> Okay, so the plan is to handle all the builtin types (`_BitInt`, `_Complex`, various floating point formats, etc)? Will that be part of this patch or in follow-up work? (My intuition is that we should consider it up front because some of the builtin types are interesting -- like `_BitInt` because it's parameterized, which makes it novel compared to the other types.)

That's a good point. I think the current patch offers a new capability and it is probably fine to not address all at once. My concern is that @junaire has a month to work on these things and this patch is the first out of two patches. The risk here is to drop the ball on that altogether if we add more work as a requirement for that patch to go in.


> 
> > We cannot use BuiltinTypes.def because we want to have a mapping between the type as written in the language (eg, bool, unsigned, etc) and its underlying type name. That mapping is not available in BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. _Bool vs bool) and I am not sure this can be resolved by tablegen.
> 
> Thanks for the explanation! BuiltinTypes.def works well enough for times when we want to use macros and include the file to generate switch cases and the likes, but you're right that it's not well-suited for this. One thing to consider is whether we should change `BuiltinTypes.def` to be `BuiltinTypes.td` instead and use tablegen to generate the macro/include dance form as well as other output (such as for your needs, that can then consider language options or more complex predicates).

I am totally with you here. I am not sure how to do this since as of now some of the types and their form as written are decided with a flag (IIRC the various `char` flavors).

> 
> > FWIW, we should probably move all of these entities in a separate namespace. I'd suggest caas (compiler-as-a-service) and possibly rename the Value to InterpreterValue since Value is very generic and there are already a couple of classes with that name in llvm and clang.
> 
> I'm not in love with the name `caas` because that's not really a common acronym or abbreviation (and it looks like a typo due to `aa`). However, we already have an `interp` namespace in Clang for one of the other interpreters (constant expression evaluation), so that's not available for use. How about `repl` though?

The only problem I have with `repl` is misleading. `repl` usually means to people something that you run at your prompt and can type some expressions in and see some results. However, here we are building a foundational primitive to allow embedding an interpreter as part of your static program and be able to cross the compiler/interpreter boundary. The compiler-as-a-service (caas) is probably the closest I could find in CS research terminology to describe that. Would adding some solid bits of documentation to the entire approach make you more comfortable with that naming? 

FWIW, the namespace comment should not be a requirement for this particular patch. I think it is totally fine to keep it as it is and do the change outside of this patch to help the review. Moreover, we will probably need to move some of the existing components already.

> 
> As for considering changing the name from `Value` because of how many other `Value` types we have already... that's both a reason to rename and reason not to rename. I think I'm fine leaving it as `Value` so long as it's in a novel namespace.

Ok, if that's not confusing, then let's keep it the way it was. That way it should be easier to adopt that downstream for sure!




================
Comment at: clang/include/clang/Interpreter/Value.h:160-162
+  // Interpreter, QualType are stored as void* to reduce dependencies.
+  void *Interp = nullptr;
+  void *OpaqueType = nullptr;
----------------
aaron.ballman wrote:
> v.g.vassilev wrote:
> > aaron.ballman wrote:
> > > Why don't forward declares suffice if we're storing the information by pointer?
> > This is a performance-critical class. We literally measure the instruction count for it. We practically cannot include anything in this header file because the class needs to included as part of the interpreter runtime. For example:
> > 
> > ```
> > #include <clang/Interpreter/Value.h>
> > Value ResultV;
> > gInterpreter->evaluate("float i = 12.3; i++", &V);
> > printf("Value is %d\n", ResultV.castAs<int>());
> > ```
> > 
> > This is how you can do things in Cling. This not yet there but that's our next step.
> > 
> > For performance reasons we have the `ValueKind` optimization which allows us to perform most of the operations we need very fast. There are some operations such as printing the concrete type which need the actual `QualType` and so on but they are outside of the performance critical paths and it is okay to resort back to the real types providing the level of accuracy we need.
> > 
> That sounds like it's going to lead to maintenance problems in the long term, right? I can't think of another header file we say "don't touch this because it may impact runtime performance", and so I can easily imagine someone breaking your expectation that this file can't include anything else.
> 
> Is there a long-term plan for addressing this?
We have a few components like the Lexer that are extremely prone to performance regressions.

In terms for a longer-term plan in addressing this there are some steps could help IMO. First, this component is relatively standalone and very few changes will be required over time, for these I am hoping to be listed as a reviewer. Second, we can add a comment in the include area, making a note that including anything here will degrade the performance of almost all interpreted code. Third, we will find out about this in our downstream use-cases as the things get significantly slower.

Neither is silver bullet but that's probably the best we could do at that time. Btw, we might be able to add a test that's part of LLVM's performance analysis infrastructure.


================
Comment at: clang/lib/Interpreter/Interpreter.cpp:211
+    void __clang_Interpreter_SetValueNoAlloc(void*,void*,void*,unsigned long long);
+    template <class T, class = T (*)() /*disable for arrays*/>
+    void __clang_Interpreter_SetValueCopyArr(T* Src, void* Placement, unsigned long Size) {
----------------
Can you add an `#ifdef __cplusplus` and add a value printing tests that run in clang-repl in C mode?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141215/new/

https://reviews.llvm.org/D141215



More information about the cfe-commits mailing list