[PATCH] D141215: [clang-repl] Introduce Value to capture expression results
Aaron Ballman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Apr 28 08:25:30 PDT 2023
aaron.ballman added inline comments.
================
Comment at: clang/include/clang/Interpreter/Interpreter.h:96
+
+ size_t getEffectivePTUSize() const;
+
----------------
It looks like this can be private?
Also, just a note (not something you have to deal with in this review), the local naming convention seems to have decided that if the function starts with `get` then it's camel case and otherwise the name is pascal case.
================
Comment at: clang/include/clang/Interpreter/Value.h:104-108
+ void setKind(Kind K) { ValueKind = K; }
+ void setOpaqueType(void *Ty) { OpaqueType = Ty; }
+
+ void *getPtr() const;
+ void setPtr(void *Ptr) { Data.m_Ptr = Ptr; }
----------------
If we can't get type safety through construction, should we consider adding assertions here that a value's kind and opaque type agree, or that we're not changing the type or kind without also changing the pointer?
================
Comment at: clang/include/clang/Interpreter/Value.h:46
+
+#define REPL_BUILTIN_TYPES \
+ X(bool, Bool) \
----------------
v.g.vassilev wrote:
> aaron.ballman wrote:
> > Is this expected to be a complete list of builtin types? e.g., should this have `char8_t` and `void` and `wchar_t`, etc? Should this be including `clang/include/clang/AST/BuiltinTypes.def` instead of manually maintaining the list?
> This is used for various things including storing the bits into a big-endian agnostic way. For `void` we have a special case in the Value and we cannot define a union of a `void` type. We can include the others that you suggest. All the relevant ones are described in `clang::BuiltinType::getName`.
>
> We cannot use `BuiltinTypes.def` because we want to have a mapping between the type as written in the language (eg, `bool`, `unsigned`, etc) and its underlying type name. That mapping is not available in `BuiltinTypes.def`. Ideally we should extend `BuiltinTypes.def` somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. `_Bool` vs `bool`) and I am not sure this can be resolved by tablegen.
>
> On a broader perspective, the `Value` class is responsible for two things: (a) get a value from the interpreter to compiled code (see test case); (b) set a value from compiled code to the interpreter. The second case is not yet covered (I can open soon another patch). The major use-case is calling at runtime functions and taking input parameters from compiled code.
>
> FWIW, we should probably move all of these entities in a separate namespace. I'd suggest `caas` (compiler-as-a-service) and possibly rename the `Value` to `InterpreterValue` since `Value` is very generic and there are already a couple of classes with that name in llvm and clang.
> We can include the others that you suggest. All the relevant ones are described in clang::BuiltinType::getName.
Okay, so the plan is to handle all the builtin types (`_BitInt`, `_Complex`, various floating point formats, etc)? Will that be part of this patch or in follow-up work? (My intuition is that we should consider it up front because some of the builtin types are interesting -- like `_BitInt` because it's parameterized, which makes it novel compared to the other types.)
> We cannot use BuiltinTypes.def because we want to have a mapping between the type as written in the language (eg, bool, unsigned, etc) and its underlying type name. That mapping is not available in BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. _Bool vs bool) and I am not sure this can be resolved by tablegen.
Thanks for the explanation! BuiltinTypes.def works well enough for times when we want to use macros and include the file to generate switch cases and the likes, but you're right that it's not well-suited for this. One thing to consider is whether we should change `BuiltinTypes.def` to be `BuiltinTypes.td` instead and use tablegen to generate the macro/include dance form as well as other output (such as for your needs, that can then consider language options or more complex predicates).
> FWIW, we should probably move all of these entities in a separate namespace. I'd suggest caas (compiler-as-a-service) and possibly rename the Value to InterpreterValue since Value is very generic and there are already a couple of classes with that name in llvm and clang.
I'm not in love with the name `caas` because that's not really a common acronym or abbreviation (and it looks like a typo due to `aa`). However, we already have an `interp` namespace in Clang for one of the other interpreters (constant expression evaluation), so that's not available for use. How about `repl` though?
As for considering changing the name from `Value` because of how many other `Value` types we have already... that's both a reason to rename and reason not to rename. I think I'm fine leaving it as `Value` so long as it's in a novel namespace.
================
Comment at: clang/include/clang/Interpreter/Value.h:77
+
+ K_Void,
+ K_PtrOrObj,
----------------
junaire wrote:
> aaron.ballman wrote:
> > Fixing indentation
> It's already formatted by clang-format and if I change this arc doesn't even allow me to upload the diff. I believe it's a bug in clang-format and I have filed https://github.com/llvm/llvm-project/issues/60535 So I'd like to fix this when I commit this :)
Totally fine to fix this on commit, thank you for filing the issue!
================
Comment at: clang/include/clang/Interpreter/Value.h:83
+ Value() = default;
+ Value(void /*Interpreter*/ *In, void /*QualType*/ *Ty);
+ Value(const Value &RHS);
----------------
v.g.vassilev wrote:
> junaire wrote:
> > aaron.ballman wrote:
> > > Why do these take `void *` instead of the expected type?
> > Yeah for the first parameter, we could just use `Interpreter*` but the second one is an opaque type so I think we should keep it?
> See my previous comments on performance. We cannot include anything bulky in the header file.
I think I understand why the design is the way it is, but it still makes me uneasy. The constructor takes a pointer to some bucket of bytes... no size information, no type information, etc. Just "here's a random pointer". And then later, we hope the user calls `setKind()` in a way that makes sense.
We need it to be fast, but we also need it to be correct -- the type system is the best tool for helping with that.
================
Comment at: clang/include/clang/Interpreter/Value.h:121
+ static T cast(const Value &V) {
+ if (V.isPointerOrObjectType())
+ return (T)(uintptr_t)V.getAs<void *>();
----------------
v.g.vassilev wrote:
> aaron.ballman wrote:
> > Do we have to worry about member function pointers where a single pointer value may not be sufficient?
> I am not sure if I understand your comment correctly. We only store objects and not function/member pointers (perhaps in some pathological way one could do that but I'd say it is not supported). Here we want to know if we stored it as a non-builtin type we need to make the pointer-like casts. My take is that for now we shouldn't worry about member function pointers.
Given:
```
struct Foo {
struct Base {
virtual int bar();
};
struct Foo : Base {
int bar() { return 12; }
};
int (Foo::*bar_ptr)() = &Foo::bar;
```
The object `bar_ptr` requires two pointers of space to represent: https://godbolt.org/z/o81sbjKM6, so I'm wondering whether `Value` can represent `bar_ptr`
================
Comment at: clang/include/clang/Interpreter/Value.h:143
+ /// Values referencing an object are treated as pointers to the object.
+ template <typename T> T castAs() const { return CastFwd<T>::cast(*this); }
+
----------------
v.g.vassilev wrote:
> junaire wrote:
> > aaron.ballman wrote:
> > > This doesn't match the usual pattern used in Clang where the `get` variant returns `nullptr` and the `cast` variant asserts if the cast would be invalid. Should we go with a similar approach?
> > These APIs are adopted in Cling and Cling will remove the old implementation and use the upstream version of the value printing feature at some point in the future. So @v.g.vassilev hope we can keep these APIs close to Cling's variants as much as we can. I don't have a strong opinion here, what's your take here? @v.g.vassilev
> I probably see the confusion. The `getAs<T>` operation is only relevant for pointer types. For example we stored object of type `UserClass` as a pointer. For instance, I don't see how `getAs<float>` can return a `nullptr`. We can probably inline the implementation of `getAs` for non-pointer types into `castAs` and leave `getAs` unimplemented relying only on the explicit specialization bellow for pointers.
>
> The `castAs` operation means take a valid bit representation and transforms it into the requested one. I don't think it can or should be ever invalid. Here we are reshuffling bits.
>
> These changes are very subtle and if we decide to make changes we should open a pull request against the ROOT project to make sure we are not breaking some important case. The reference implementation is here: https://github.com/root-project/root/blob/master/interpreter/cling/include/cling/Interpreter/Value.h
I think `castAs` can be invalid -- if you have the bits for a float and you try to cast it as a pointer, that's not going to lead to good things, right?
My point is largely that the names `getAs` and `castAs` have meaning within Clang already and this seems to assign slightly different meaning to them, which might trip folks up. It might be worth considering renaming them to something like `bitCast` and `pointerCast` (or something else, I'm not tied to these names)?
================
Comment at: clang/include/clang/Interpreter/Value.h:160-162
+ // Interpreter, QualType are stored as void* to reduce dependencies.
+ void *Interp = nullptr;
+ void *OpaqueType = nullptr;
----------------
v.g.vassilev wrote:
> aaron.ballman wrote:
> > Why don't forward declares suffice if we're storing the information by pointer?
> This is a performance-critical class. We literally measure the instruction count for it. We practically cannot include anything in this header file because the class needs to included as part of the interpreter runtime. For example:
>
> ```
> #include <clang/Interpreter/Value.h>
> Value ResultV;
> gInterpreter->evaluate("float i = 12.3; i++", &V);
> printf("Value is %d\n", ResultV.castAs<int>());
> ```
>
> This is how you can do things in Cling. This not yet there but that's our next step.
>
> For performance reasons we have the `ValueKind` optimization which allows us to perform most of the operations we need very fast. There are some operations such as printing the concrete type which need the actual `QualType` and so on but they are outside of the performance critical paths and it is okay to resort back to the real types providing the level of accuracy we need.
>
That sounds like it's going to lead to maintenance problems in the long term, right? I can't think of another header file we say "don't touch this because it may impact runtime performance", and so I can easily imagine someone breaking your expectation that this file can't include anything else.
Is there a long-term plan for addressing this?
================
Comment at: clang/lib/Interpreter/IncrementalParser.cpp:22
#include "clang/FrontendTool/Utils.h"
+#include "clang/Interpreter/Interpreter.h"
#include "clang/Parse/Parser.h"
----------------
v.g.vassilev wrote:
> This introduces a layering violation. Can we avoid it?
Does it? This file is in `clang/lib/Interpreter` so including other things from `clang/Interpreter` would not be a layering violation normally.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141215/new/
https://reviews.llvm.org/D141215
More information about the cfe-commits
mailing list