[PATCH] D136400: [llvm-ocaml] Migrate from naked pointers to prepare for OCaml 5

Sat Feb 18 10:27:43 PST 2023

alan added inline comments.

================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:13-15
-|* Note that these functions intentionally take liberties with the CAMLparamX *|
-|* macros, since most of the parameters are not GC heap objects.              *|
-|*                                                                            *|
----------------
jberdine wrote:
> A high-level question about this diff is whether you really want to change the bindings in this way. The current situation is that they intentionally take a low-level view of the FFI and have been written with that in mind and reviewed from that perspective. There is probably more chance of introducing bugs by changing this than by just adding the encoding of naked pointers and retaining the use of the low-level FFI. My personal preference would be to keep the existing low level FFI. But if such a change from the low-level to the high-level FFI is done, it should be done as a separate diff that does only that.
I thought about this a long time, even before you left this comment. The OCaml manual instructs you to use the `CAMLparam`, `CAMLlocal`, and `CAMLreturn` macros for all FFI functions (https://v2.ocaml.org/manual/intfc.html#ss:c-simple-gc-harmony). It does not give any nuances, like "You don't need to register unboxed integers with `CAMLparam`" or "You don't need to use the macros if you never call the OCaml runtime." When I first wrote this patch, I just added all the macros to be extra safe.

I've been going through the code locally now and removing the macros, and so far the tests still pass, but there are several spots where I was unsure if removing the macro would be unsafe. (Example: Where one of the arguments is a pointer into the OCaml-managed heap, such as an OCaml string or an `Abstract_tag`ged LLVM value for purposes of custom finalizer, and the function calls the OCaml runtime by allocating something.) I don't want a heisenbug where the code usually passes the tests, but mysteriously segfaults in some user's program if the GC kicks in at the wrong time and deallocates memory that shouldn't, because the binding code plays fast and loose with the macros.

Then, we have the issue with the diagnostic handler, meaning that the OCaml runtime could kick in at many points that may not be obvious, but the diagnostic handler could violate so many assumptions about C and C++ resource cleanup, as we've discussed, that I don't know if it should factor in to the decision whether to use the macros.

When I was thinking about whether to use the macros or drop them, my view was mainly about efficiency versus readability, where I viewed the macros as sacrificing slight efficiency (because the code spends instructions registering values with the GC, when this may not be necessary) for readability (because the code follows the rules of the OCaml FFI manual to the letter). I do not share your view that using the macros would *introduce* bugs or make the code harder to review; I always thought the macros make the code safer and easier to review. It's better to just use the macro than to think, "Do we need to use a macro here? Does the GC need to know about this value before we call `caml_alloc_whatever`?"

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136400/new/

https://reviews.llvm.org/D136400