[PATCH] D136400: [llvm-ocaml] Migrate from naked pointers to prepare for OCaml 5

Sat Feb 18 15:21:02 PST 2023

alan added inline comments.

================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:13-15
-|* Note that these functions intentionally take liberties with the CAMLparamX *|
-|* macros, since most of the parameters are not GC heap objects.              *|
-|*                                                                            *|
----------------
jberdine wrote:
> alan wrote:
> > jberdine wrote:
> > > A high-level question about this diff is whether you really want to change the bindings in this way. The current situation is that they intentionally take a low-level view of the FFI and have been written with that in mind and reviewed from that perspective. There is probably more chance of introducing bugs by changing this than by just adding the encoding of naked pointers and retaining the use of the low-level FFI. My personal preference would be to keep the existing low level FFI. But if such a change from the low-level to the high-level FFI is done, it should be done as a separate diff that does only that.
> > I thought about this a long time, even before you left this comment. The OCaml manual instructs you to use the `CAMLparam`, `CAMLlocal`, and `CAMLreturn` macros for all FFI functions (https://v2.ocaml.org/manual/intfc.html#ss:c-simple-gc-harmony). It does not give any nuances, like "You don't need to register unboxed integers with `CAMLparam`" or "You don't need to use the macros if you never call the OCaml runtime." When I first wrote this patch, I just added all the macros to be extra safe.
> > 
> > I've been going through the code locally now and removing the macros, and so far the tests still pass, but there are several spots where I was unsure if removing the macro would be unsafe. (Example: Where one of the arguments is a pointer into the OCaml-managed heap, such as an OCaml string or an `Abstract_tag`ged LLVM value for purposes of custom finalizer, and the function calls the OCaml runtime by allocating something.) I don't want a heisenbug where the code usually passes the tests, but mysteriously segfaults in some user's program if the GC kicks in at the wrong time and deallocates memory that shouldn't, because the binding code plays fast and loose with the macros.
> > 
> > Then, we have the issue with the diagnostic handler, meaning that the OCaml runtime could kick in at many points that may not be obvious, but the diagnostic handler could violate so many assumptions about C and C++ resource cleanup, as we've discussed, that I don't know if it should factor in to the decision whether to use the macros.
> > 
> > When I was thinking about whether to use the macros or drop them, my view was mainly about efficiency versus readability, where I viewed the macros as sacrificing slight efficiency (because the code spends instructions registering values with the GC, when this may not be necessary) for readability (because the code follows the rules of the OCaml FFI manual to the letter). I do not share your view that using the macros would *introduce* bugs or make the code harder to review; I always thought the macros make the code safer and easier to review. It's better to just use the macro than to think, "Do we need to use a macro here? Does the GC need to know about this value before we call `caml_alloc_whatever`?"
> My main point is not so much about whether or not to most desirable final situation involves abiding by the simplified FFI rules, but that the changes to do so are large and orthogonal to the goal of changing the bindings to avoid naked pointers. Changing to the simplified FFI rules is a judgement call that has pros and cons, while eliminating naked pointers is a hard requirement going forward. That is why I think the two things should be done as two separate diffs.
> 
> As for potentially introducing bugs, maybe I'm just paranoid, but large changes are hard to review and not as precise when bisecting issues that are discovered later. More specifically, adding `CAMLlocal` and associated macros publish more locations as GC roots, and the associated changes involve more allocations. Going all the way to strictly following the simplified rules should be fine, but if there are any oversights and we only get part of the way there, it seems entirely possible to introduce bugs.
> 
> The diagnostic handler API is limited in the sense that the handler functions must not access the OCaml runtime system. But following the simplified FFI rules will not be enough to enable supporting arbitrary handler functions. That would be another orthogonal change.
I can respect the view that a smaller diff is better and that we can add the macros in a separate patch and focus this one on making the code work on OCaml 5. I've removed most of the `CAML` macros I added. There were some functions that took a pointer into the OCaml heap (e.g. an OCaml string) and called the OCaml runtime (e.g. to allocate a block or throw an exception), and I kept the macros in those cases because I think that not using the macros would be incorrect. Hopefully the diff should be smaller and easier to review now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136400/new/

https://reviews.llvm.org/D136400