[PATCH] D136400: [llvm-ocaml] Migrate from naked pointers to prepare for OCaml 5

Alan via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 18 10:06:49 PST 2023


alan added a comment.

Hi, I've went over your comments and answered them. Please let me know if you have questions or concerns still.



================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:45
+                       "treats them as integers");
+  return (void *)(v ^ 1);
 }
----------------
Kakadu wrote:
> Could you add a comment why these two functions are needed? I always though that out-of-heap values should be wrapped into Abstract_tag. At least the manual says that https://v2.ocaml.org/manual/intfc.html#ss:c-outside-head
> 
> Sorry for asking stupid questions, I was teleported there from https://discuss.ocaml.org/t/proposal-care-more-about-ocaml-bindings-for-popular-libraries/11451
When I first wrote this patch, my original approach was to wrap LLVM values in an OCaml block of Abstract_tag. In a discussion with @jberdine, he said that I could assume that all pointers from LLVM are 2-byte aligned, and set the low bit to 1 when exposing them to OCaml (https://github.com/llvm/llvm-project/issues/58134#issuecomment-1271544642). The OCaml GC distinguishes pointers from integers from the low bit; the low bit of integers is 1. Therefore, the OCaml GC will treat out-of-OCaml-heap LLVM pointers as integers.

The OCaml documentation that you link also recommends this representation:

> For pointers that are at least 2-aligned (the low bit is guaranteed to be zero), we have yet another valid representation as an OCaml tagged integer. 

This approach avoids an extra allocation. This design choice is documented in the comments and assert messages. Is there anything else I need to add to make it clear for people unfamiliar with the code?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:61
 
 value cstr_to_string(const char *Str, mlsize_t Len) {
   if (!Str)
----------------
Kakadu wrote:
> Am I understanding right... LLVM uses non-zero-terminated strings under the hood, so this function is need?
> 
> In case of zero terminated string it would be better to use caml_copy_string https://github.com/ocaml/ocaml/blob/5.0/runtime/alloc.c#L208
This code is what exists on the main branch and my patch does not touch it. The code passes the tests; changing this function is not the business of my patch.

It seems that this function accepts NULL pointers and handles the NULL case by allocating a zero-length OCaml-heap-allocated string.


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:118
 
-static value alloc_variant(int tag, void *Value) {
+void *alloc_temp(value Elements) {
+  unsigned Length = Wosize_val(Elements);
----------------
Kakadu wrote:
> I'm not 100% follow the intent of this function. It looks like it tries to convert OCaml array of OCaml values to C array of OCaml values. But if it is true, I'm not sure that llvm_struct_element_types works as expected.
> 
> In I understood intent correctly, it should be renamed to array_of_ocaml_array or something.....
Yes, it converts an OCaml array to a C array. The code on the main branch uses the fact that pointers are naked to directly cast OCaml blocks to C arrays, which is no longer possible now that OCaml 5 disallows naked pointers.

The `x_of_y` naming style is an OCaml quirk. I don't want to name this helper function that.


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:663
+  unsigned Length = LLVMCountStructElementTypes(Type_val(StructTy));
+  LLVMTypeRef *Temp = alloc_temp(Length);
+  LLVMGetStructElementTypes(Type_val(StructTy), Temp);
----------------
Kakadu wrote:
> Could you double check that there is no bug here? See my comment about alloc_temp. 
This code allocates a temporary, intermediate array as a buffer to store the output of `LLVMGetStructElementTypes`. Then, it copies them over to an OCaml-heap-allocated array (which internally have the same representation as OCaml tuples) and copies over the elements, wrapping them by setting the low bit to 1.


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:705
+  free(Temp);
+  CAMLreturn(Arr);
 }
----------------
Kakadu wrote:
> This code looks suspicious. If we are going to return an array, why not to try to use `caml_alloc_array` function? 
> 
The original code uses `caml_alloc_tuple`, so that's what I did as well. I think `caml_alloc_array` (which takes a callback that converts the array elements to OCaml values) would work as well. Thanks for suggesting it.


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:1405
+  CAMLreturn(to_val(Value));
+}
+
----------------
Kakadu wrote:
> I'm wondering... Is it a good idea to define a macro for more short definitions of binary operations above?
For this patch, I just did things by-hand. I think this is simpler than a macro for now, as a lot of this code wraps functions by hand. If you really want a macro, I can change it to one.


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2914
+  return llvm_build_atomicrmw_native(argv[0], argv[1], argv[2], argv[3],
+                                     argv[4], argv[5], argv[6]);
 }
----------------
Kakadu wrote:
> Isn't that strange that argn is not used here?
No. FFI functions of arity greater than five should have two functions, one for native which simply takes all the parameters, and one for bytecode which takes an arg vector and the vector's length (https://v2.ocaml.org/manual/intfc.html#ss:c-prim-impl). It's the way that interface is defined that the bytecode wrapper function receives the number of arguments, but here, we know that the function has an arity of 7 and just forward them from the bytecode wrapper function to the native wrapper function


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:3126
   for (Tl = Incoming; Tl != Val_int(0); Tl = Field(Tl, 1)) {
     value Hd = Field(Tl, 0);
+    LLVMValueRef V = Value_val(Field(Hd, 0));
----------------
Kakadu wrote:
> It looks like the 'value' is not required.
By putting `value`, I define a new `Hd` local variable that shadows the outer one. This code is fine, but reusing the outer variable will also probably be fine. It just depends on your style preference.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136400/new/

https://reviews.llvm.org/D136400



More information about the llvm-commits mailing list