[PATCH] D136400: [llvm-ocaml] Migrate from naked pointers to prepare for OCaml 5

Josh Berdine via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 23 15:09:56 PST 2023


jberdine added a comment.

I agree that the `caml_` prefix on `caml_alloc_tuple_uninit` is probably not a good idea.

Note that the intent of my comment about `llvm_param_types` was to show one concrete example of a pattern that occurs in several other functions, whose ml types return arrays, such as `llvm_struct_element_types`, `llvm_subtypes`, `llvm_indices`, etc. The added allocation seems unavoidable for the functions that accept an array and need to create an analogue where the lsbs have been cleared before passing it to an LLVM function. In a sequential setting, it would be possible to clear the lsbs of elements of the argument OCaml array in place, pass it to LLVM, and then re-set the lsbs. I am not suggesting to do that, it would be unsafely racy in a concurrent setting.

I don't know the history of what the FFI docs said when, but the OCaml bindings are *old*. The difference between `caml_alloc_small` and `caml_alloc_shr` is that the former allocates on the minor heap and the latter on the major heap. The FFI rules distinguish them since if a pointer to the/a minor heap is stored into the major heap, then the GC needs to be informed or else it may incorrectly consider the object in the minor heap to be dead. This is what `caml_initialize` and `Store_field` do. The key here is that in the case of the LLVM bindings, a crucial point of the design that sets the low bit of pointers to LLVM objects is that the memory allocated by either `caml_alloc_small` or `caml_alloc_shr` is initialized with values that the OCaml GC considers to be non-pointers. In particular, they cannot possibly be old-to-young pointers. This is the invariant across the whole of the LLVM bindings that makes the current code safe.

Replacing the existing calls to `caml_alloc_tuple_uninit` with `caml_alloc_tuple` will result in much of the memory allocated by the bindings being needlessly double-initialized. That is a change that is orthogonal to removing naked pointers, and I think that it should be considered separately. (It is not a change that I would myself like to see made, but it is a judgement call with pros and cons.)



================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:247
 value llvm_enum_attr_kind(value Name) {
+  CAMLparam1(Name);
   unsigned Kind = LLVMGetEnumAttributeKindForName(String_val(Name),
----------------
Is this CAMLparam/CAMLreturn left over from combining the two diffs, or necessary for a reason I am overlooking?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:346
+value llvm_print_module(value Filename, value M) {
+  CAMLparam1(Filename);
   char *Message;
----------------
I don't see why adding this CAMLparam is necessary


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:433-434
+value llvm_string_of_lltype(value M) {
+  CAMLparam0();
+  CAMLlocal1(TypeStr);
+  char *TypeCStr = LLVMPrintTypeToString(Type_val(M));
----------------
Left over from combining the 2 diffs?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:753-754
+value llvm_classify_value(value V) {
+  CAMLparam0();
+  CAMLlocal1(Result);
+  LLVMValueRef Val = Value_val(V);
----------------
Left over?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:811-812
+value llvm_string_of_llvalue(value M) {
+  CAMLparam0();
+  CAMLlocal1(ValueStr);
+  char *ValueCStr = LLVMPrintValueToString(Value_val(M));
----------------
Left over?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2119-2123
+    V = to_val(LLVMGetIncomingValue(PhiNode, --I));
+    BB = to_val(LLVMGetIncomingBlock(PhiNode, I));
     Hd = caml_alloc_small(2, 0);
-    Field(Hd, 0) = (value)LLVMGetIncomingValue(PhiNode, --I);
-    Field(Hd, 1) = (value)LLVMGetIncomingBlock(PhiNode, I);
+    Field(Hd, 0) = V;
+    Field(Hd, 1) = BB;
----------------
I am not sure why this change is needed, rather than changing the `(value)` casts in the original to `to_val`. In particular, if `V` and `BB` are computed after `Hd` is allocated, they do not need to survive any allocations, so do not need to be roots.


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2125
 
-    value Tmp = caml_alloc_small(2, 0);
-    Field(Tmp, 0) = Hd;
-    Field(Tmp, 1) = Tl;
-    Tl = Tmp;
+    List = caml_alloc_small(2, 0);
+    Field(List, 0) = Hd;
----------------
`List` does not need to be a root since it is not live across any allocations.


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2167
 value llvm_position_builder(value Pos, value B) {
+  CAMLparam2(Pos, B);
   if (Tag_val(Pos) == 0) {
----------------
Left over?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2180
+value llvm_insertion_block(value B) {
+  CAMLparam1(B);
   LLVMBasicBlockRef InsertBlock = LLVMGetInsertBlock(Builder_val(B));
----------------
Left over?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2746-2747
+value llvm_build_phi(value Incoming, value Name, value B) {
+  CAMLparam3(Incoming, Name, B);
+  CAMLlocal2(Hd, Tl);
 
----------------
Left over?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2861
+value llvm_memorybuffer_of_file(value Path) {
+  CAMLparam1(Path);
   char *Message;
----------------
Left over?


================
Comment at: llvm/bindings/ocaml/llvm/llvm_ocaml.c:2884
+value llvm_memorybuffer_of_string(value Name, value String) {
+  CAMLparam2(Name, String);
   LLVMMemoryBufferRef MemBuf;
----------------
Left over?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136400/new/

https://reviews.llvm.org/D136400



More information about the llvm-commits mailing list