https://github.com/mysterymath approved this pull request.

LGTM for style, organization, and general malloc implementation logic. But I can't speak to the GPU-specific threading or locking concerns.

https://github.com/llvm/llvm-project/pull/140156