[libc-commits] [libc] [libc][wctype][codegen] Add generation script for conversion data (PR #170868)

Michael Jones via libc-commits libc-commits at lists.llvm.org
Fri Dec 5 16:22:54 PST 2025


================
@@ -0,0 +1,28 @@
+#!/usr/bin/env python3
+#
+# ===- Fetch files necessary for wctype generator ------------*- python -*--==#
+#
+# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+#
+# ==------------------------------------------------------------------------==#
+#
+# This file is meant to be run manually by maintainers to fetch the latest
+# Unicode data files from unicode.org necessary for generating wctype data.
+# All rights to the data belong to unicode.org.
+
+from urllib.request import urlretrieve
+
+
+def fetch_unicode_data_files(
+    llvm_project_root_path: str,
+    files=["UnicodeData.txt"],
+    base_url="https://www.unicode.org/Public/UCD/latest/ucd/",
----------------
michaelrj-google wrote:

My personal preference is for the script to read local files and give the user a URL to download them from manually. That way we don't accidentally download lots of large files.
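A minimal sketch of how that could look, assuming the script takes a local data directory as its only argument. Only the base URL comes from the patch above. `find_missing_files`, `download_instructions`, and `main` are hypothetical names chosen for illustration, not part of the PR. The script checks for the files and, if any are missing, prints where to get them instead of downloading anything:

```python
import sys
from pathlib import Path

# Base URL taken from the patch under review; all function names below are
# hypothetical and only illustrate the "local files + manual download" idea.
UCD_BASE_URL = "https://www.unicode.org/Public/UCD/latest/ucd/"


def find_missing_files(data_dir, files=("UnicodeData.txt",)):
    """Return the names in `files` that are not present under `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in files if not (data_dir / name).is_file()]


def download_instructions(missing, base_url=UCD_BASE_URL):
    """Build a message telling the user where to fetch each missing file."""
    lines = ["The following Unicode data files were not found locally:"]
    for name in missing:
        lines.append(f"  {name}: download from {base_url}{name}")
    return "\n".join(lines)


def main(argv):
    """Return 0 if all data files exist locally, 1 (with instructions) otherwise."""
    data_dir = argv[1] if len(argv) > 1 else "."
    missing = find_missing_files(data_dir)
    if missing:
        # Nothing is fetched automatically; the user downloads files by hand.
        print(download_instructions(missing), file=sys.stderr)
        return 1
    print(f"Using local Unicode data from {data_dir}")
    return 0
```

As a standalone script this would be wired up as `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard. Using a tuple for the `files` default also sidesteps the shared-mutable-default pitfall of `files=["UnicodeData.txt"]`.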

https://github.com/llvm/llvm-project/pull/170868

