[Lldb-commits] [lldb] [lldb][test] Toolchain detection rewrite in Python (PR #102185)

Tue Sep 10 04:26:29 PDT 2024

================
@@ -96,16 +98,120 @@ def getArchSpec(self, architecture):
         """
         return ["ARCH=" + architecture] if architecture else []
 
-    def getCCSpec(self, compiler):
+    def getToolchainSpec(self, compiler):
         """
-        Helper function to return the key-value string to specify the compiler
+        Helper function to return the key-value strings to specify the toolchain
         used for the make system.
         """
         cc = compiler if compiler else None
         if not cc and configuration.compiler:
             cc = configuration.compiler
+
         if cc:
-            return ['CC="%s"' % cc]
+            exe_ext = ""
+            if lldbplatformutil.getHostPlatform() == "windows":
+                exe_ext = ".exe"
+
+            cc = cc.strip()
+            cc_path = pathlib.Path(cc)
+            cc = cc_path.as_posix()
+
+            # We can get CC compiler string in the following formats:
+            #  [<tool>] <compiler>    - such as 'xrun clang', 'xrun /usr/bin/clang' & etc
+            #
+            # Where <compiler> could contain the following parts:
+            #   <simple-name>[.<exe-ext>]                           - sucn as 'clang', 'clang.exe' ('clamg-cl.exe'?)
+            #   <target-triple>-<simple-name>[.<exe-ext>]           - such as 'armv7-linux-gnueabi-gcc'
+            #   <path>/<simple-name>[.<exe-ext>]                    - such as '/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+            #   <path>/<target-triple>-<simple-name>[.<exe-ext>]    - such as '/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+
+            cc_ext = cc_path.suffix
+            # Compiler name without extension
+            cc_name = cc_path.stem.split(" ")[-1]
+
+            # A kind of compiler (canonical name): clang, gcc, cc & etc.
+            cc_type = cc_name
+            # A triple prefix of compiler name: <armv7-none-linux-gnu->gcc
+            cc_prefix = ""
+            if not "clang-cl" in cc_name and not "llvm-gcc" in cc_name:
+                cc_name_parts = cc_name.split("-")
+                cc_type = cc_name_parts[-1]
+                if len(cc_name_parts) > 1:
+                    cc_prefix = "-".join(cc_name_parts[:-1]) + "-"
+
+            # A kind of C++ compiler.
+            cxx_types = {
+                "icc": "icpc",
+                "llvm-gcc": "llvm-g++",
+                "gcc": "g++",
+                "cc": "c++",
+                "clang": "clang++",
+            }
+            cxx_type = cxx_types.get(cc_type, cc_type)
+
+            cc_dir = cc_path.parent
+
+            def getLlvmUtil(util_name):
+                llvm_tools_dir = os.getenv("LLVM_TOOLS_DIR", cc_dir.as_posix())
+                return llvm_tools_dir + "/" + util_name + exe_ext
+
+            def getToolchainUtil(util_name):
+                return (cc_dir / (cc_prefix + util_name + cc_ext)).as_posix()
+
+            cxx = getToolchainUtil(cxx_type)
+
+            util_names = {
+                "OBJCOPY": "objcopy",
+                "STRIP": "objcopy",
+                "ARCHIVER": "ar",
+                "DWP": "dwp",
+            }
+            utils = []
+
+            # Note: LLVM_AR is currently required by API TestBSDArchives.py tests.
+            # Assembly a full path to llvm-ar for given LLVM_TOOLS_DIR;
+            # otherwise assume we have llvm-ar at the same place where is CC specified.
+            if not os.getenv("LLVM_AR"):
+                llvm_ar = getLlvmUtil("llvm-ar")
+                utils.extend(['LLVM_AR="%s"' % llvm_ar])
+
+            if not lldbplatformutil.platformIsDarwin():
+                if os.getenv("USE_LLVM_TOOLS"):
----------------
labath wrote:

Thanks. Before you go on the next PR, let me just elaborate on what I'm thinking.

Basically, the (API) test suite can run in two modes. The first one (the default) is where it uses all of the tools from the build tree that it can. This includes the compiler and the other tools like llvm-objcopy, etc. This is done for two reasons:
- it minimizes the amount of external dependencies
- it maximizes the reproducibility of the build

This mode doesn't (shouldn't?) require any special flags or configuration. The second mode is where you can run the tests against a toolchain (mostly: compiler) of your choosing, and there it's expected that you may need to pass many arguments in order to make it work.

The thing which bothers me about this flag is that it's a flag that forces a switch back to the default mode (for some tools), which is makes reasoning about it more complicated. I think it would be better to just make the behavior you want to be the default, and then (if necessary) have a separate flag to force the usage of a specific tool (e.g. objcopy). I think that would be similar to how things work with e.g. dsymutil, where by default we pick the in-tree version, but we have a cmake/dotest/etc. flag to let the user choose a different one.

This would mean that users that run tests against gcc (and which want to continue using the gcc's version of objcopy) could no longer rely on the compiler-based detection and would have to pass (and add it to dotest first) an additional flag explicitly, but I think that's fine, because:
- These tests mostly don't care about the objcopy version. Their main purpose is to capture the differences in DWARF emitted by different compilers. We have better ways of testing the differences in object file (elf) output.
- I'm not sure if we have any such users at the moment.

https://github.com/llvm/llvm-project/pull/102185