[llvm] [LLVM][Triple] Drop unknown object types from normalized triples (PR #135571)
Usman Nadeem via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 13 15:12:54 PDT 2025
https://github.com/UsmanNadeem created https://github.com/llvm/llvm-project/pull/135571
According to the LangRef the longest canonical form for the triple is:
`ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT`
The object format may also appear at the end of the triple, separated by an additional `-`, but it looks like the object format is part of the `environment` rather than a separate identifier. This appears to be the case because various pieces of code that parse the environment substring also handle the object format, and the code often assumes only four components, where the environment string may also hold the version number and the object format. Also see: `getEnvironmentName()`.
While creating a Triple, if the object format is invalid or unknown, we call the `getDefaultFormat()` function, which sets the appropriate format. So the object format is never really unknown. Since we always set a default format, having `unknown` as a placeholder can cause issues. This is supported by the fact that the expected string for `UnknownObjectFormat` is `""` rather than `"unknown"`, as seen in `getObjectFormatTypeName()`. So, to me, it makes sense to drop "unknown" from the triple for the object format.
The expectation of `getEnvironmentVersionString()` is that if the environment string contains a `"-"`, then it has the object format at the end, and the object format name and type should match, which is not the case if "-unknown" is present in the triple.
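To make the mismatch concrete, here is a minimal standalone sketch. It is not LLVM code: `splitEnvironment`, `isKnownFormat`, and the `KnownFormats` list are hypothetical names introduced for illustration, and the format list is only an assumed subset. It models an environment string whose trailing `-suffix` is treated as the object format only when that suffix names a real format.

```cpp
#include <array>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, illustrative subset of object format names. The real list
// lives in Triple.cpp and is not reproduced here.
constexpr std::array<std::string_view, 5> KnownFormats = {
    "coff", "elf", "macho", "wasm", "xcoff"};

bool isKnownFormat(std::string_view Name) {
  for (std::string_view F : KnownFormats)
    if (F == Name)
      return true;
  return false;
}

// Split an environment string such as "gnueabi-elf" into
// {environment, object format}. A suffix that is not a known format
// (for example "unknown") yields an empty format string, which matches the
// "" spelling the description cites for UnknownObjectFormat.
std::pair<std::string, std::string> splitEnvironment(std::string_view Env) {
  const auto Dash = Env.rfind('-');
  if (Dash == std::string_view::npos)
    return {std::string(Env), ""};
  const std::string_view Suffix = Env.substr(Dash + 1);
  return {std::string(Env.substr(0, Dash)),
          isKnownFormat(Suffix) ? std::string(Suffix) : std::string()};
}
```

Under these assumptions, `splitEnvironment("gnueabi-elf")` yields the pair `gnueabi` / `elf`, while `splitEnvironment("gnueabi-unknown")` yields `gnueabi` with an empty format, so a name/type comparison never sees the placeholder string.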
As a part of this patch I also removed `Triple::CanonicalForm::FIVE_IDENT`.
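The resulting canonicalization rules can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the code in `Triple::normalize()`: `finishNormalize` is a hypothetical helper, the components are assumed to be already permuted into position, and `FormatKnown` stands in for the check `ObjectFormat != UnknownObjectFormat`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the enum after this patch: FIVE_IDENT no longer exists.
enum class CanonicalForm { ANY = 0, THREE_IDENT = 3, FOUR_IDENT = 4 };

// Hypothetical stand-in for the tail of normalization.
std::string finishNormalize(std::vector<std::string> Components,
                            bool FormatKnown, CanonicalForm Form) {
  // Leave an unknown object format out of the string representation.
  if (!FormatKnown && Components.size() == 5)
    Components.pop_back();

  switch (Form) {
  case CanonicalForm::ANY:
    break;
  case CanonicalForm::THREE_IDENT:
    // Always exactly three components: pad or truncate.
    Components.resize(3, "unknown");
    break;
  case CanonicalForm::FOUR_IDENT:
    // Pad up to four, but keep a known fifth (format) component.
    if (Components.size() < 4)
      Components.resize(4, "unknown");
    break;
  }

  // Join the components back together with '-'.
  std::string Result;
  for (std::size_t I = 0; I < Components.size(); ++I) {
    if (I != 0)
      Result += '-';
    Result += Components[I];
  }
  return Result;
}
```

With this model, `{"arm", "unknown", "linux", "gnueabi", "unknown"}` under `FOUR_IDENT` joins to `arm-unknown-linux-gnueabi`, whereas a known trailing `elf` is preserved as `arm-unknown-linux-gnueabi-elf`, matching the expectations in the updated unit tests.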
Change-Id: I5c6ef8fef4ff029ab28f4c3afdab573251cf629c
From 564e9c7335851f1434e4683339cd123bd68a469d Mon Sep 17 00:00:00 2001
From: "Nadeem, Usman" <mnadeem at quicinc.com>
Date: Sun, 13 Apr 2025 15:06:44 -0700
Subject: [PATCH] [LLVM][Triple] Drop unknown object types from normalized
triples
According to the LangRef the longest canonical form for the triple is:
`ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT`
The object format may also appear at the end of the triple, separated by an
additional `-`, but it looks like the object format is part of the
`environment` rather than a separate identifier. This appears to be the case
because various pieces of code that parse the environment substring also
handle the object format, and the code often assumes only four components,
where the environment string may also hold the version number and the object
format. Also see: `getEnvironmentName()`.
While creating a Triple, if the object format is invalid or unknown, we call
the `getDefaultFormat()` function, which sets the appropriate format. So the
object format is never really unknown. Since we always set a default format,
having `unknown` as a placeholder can cause issues. This is supported by the
fact that the expected string for `UnknownObjectFormat` is `""` rather than
`"unknown"`, as seen in `getObjectFormatTypeName()`. So, to me, it makes
sense to drop "unknown" from the triple for the object format.
The expectation of `getEnvironmentVersionString()` is that if the environment
string contains a `"-"`, then it has the object format at the end, and the
object format name and type should match, which is not the case if "-unknown"
is present in the triple.
As a part of this patch I also removed `Triple::CanonicalForm::FIVE_IDENT`.
Change-Id: I5c6ef8fef4ff029ab28f4c3afdab573251cf629c
---
llvm/include/llvm/TargetParser/Triple.h | 3 +-
llvm/lib/TargetParser/Triple.cpp | 24 ++++---
llvm/unittests/TargetParser/TripleTest.cpp | 81 ++++++----------------
3 files changed, 40 insertions(+), 68 deletions(-)
diff --git a/llvm/include/llvm/TargetParser/Triple.h b/llvm/include/llvm/TargetParser/Triple.h
index fb6bbc0163701..9bbd14d753958 100644
--- a/llvm/include/llvm/TargetParser/Triple.h
+++ b/llvm/include/llvm/TargetParser/Triple.h
@@ -370,8 +370,7 @@ class Triple {
enum class CanonicalForm {
ANY = 0,
THREE_IDENT = 3, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM
- FOUR_IDENT = 4, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT
- FIVE_IDENT = 5, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT-FORMAT
+ FOUR_IDENT = 4, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT(-FORMAT)
};
/// Turn an arbitrary machine specification into the canonical triple form (or
diff --git a/llvm/lib/TargetParser/Triple.cpp b/llvm/lib/TargetParser/Triple.cpp
index e9e6f130f757c..82ca5781fb137 100644
--- a/llvm/lib/TargetParser/Triple.cpp
+++ b/llvm/lib/TargetParser/Triple.cpp
@@ -1162,11 +1162,12 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
// Note which components are already in their final position. These will not
// be moved.
- bool Found[4];
+ bool Found[5];
Found[0] = Arch != UnknownArch;
Found[1] = Vendor != UnknownVendor;
Found[2] = OS != UnknownOS;
Found[3] = Environment != UnknownEnvironment;
+ Found[4] = ObjectFormat != UnknownObjectFormat;
// If they are not there already, permute the components into their canonical
// positions by seeing if they parse as a valid architecture, and if so moving
@@ -1202,10 +1203,10 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
case 3:
Environment = parseEnvironment(Comp);
Valid = Environment != UnknownEnvironment;
- if (!Valid) {
- ObjectFormat = parseFormat(Comp);
- Valid = ObjectFormat != UnknownObjectFormat;
- }
+ break;
+ case 4:
+ ObjectFormat = parseFormat(Comp);
+ Valid = ObjectFormat != UnknownObjectFormat;
break;
}
if (!Valid)
@@ -1335,16 +1336,23 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
}
}
+ // Leave out unknown object format from the string representation.
+ if (ObjectFormat == UnknownObjectFormat && Components.size() == 5)
+ Components.pop_back();
+
// Canonicalize the components if necessary.
switch (Form) {
case CanonicalForm::ANY:
break;
- case CanonicalForm::THREE_IDENT:
- case CanonicalForm::FOUR_IDENT:
- case CanonicalForm::FIVE_IDENT: {
+ case CanonicalForm::THREE_IDENT: {
Components.resize(static_cast<unsigned>(Form), "unknown");
break;
}
+ case CanonicalForm::FOUR_IDENT: {
+ if (Components.size() < 4)
+ Components.resize(static_cast<unsigned>(Form), "unknown");
+ break;
+ }
}
// Stick the corrected components back together to form the normalized string.
diff --git a/llvm/unittests/TargetParser/TripleTest.cpp b/llvm/unittests/TargetParser/TripleTest.cpp
index 61b3637bb48e2..ebfed202105af 100644
--- a/llvm/unittests/TargetParser/TripleTest.cpp
+++ b/llvm/unittests/TargetParser/TripleTest.cpp
@@ -1384,8 +1384,7 @@ TEST(TripleTest, Normalization) {
EXPECT_EQ("unknown-unknown", Triple::normalize("-"));
EXPECT_EQ("unknown-unknown-unknown", Triple::normalize("--"));
EXPECT_EQ("unknown-unknown-unknown-unknown", Triple::normalize("---"));
- EXPECT_EQ("unknown-unknown-unknown-unknown-unknown",
- Triple::normalize("----"));
+ EXPECT_EQ("unknown-unknown-unknown-unknown", Triple::normalize("----"));
EXPECT_EQ("a", Triple::normalize("a"));
EXPECT_EQ("a-b", Triple::normalize("a-b"));
@@ -1403,7 +1402,7 @@ TEST(TripleTest, Normalization) {
EXPECT_EQ("a-pc-b-c", Triple::normalize("a-b-c-pc"));
EXPECT_EQ("a-b-linux", Triple::normalize("a-b-linux"));
- EXPECT_EQ("unknown-unknown-linux-b-c", Triple::normalize("linux-b-c"));
+ EXPECT_EQ("unknown-unknown-linux-b-elf", Triple::normalize("linux-b-elf"));
EXPECT_EQ("a-unknown-linux-c", Triple::normalize("a-linux-c"));
EXPECT_EQ("i386-pc-a", Triple::normalize("a-pc-i386"));
@@ -1438,27 +1437,15 @@ TEST(TripleTest, Normalization) {
Triple::normalize("a-b-c-d", Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("a-b-c-d",
Triple::normalize("a-b-c-d-e", Triple::CanonicalForm::FOUR_IDENT));
-
- EXPECT_EQ("a-unknown-unknown-unknown-unknown",
- Triple::normalize("a", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("a-b-unknown-unknown-unknown",
- Triple::normalize("a-b", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("a-b-c-unknown-unknown",
- Triple::normalize("a-b-c", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("a-b-c-d-unknown",
- Triple::normalize("a-b-c-d", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("a-b-c-d-e",
- Triple::normalize("a-b-c-d-e", Triple::CanonicalForm::FIVE_IDENT));
+ EXPECT_EQ(
+ "a-b-c-d-elf",
+ Triple::normalize("a-b-c-d-elf", Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("i386-b-c-unknown",
Triple::normalize("i386-b-c", Triple::CanonicalForm::FOUR_IDENT));
- EXPECT_EQ("i386-b-c-unknown-unknown",
- Triple::normalize("i386-b-c", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("i386-a-c-unknown",
Triple::normalize("a-i386-c", Triple::CanonicalForm::FOUR_IDENT));
- EXPECT_EQ("i386-a-c-unknown-unknown",
- Triple::normalize("a-i386-c", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("i386-a-b-unknown",
Triple::normalize("a-b-i386", Triple::CanonicalForm::FOUR_IDENT));
@@ -1502,46 +1489,24 @@ TEST(TripleTest, Normalization) {
"x86_64-unknown-linux-gnu",
Triple::normalize("x86_64-gnu-linux", Triple::CanonicalForm::FOUR_IDENT));
- EXPECT_EQ("i386-a-b-unknown-unknown",
- Triple::normalize("a-b-i386", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("i386-a-b-c-unknown",
- Triple::normalize("a-b-c-i386", Triple::CanonicalForm::FIVE_IDENT));
-
- EXPECT_EQ("a-pc-c-unknown-unknown",
- Triple::normalize("a-pc-c", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("unknown-pc-b-c-unknown",
- Triple::normalize("pc-b-c", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("a-pc-b-unknown-unknown",
- Triple::normalize("a-b-pc", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("a-pc-b-c-unknown",
- Triple::normalize("a-b-c-pc", Triple::CanonicalForm::FIVE_IDENT));
-
- EXPECT_EQ("a-b-linux-unknown-unknown",
- Triple::normalize("a-b-linux", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("unknown-unknown-linux-b-c",
- Triple::normalize("linux-b-c", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("a-unknown-linux-c-unknown",
- Triple::normalize("a-linux-c", Triple::CanonicalForm::FIVE_IDENT));
-
- EXPECT_EQ("i386-pc-a-unknown-unknown",
- Triple::normalize("a-pc-i386", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("i386-pc-unknown-unknown-unknown",
- Triple::normalize("-pc-i386", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("unknown-pc-linux-c-unknown",
- Triple::normalize("linux-pc-c", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("unknown-pc-linux-unknown-unknown",
- Triple::normalize("linux-pc-", Triple::CanonicalForm::FIVE_IDENT));
-
- EXPECT_EQ("i386-unknown-unknown-unknown-unknown",
- Triple::normalize("i386", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("unknown-pc-unknown-unknown-unknown",
- Triple::normalize("pc", Triple::CanonicalForm::FIVE_IDENT));
- EXPECT_EQ("unknown-unknown-linux-unknown-unknown",
- Triple::normalize("linux", Triple::CanonicalForm::FIVE_IDENT));
-
- EXPECT_EQ(
- "x86_64-unknown-linux-gnu-unknown",
- Triple::normalize("x86_64-gnu-linux", Triple::CanonicalForm::FIVE_IDENT));
+ EXPECT_EQ("arm-unknown-linux-gnueabi",
+ Triple::normalize("arm-linux-gnueabi",
+ Triple::CanonicalForm::FOUR_IDENT));
+ EXPECT_EQ("arm-unknown-linux-gnueabi-elf",
+ Triple::normalize("arm-linux-gnueabi-elf",
+ Triple::CanonicalForm::FOUR_IDENT));
+ EXPECT_EQ("arm-unknown-linux-gnueabi",
+ Triple::normalize("arm-linux-gnueabi-xyz",
+ Triple::CanonicalForm::FOUR_IDENT));
+ EXPECT_EQ("arm-unknown-linux-gnueabi",
+ Triple::normalize("arm-linux-gnueabi-unknown",
+ Triple::CanonicalForm::FOUR_IDENT));
+ EXPECT_EQ("arm-unknown-linux-gnueabi",
+ Triple::normalize("arm-unknown-linux-gnueabi-unknown",
+ Triple::CanonicalForm::FOUR_IDENT));
+ EXPECT_EQ("arm-unknown-linux-gnueabi",
+ Triple::normalize("arm-unknown-linux-gnueabi-xyz",
+ Triple::CanonicalForm::ANY));
// Check that normalizing a permutated set of valid components returns a
// triple with the unpermuted components.